this post was submitted on 18 Nov 2025

336 points (98.0% liked)


This should have ended up differently (media.piefed.social)

submitted 2 days ago by Kierunkowy74@piefed.social to c/fedimemes@feddit.uk


[–] Dave@lemmy.nz 55 points 2 days ago (5 children)

Running an instance without Cloudflare in front is hard work, because AI scrapers bring it to its knees. It's a never-ending battle to block them even with Cloudflare, but at least Cloudflare can help reduce the load, and even the free version comes with many tools to identify and block problematic bots.

Though if you turn on bot blocking you break federation, so you have to be a lot more refined in your security rules.

[–] oftheair@lemmy.blahaj.zone 2 points 1 day ago (1 children)

because AI scrapers bring it to its knees

There are (at least) three pieces of web software to protect against AI scrapers currently, so it should be more than possible without Cloudflare.

[–] Dave@lemmy.nz 4 points 1 day ago (1 children)

It's not even possible to do a good job of it with Cloudflare. What are the three you are referring to? The most commonly known one is Anubis, and Codeberg found that AI bots had learnt to solve its challenges.

[–] oftheair@lemmy.blahaj.zone 3 points 1 day ago* (last edited 1 day ago) (1 children)

Okay, it seems there are only two, as Nepenthes appears to be no longer developed.

[–] Dave@lemmy.nz 3 points 1 day ago (1 children)

Yeah so anubis is the bot blocking one, already breached by bots.

Iocaine is an LLM maze and poisoner intended to trap a bot, but your site still needs the resources to serve all the requests, and it's not clear what happens when a user is accidentally identified as a bot.

[–] oftheair@lemmy.blahaj.zone 2 points 1 day ago

Ah, okay.

Thanks for the info!

[–] jelloeater85@lemmy.world 11 points 2 days ago

Truth.

[–] Redacted@lemmy.zip 5 points 2 days ago (1 children)

Why does turning off bots turn off federation?

[–] Dave@lemmy.nz 12 points 2 days ago* (last edited 2 days ago) (3 children)

Cloudflare's bot detection triggers the blocking because federation looks a lot like a bot (well, it is a bot).

For example, Lemmy.world will send my instance hundreds of thousands if not millions of requests a day, in a near steady stream. It's telling my instance about every post, comment, or vote. AI scrapers send hundreds of thousands of requests or millions in a near steady stream each day.

For all intents and purposes, federation is bot traffic and looks just like it. Typically I block by identifying high-traffic ASNs (an ASN is a group of IPs run by the same entity; blackhat AI scrapers use many IPs) and showing a Cloudflare challenge (which will typically have a 0% pass rate). If the traffic comes from one IP then it's probably a federated instance, but I typically see many IPs from the same region with requests spread evenly across them.

I also try to exclude federation/API endpoints, which can help stop false positives as scrapers are generally loading the web page.
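The ASN-based approach above can be sketched offline, for example over parsed access logs. This is only an illustration, not the Cloudflare rule itself: the `(asn, ip, path)` record shape, the exempt path prefixes, and both thresholds are assumptions chosen for the example.

```python
from collections import defaultdict

# Hypothetical prefixes treated as federation/API traffic and never challenged.
EXEMPT_PREFIXES = ("/api/", "/inbox", "/pictrs/")


def asns_to_challenge(requests, min_requests=1000, min_distinct_ips=20):
    """Return the set of ASNs whose web-page traffic looks like a distributed scraper.

    `requests` is an iterable of (asn, ip, path) tuples, e.g. parsed from access logs.
    An ASN is flagged only if it sends many page requests from many distinct IPs.
    """
    hits = defaultdict(int)
    ips = defaultdict(set)
    for asn, ip, path in requests:
        if path.startswith(EXEMPT_PREFIXES):
            continue  # skip federation/API endpoints to avoid false positives
        hits[asn] += 1
        ips[asn].add(ip)
    return {
        asn
        for asn in hits
        if hits[asn] >= min_requests and len(ips[asn]) >= min_distinct_ips
    }
```

A single busy IP (such as a large federated instance) never trips the rule, because the distinct-IP threshold is not met.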

This is something Lemmy (and PieFed, Mbin) admins try to help each other with by sharing strategies, because one day a bot will find you and suddenly your instance is down because it is being hammered too hard.

I bet if you are in China, Brazil, Singapore, Argentina, etc then you will see a lot of blocked content on Lemmy, as this is often where the bot traffic comes from (Google, Facebook, OpenAI, Amazon, etc will typically respect the robots.txt so US traffic is less of an issue).

[–] rekabis@lemmy.ca 2 points 1 day ago (1 children)

Lemmy.world will send my instance hundreds of thousands if not millions of requests a day, in a near steady stream. It's telling my instance about every post, comment, or vote.

And yet, federation means that each instance should know all the other domain names, yes? So do daily DNS lookups of all IP addresses associated with federation and auto-whitelist them.

Sure, if you have to then configure cloudflare with these IPs, it’ll require an API to do so automatically.

But otherwise if you are running some sort of throttling protection on the actual box or VM the instance is sitting on, it should be rather trivial to update it directly, especially if said throttling software is doing Linux correctly and drawing its whitelist from a flat file.
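A rough sketch of that daily lookup, assuming the list of federated domains is already available and that the throttling software reads one IP per line from a flat file (both assumptions). Note that resolved addresses are the ones a domain receives traffic on, which may not match the addresses an instance sends from.

```python
import socket


def resolve_ips(domain):
    """Resolve a domain to the set of IP addresses it currently maps to."""
    infos = socket.getaddrinfo(domain, 443, proto=socket.IPPROTO_TCP)
    return {info[4][0] for info in infos}


def build_allowlist(domains, resolver=resolve_ips):
    """Return a sorted list of IPs for all domains, skipping ones that fail to resolve."""
    ips = set()
    for domain in domains:
        try:
            ips.update(resolver(domain))
        except OSError:
            continue  # unresolvable instance: leave it out until the next run
    return sorted(ips)


def write_allowlist(path, ips):
    """Write one IP per line, the flat-file format assumed here."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(ips) + ("\n" if ips else ""))
```

The `resolver` parameter exists so the lookup can be swapped out or tested without touching the network.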

[–] Dave@lemmy.nz 1 points 1 day ago (1 children)

New instances (and not just Lemmy instances, but Mastodon and other fediverse instances) are coming online all the time, so you need a way to let them through to start the federation process. There are thousands, so it needs to be automatic. You can't require that a new instance send whitelisting requests to every server one of its users might want to interact with (instances aren't linked unless a local user subscribes to something on a remote instance).

Given the AI bots seem to just be indiscriminately scraping web pages, I excluded API endpoints from blocking anyway. Another admin showed me a nice Cloudflare rule to do this. Media can still be a problem, though: it is loaded by individual users on other instances, so it's hard to block scrapers without blocking users. That is another way Cloudflare helps (static media files are easily cached by their CDN).

[–] rekabis@lemmy.ca 2 points 1 day ago* (last edited 1 day ago) (1 children)

you need a way to let them through to start the federation process.

This isn’t via an API endpoint explicitly for that purpose that bots would normally not utilize?

And why not have a process by which admins from a new instance poke the admins of another instance - any other instance, so long as it’s already a part of the network - to do an initial manual whitelist that could cascade through the entire system?

Then there should be ways that the software itself can auth with other instances of itself, via a common encryption protocol. While this would only work with like software, the key point being that only a toehold is needed to start propagating.

The point being, there are options. Some of them quite simple.

[–] Dave@lemmy.nz 1 points 1 day ago

Realistically, federation is not the main concern. You can leave all your API endpoints open to bots and not have a problem because they are loading the web app. Just block the web app for suspicious traffic.

ActivityPub already uses authentication to some extent with other instances, it's the first contact where you have to have trust.

My main concern is still that media is loaded directly by users in most cases. The APIs are not a problem right now, as the bots aren't specifically targeting Lemmy. There are ways to address this, but Lemmy (and other threadiverse services) don't have full-time dev teams; they work on what they can or want to work on given the very low hourly rate.

[–] Cooper8@feddit.online 5 points 2 days ago (1 children)

The thing that confuses me is, wouldn't a whitelist for federated instances and request frequency throttling at the account level solve this issue?

I suppose this would require that the client not have a public front end that keeps full navigation functionality, but for a smaller instance that seems like an easy sacrifice to make in exchange for stability.

"But then how will new instances get federated?" Maybe they have to actually talk to the admins of other instances to get vouched in to the whitelist. Just because the network is distributed doesn't mean it needs to be fully inclusive by default, and in fact it explicitly isn't.

I'm assuming I'm missing something super basic that makes all this not enough, bots spoofing the requests with the credentials of a whitelisted instance maybe?

Seems like maybe the instances should have encrypted keys that handshake each other with batch requests.

Am I on to something or just wildly gesticulating?

[–] Dave@lemmy.nz 5 points 2 days ago (1 children)

There are thousands of instances and it's not really about admins. If a Mastodon user wants to go and follow a Lemmy community, they can. They shouldn't need to ask their admin to contact the admin of the Lemmy instance to be allowed to.

However, there is something called Fediseer which allows a chain of trust. Some instances guarantee other instances who then guarantee others down a chain. If an instance turns out bad then their guarantor can revoke it and any instances lower in the chain (that the spammy instance guarantees) also lose their trusted status. It doesn't share IPs to my knowledge though, and outbound IPs are different than the inbound one on the domain if there is a CDN like Cloudflare in the mix. The intent is actually to identify and block instances set up to spam (or other reasons to defederate).
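The cascading revocation described here can be illustrated with a toy model. This is not Fediseer's API or data model; the `TrustChain` class and the single-guarantor rule are assumptions made purely for the example.

```python
class TrustChain:
    """Toy chain of trust: each instance has at most one guarantor."""

    def __init__(self, roots):
        # Maps instance -> the instance that vouched for it (None for roots).
        self.guarantor = {root: None for root in roots}

    def guarantee(self, guarantor, instance):
        """Let a trusted instance vouch for a new one."""
        if guarantor not in self.guarantor:
            raise ValueError(f"{guarantor} is not trusted and cannot vouch")
        if instance in self.guarantor:
            raise ValueError(f"{instance} is already guaranteed")
        self.guarantor[instance] = guarantor

    def revoke(self, instance):
        """Remove an instance and everything it guaranteed, directly or indirectly."""
        removed = []
        stack = [instance]
        while stack:
            current = stack.pop()
            if current not in self.guarantor:
                continue
            del self.guarantor[current]
            removed.append(current)
            # Queue every instance that the removed one was vouching for.
            stack.extend(i for i, g in self.guarantor.items() if g == current)
        return removed

    def is_trusted(self, instance):
        return instance in self.guarantor
```

Revoking one instance removes its whole subtree while leaving siblings guaranteed by someone else untouched.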

I think the other part missing is that it's not just instances. If you upload an image to Lemmy.world and then someone on feddit.online views it, the feddit.online user's browser loads that image directly from Lemmy.world. That means if you block any IP that's not an instance, people won't be able to see content uploaded by your users. So you have to be able to tell what is a Brazil-hosted AI bot and what's a Brazilian user viewing a meme your user uploaded.

There are of course different parts that you can or can't block which is basically the idea, working out which endpoints can be blocked and which will break things for genuine users. With static images they can be basically ignored because Cloudflare will cache it, but having thousands of post or feed loads in a hurry can bring down an instance.

[–] Cooper8@feddit.online 2 points 2 days ago* (last edited 1 day ago) (1 children)

Fediseer seems like a good solution, essentially a whitelist vouch system with vouching at second hand.

Regarding the media hosting, again it seems like something that could rely on a method of identifying the user request directly with their user account before responding to the request. Cookies could be an option for this, though they are falling out of favor. Alternately, and more securely, it could be a cryptographic handshake where the user's home instance and the instance hosting the post generate a public key using their two private keys for the user, and the user provides the public key when making pull requests from the federated instance. The keys could be batch generated when an instance first federates content with another and then assigned to user accounts the first time the user makes a pull request through a link from their home instance to the federated instance.

Secure Scuttlebutt Protocol has already developed an encryption methodology that could be cross-applied for a lot of this: https://ssbc.github.io/scuttlebutt-protocol-guide/ though I am of course not suggesting SSP be adopted wholesale, and there are a bunch of other open-source projects with encryption that could be used. This is just the one that comes to mind.

(edit: also I am in favor of finding methodologies that work whether CloudFlare is used by the instance or not, obviously CloudFlare has advantages but as we have seen also is a vulnerability of the network.)

[–] Dave@lemmy.nz 2 points 1 day ago (1 children)

Regarding the media hosting, again it seems like something that could rely on a method of identifying the user request directly with their user account before responding to the request.

Yeah, so far it works to just check for a JWT in the cookie (regardless of what it is) to allow logged-in users to bypass the rules. This works on Lemmy because the bots aren't specifically targeting Lemmy, so they don't try to fake this (although if they were, they could just make an instance and our instances would send them all the data lol).
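As a minimal sketch of that presence-only check: the function below looks for a non-empty cookie by name and does not validate the token at all, mirroring the "regardless of what it is" approach. The cookie name `jwt` and the helper name are assumptions for illustration.

```python
from http.cookies import CookieError, SimpleCookie


def has_session_cookie(cookie_header, name="jwt"):
    """Return True if the Cookie header carries a non-empty cookie called `name`.

    The value is deliberately not verified; this only separates browsers that
    have logged in from clients that send no session cookie at all.
    """
    cookie = SimpleCookie()
    try:
        cookie.load(cookie_header or "")
    except CookieError:
        return False  # malformed header: treat as not logged in
    return name in cookie and cookie[name].value != ""
```

Because nothing is verified, this only holds up while scrapers are not deliberately forging the cookie.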

Alternately, and more securely, it could be a cryptographic handshake where the user’s home instance and the instance hosting the post generate a public key using their two private keys for the user, and the user provides the public key when making pull requests from the federated instance.

This is already basically how ActivityPub works for communication between instances. But the activities are one thing, it's the page loads that are the killer because of the database queries needed to compile a unique, sorted home page of subscriptions. You could block logged out users but that impacts many lurkers.

For media, that's difficult as media is often being loaded from a remote instance that doesn't know who you are, along with the problem that the media provider is not technically part of Lemmy (it's a separate service called pict-rs) so doesn't know if you're logged in. I'm not sure how that works on PieFed or Mbin, but regardless you might not be logged in at all, and you should still be allowed to browse content.

Lemmy has a proxy option where the instance can fetch content from the other servers to provide to the user, which does get around this issue for logged out users. But the proxy caches the media, and when this happens you are now the host of whatever media is in any post that made its way to your instance, along with all the legal risks that involves.

(edit: also I am in favor of finding methodologies that work whether CloudFlare is used by the instance or not, obviously CloudFlare has advantages but as we have seen also is a vulnerability of the network.)

All of the things being discussed around mitigations in Cloudflare are also possible to do without Cloudflare, but it just means setting it all up yourself. I'll just wait for someone smarter than me to build a tool I can host myself that does all this automatically, then I'll consider it 😅

[–] Cooper8@feddit.online 2 points 1 day ago (1 children)

"you could block logged out users but that would impact many lurkers"

"regardless you might not be logged in at all, you should still be allowed to browse content"

Fundamentally, what I'm suggesting is a fork in the road. Either an instance admin can set up to eliminate scrapers by making the instance private to only registered users,

or they can maintain their instance as public and deal with more arcane methods to attempt to eliminate scraping.

The issue is that if the infrastructure isn't in place for the instance operator to decide to make their service private, then everyone is opted in to the Scrapers vs Countermeasures war with no alternative.

Privacy and encryption just work, it seems like not building the infrastructure to enable the network to function with them in place is a mistake.

To me, and to many users, what we want is fast load times, quick federation, and reliable service, all things that benefit from reducing traffic load to only registered users.

[–] Dave@lemmy.nz 1 points 1 day ago (1 children)

Fundamentally, what I’m suggesting is a fork in the road. Either an instance admin can set up to eliminate scrapers by making the instance private to only registered users,

Yeah, it would require perhaps more changes (since instances newly subscribed to a community need the ability to ad hoc fetch content), but even just not showing the website when someone isn't logged in would probably make a big difference. That might be pretty easy, just redirect requests to load the web app (except the login page) to the login page, and exclude the API. Apps would still get logged out access but I doubt that's much of a problem compared to the website, since the bots seem to just be indiscriminately scraping web pages.
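The redirect idea can be sketched as a single routing decision. The path prefixes and the `/login` location below are assumptions for illustration, not Lemmy's actual route layout, and how `logged_in` is determined is left to the caller.

```python
LOGIN_PATH = "/login"
# Hypothetical prefixes left open: API, media, and static assets the login page needs.
OPEN_PREFIXES = ("/api/", "/pictrs/", "/static/")


def decide(path, logged_in):
    """Return ("allow", None) or ("redirect", LOGIN_PATH) for one request.

    Logged-in users, the login page itself, and the open prefixes pass through;
    every other web-app page is redirected to the login page.
    """
    if logged_in or path == LOGIN_PATH or path.startswith(OPEN_PREFIXES):
        return ("allow", None)
    return ("redirect", LOGIN_PATH)
```

Apps and federation keep working because they talk to the API prefix, while anonymous page loads never reach the expensive feed queries.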

[–] Cooper8@feddit.online 2 points 1 day ago

Definitely true.

[–] Redacted@lemmy.zip 2 points 2 days ago

Thank you for the detailed response :) I even understood most of it

[–] TehBamski@lemmy.world 5 points 2 days ago (1 children)

I'm very much a Jon Snow when it comes to how DDoS mitigation, DNS, etc, etc., actually work. But surely there are other corporations that offer the same services that work just as good or better than Cloudflare, no?

[–] Dave@lemmy.nz 13 points 2 days ago (1 children)

Cloudflare has a generous free tier. I think that's why it got so popular.

[–] dohpaz42@lemmy.world 7 points 2 days ago* (last edited 2 days ago) (1 children)

Begs the question; when will it go the route of other services with its generous free tier?

[–] Dave@lemmy.nz 8 points 2 days ago* (last edited 18 hours ago)

A good chance. Depends on if they think the free tier is still stacking up for them.

E.g. getting their name out there with hobbyists means people recognise the name at work and staff are already familiar with it. Is this still important? Probably not, considering how widespread they are now.

Being able to say in sales speeches they mitigate X billion DDOS attacks and X trillion GB of data saved etc, maybe that is still worth it to them to keep the free tier in order to win big contracts?

Since they dropped their no-video-streaming clause from the T&Cs of free accounts, I'm guessing they aren't about to back down on the unlimited bandwidth. But over time they are adding more and more value-add premium features, which may be their core strategy.

But I do not doubt that they will drop or enshittify the free tier as soon as they think it's the best strategic move.

[–] dragonfucker@lemmy.nz 3 points 2 days ago (1 children)

What about Anubis?

[–] Dave@lemmy.nz 5 points 2 days ago (1 children)

Yeah so anubis is like a Cloudflare challenge, it fits in to a certain part of the process.

My point is basically that Cloudflare provides a service that stands in for many things an admin could be doing. There are many instances that don't use Cloudflare, and I commend them for that. It's more work but certainly possible.

There's also the additional problem that AI bots are breaking through anubis so it can't be the only line of defence.

E.g. https://news.ycombinator.com/item?id=44914773

[–] dragonfucker@lemmy.nz 2 points 2 days ago

Interesting, thanks

[–] sk1nnym1ke@piefed.social 47 points 2 days ago* (last edited 2 days ago) (1 children)

Too lazy to create the meme. Insert the two astronauts looking at earth meme

Wait, there is no decentralized internet?

Always has been.

[–] Kierunkowy74@piefed.social 16 points 2 days ago (1 children)

Apparently there is a decentralized internet out there. Just we are not experiencing it right now. Skill issue, huh?

insert cursed wojak reaction

[–] irelephant@lemmy.dbzer0.com 4 points 2 days ago

cursed wojak

[–] CheeseNoodle@lemmy.world 4 points 1 day ago

So my takeaway from this thread is existing mega corporations have found a legal way (deliberately or not) to run endless denial of service attacks on potential competition?

[–] TheMinions@lemmy.dbzer0.com 19 points 2 days ago

Today was when I realized dbzer0 doesn’t use Cloudflare. Which is kinda on brand for them tbh.

[–] Blaze@piefed.zip 19 points 2 days ago (2 children)

Always have an extra alt

[–] Kierunkowy74@piefed.social 21 points 2 days ago (1 children)

Just only one extra alt, I swear...

[–] snooggums@piefed.world 4 points 2 days ago* (last edited 2 days ago)

Just one billion more alts bro!

[–] slothrop@lemmy.ca 7 points 2 days ago

And 43 proxies, a vpn, tor, ip2, irc, archie, finger, and fortran!