this post was submitted on 18 Nov 2025
332 points (98.0% liked)
Fediverse memes
you are viewing a single comment's thread
Yeah, so far it works to just check for the presence of a JWT in the cookie (regardless of whether it's valid) to let logged-in users bypass the rules. This works on Lemmy because the bots aren't specifically targeting Lemmy, so they don't try to fake it (although if they did, they could just spin up an instance and our instances would send them all the data anyway lol).
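A minimal sketch of the check described above (function and cookie names are hypothetical): the request passes if any JWT-shaped cookie is present, without validating the token at all.

```python
from http.cookies import SimpleCookie

def has_jwt_cookie(cookie_header: str, name: str = "jwt") -> bool:
    """Return True if the request carries a cookie that looks like a JWT.

    The value is NOT validated -- as described above, merely *having*
    a JWT-shaped cookie is enough to bypass the anti-bot rules.
    """
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    value = cookie.get(name)
    if value is None:
        return False
    # A JWT is three base64url segments separated by two dots.
    return value.value.count(".") == 2

# A logged-in browser passes the check:
print(has_jwt_cookie("jwt=aaa.bbb.ccc; theme=dark"))  # True
```

The point of the sketch is how cheap the check is: no signature verification, no database lookup, just a shape test that untargeted scrapers don't bother to fake.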
This is already basically how ActivityPub works for communication between instances. But the activities are one thing; it's the page loads that are the killer, because of the database queries needed to compile a unique, sorted home page of subscriptions. You could block logged-out users, but that impacts many lurkers.
For media it's difficult, since media is often loaded from a remote instance that doesn't know who you are. On top of that, the media provider isn't technically part of Lemmy (it's a separate service called pict-rs), so it doesn't know whether you're logged in. I'm not sure how that works on PieFed or Mbin, but regardless, you might not be logged in at all and should still be allowed to browse content.
Lemmy has a proxy option where the instance fetches content from other servers on behalf of the user, which does get around this issue for logged-out users. But the proxy caches the media, and when that happens you become the host of whatever media appears in any post that made its way to your instance, along with all the legal risk that involves.
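As an illustration of that proxy option, here is a sketch of rewriting remote media URLs to go through the local instance. The `/api/v3/image_proxy?url=` path matches how recent Lemmy versions expose this, but treat the exact path, and the instance hostname, as assumptions for illustration.

```python
from urllib.parse import quote, urlsplit

LOCAL_INSTANCE = "https://my.instance.example"  # hypothetical local instance

def proxy_media_url(remote_url: str) -> str:
    """Rewrite a remote media URL so the local instance fetches it.

    The local server fetches (and, as noted above, caches) the file,
    so the remote host never sees the reader -- but the local instance
    now effectively hosts that content itself.
    """
    if urlsplit(remote_url).netloc == urlsplit(LOCAL_INSTANCE).netloc:
        return remote_url  # already local, nothing to rewrite
    return f"{LOCAL_INSTANCE}/api/v3/image_proxy?url={quote(remote_url, safe='')}"

print(proxy_media_url("https://remote.example/pictrs/image/abc.png"))
```

The rewrite is trivial; the trade-off discussed above (your instance becoming the host of cached third-party media) is entirely in what the server does when that proxied URL is requested.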
All of the mitigations being discussed around Cloudflare are also possible without Cloudflare; it just means setting everything up yourself. I'll just wait for someone smarter than me to build a tool I can self-host that does all this automatically, then I'll consider it.
> You could block logged out users but that would impact many lurkers.

> Regardless you might not be logged in at all, you should still be allowed to browse content.
Fundamentally, what I'm suggesting is a fork in the road: either an instance admin can eliminate scrapers by making the instance private to registered users only, or they can keep the instance public and resort to more arcane methods to try to curb scraping.
The issue is that if the infrastructure isn't in place for the instance operator to make their service private, then everyone is opted into the scrapers-vs-countermeasures war with no alternative.
Privacy and encryption just work; not building the infrastructure to let the network function with them in place seems like a mistake.
What I, and many users, want is fast load times, quick federation, and reliable service, all of which benefit from limiting traffic to registered users.
Yeah, it would perhaps require more changes (since instances newly subscribed to a community need to be able to fetch content ad hoc), but even just not showing the website to anyone who isn't logged in would probably make a big difference. That might be pretty easy: redirect all requests to the web app (except the login page) to the login page, and exclude the API. Apps would still get logged-out access, but I doubt that's much of a problem compared to the website, since the bots seem to be indiscriminately scraping web pages.
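The redirect idea above could be sketched as a routing rule (function name and paths are hypothetical): web-app paths require a login cookie, while the API and the login page itself stay reachable.

```python
def route_anonymous(path: str, has_jwt: bool) -> str:
    """Decide what to do with a request from a possibly-anonymous client.

    Returns "serve" to handle the request normally, or "redirect"
    to bounce the browser to the login page.
    """
    if has_jwt:
        return "serve"
    # Federation and app traffic goes through the API; leave it alone.
    if path.startswith("/api/"):
        return "serve"
    # Let people reach the login page itself.
    if path == "/login":
        return "serve"
    # Everything else in the web app bounces to login.
    return "redirect"

print(route_anonymous("/post/12345", has_jwt=False))  # redirect
```

Excluding `/api/` is what keeps apps (and, depending on the setup, federation endpoints) working while only the scraped HTML pages get gated.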
Definitely true.