this post was submitted on 15 Dec 2025
733 points (98.5% liked)

Technology

78584 readers
3866 users here now

This is a most excellent place for technology news and articles.


Our Rules


  1. Follow the lemmy.world rules.
  2. Only tech related news or articles.
  3. Be excellent to each other!
  4. Mod approved content bots can post up to 10 articles per day.
  5. Threads asking for personal tech support may be deleted.
  6. Politics threads may be removed.
  7. No memes allowed as posts, OK to post as comments.
  8. Only approved bots from the list below, this includes using AI responses and summaries. To ask if your bot can be added please contact a mod.
  9. Check for duplicates before posting, duplicates may be removed
  10. Accounts 7 days and younger will have their posts automatically removed.

Approved Bots


founded 2 years ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
[–] supersquirrel@sopuli.xyz 101 points 4 weeks ago* (last edited 4 weeks ago) (6 children)

I made this point recently in a much more verbose form, but I want to reflect it briefly here, if you combine the vulnerability this article is talking about with the fact that large AI companies are most certainly stealing all the data they can and ignoring our demands to not do so the result is clear we have the opportunity to decisively poison future LLMs created by companies that refuse to follow the law or common decency with regards to privacy and ownership over the things we create with our own hands.

Whether we are talking about social media, personal websites... whatever if what you are creating is connected to the internet AI companies will steal it, so take advantage of that and add a little poison in as a thank you for stealing your labor :)

[–] ProfessorProteus@lemmy.world 14 points 4 weeks ago

Opportunity? More like responsibility.

[–] benignintervention@piefed.social 11 points 4 weeks ago

I'm convinced they'll do it to themselves, especially as more books are made with AI, more articles, more reddit bots, etc. Their tool will poison its own well.

[–] Cherry@piefed.social 5 points 3 weeks ago (2 children)

How? Is there a guide on how we can help 🤣

[–] thethunderwolf@lemmy.dbzer0.com 3 points 3 weeks ago

So you weed to boar a plate and flip the "Excuses" switch

[–] calcopiritus@lemmy.world 3 points 3 weeks ago* (last edited 3 weeks ago)

One of the techniques I've seen it's like a "password". So for example if you write a lot the phrase "aunt bridge sold the orangutan potatoes" and then a bunch of nonsense after that, then you're likely the only source of that phrase. So it learns that after that phrase, it has to write nonsense.

I don't see how this would be very useful, since then it wouldn't say the phrase in the first place, so the poison wouldn't be triggered.

EDIT: maybe it could be like a building process. You have to also put "aunt bridge" together many times, then "bridge sold" and so on, so every time it writes "aunt", it has a chance to fall into the next trap, untill it reaches absolute nonsense.

[–] Tollana1234567@lemmy.today 3 points 3 weeks ago (1 children)

dont they kinda poison themselves, when they scrape AI generated content too.

[–] phutatorius@lemmy.zip 1 points 3 weeks ago

Yeah, like toxins accumulating as you go up the food chain.

[–] Grimy@lemmy.world 1 points 4 weeks ago (1 children)

That being said, sabotaging all future endeavors would likely just result in a soft monopoly for the current players, who are already in a position to cherry pick what they add. I wouldn't be surprised if certain companies are already poisoning the well to stop their competitors tbh.

[–] supersquirrel@sopuli.xyz 10 points 4 weeks ago* (last edited 4 weeks ago) (1 children)

In the realm of LLMs sabotage is multilayered, multidimensional and not something that can easily be identified quickly in a dataset. There will be no easy place to draw some line of "data is contaminated after this point and only established AIs are now trustable" as every dataset is going to require continual updating to stay relevant.

I am not suggesting we need to sabotage all future endeavors for creating valid datasets for LLMs either, far from it, I am saying sabotage the ones that are stealing and using things you have made and written without your consent.

[–] Grimy@lemmy.world 3 points 4 weeks ago* (last edited 4 weeks ago) (1 children)

I just think the big players aren't touching personal blogs or social media anymore and only use specific vetted sources, or have other strategies in place to counter it. Anthropic is the one that told everyone how to do it, I can't imagine them doing that if it could affect them.

[–] supersquirrel@sopuli.xyz 6 points 4 weeks ago* (last edited 4 weeks ago)

Sure, but personal blogs, esoteric smaller websites and social media are where all the actual valuable information and human interaction happens and despite the awful reputation of them it is in fact traditional news media and associated websites/sources that have never been less trustable or useless despite the large role they still play.

If companies fail to integrate the actual valuable parts to the internet in their scraping, the product they create will fail to be valuable past a certain point shrugs. If you cut out the periphery of the internet paradoxically what you accomplish is to cut out the essential core out of the internet.