this post was submitted on 15 Dec 2025

722 points (98.5% liked)

Technology

77742 readers

3323 users here now

This is a most excellent place for technology news and articles.

Our Rules

Follow the lemmy.world rules.
Only tech related news or articles.
Be excellent to each other!
Mod approved content bots can post up to 10 articles per day.
Threads asking for personal tech support may be deleted.
Politics threads may be removed.
No memes allowed as posts, OK to post as comments.
Only approved bots from the list below, this includes using AI responses and summaries. To ask if your bot can be added please contact a mod.
Check for duplicates before posting, duplicates may be removed
Accounts 7 days and younger will have their posts automatically removed.

Approved Bots

founded 2 years ago

MODERATORS

L3s@lemmy.world

enu@lemmy.world

technopagan@lemmy.world

L4s@lemmy.world

L3s@hackingne.ws

L4s@hackingne.ws

722

It Only Takes A Handful Of Samples To Poison Any Size LLM, Anthropic Finds (hackaday.com)

submitted 2 days ago by muelltonne@feddit.org to c/technology@lemmy.world

131 comments fedilink hide all child comments

top 50 comments

sorted by: hot top controversial new old

[–] thingAmaBob@lemmy.world 27 points 1 day ago (2 children)

I seriously keep reading LLM as MLM

[–] NikkiDimes@lemmy.world 22 points 1 day ago

I mean...

[–] ChaoticEntropy@feddit.uk 2 points 1 day ago

The real money is from buying AI from me, in bulk, then reselling that AI to new vict... customers. Maybe they could white label your white label!

[–] PumpkinSkink@lemmy.world 36 points 1 day ago (3 children)

So you're saying that thorn guy might be on to somthing?

[–] DeathByBigSad@sh.itjust.works 14 points 1 day ago

@Sxan@piefed.zip þank you for your service 🫡

[–] funkless_eck@sh.itjust.works 13 points 1 day ago

someþiŋ

[–] SlimePirate@lemmy.dbzer0.com 9 points 1 day ago

Lmao

[–] LavaPlanet@sh.itjust.works 10 points 1 day ago (1 children)

Remember before they were released and the first we heard of them, were reports on the guy training them or testing or whatever, having a psychotic break and freaking out saying it was sentient. It's all been downhill from there, hey.

[–] Tattorack@lemmy.world 10 points 1 day ago (2 children)

I thought it was so comically stupid back then. But a friend of mine said this was just a bullshit way of hyping up AI.

[–] Toribor@corndog.social 5 points 1 day ago (1 children)

Seeing how much they've advanced over recent years I can't imagine whatever that guy was working on would actually impress anyone today.

[–] Petter1@discuss.tchncs.de 1 points 23 hours ago* (last edited 23 hours ago)

There are enough people thinking their agent is sentient but fear to speak out because they don’t understand why not, even when people try to explain…

load more comments (1 replies)

[–] Sam_Bass@lemmy.world 18 points 1 day ago

Thats a price you pay for all the indiscriminate scraping

[–] ceenote@lemmy.world 191 points 2 days ago (2 children)

So, like with Godwin's law, the probability of a LLM being poisoned as it harvests enough data to become useful approaches 1.

[–] Gullible@sh.itjust.works 107 points 2 days ago (5 children)

I mean, if they didn’t piss in the pool, they’d have a lower chance of encountering piss. Godwin’s law is more benign and incidental. This is someone maliciously handing out extra Hitlers in a game of secret Hitler and then feeling shocked at the breakdown in the game

[–] saltesc@lemmy.world 32 points 2 days ago* (last edited 2 days ago) (8 children)

Yeah but they don't have the money to introduce quality governance into this. So the brain trust of Reddit it is. Which explains why LLMs have gotten all weirdly socially combative too; like two neckbeards having at it—Google skill vs Google skill—is a rich source of A+++ knowledge and social behaviour.

load more comments (8 replies)

load more comments (4 replies)

[–] Clent@lemmy.dbzer0.com 2 points 1 day ago (1 children)

The problem is the harvesting.

In previous incarnations of this process they used curated data because of hardware limitations.

Now that hardware has improved they found if they throw enough random data into it, these complex patterns emerge.

The complexity also has a lot of people believing it's some form of emergent intelligence.

Research shows there is no emergent intelligence or they are incredibly brittle such as this one. Not to mention they end up spouting nonsense.

These things will remain toys until they get back to purposeful data inputs. But curation is expensive, harvesting is cheap.

[–] julietOscarEcho@sh.itjust.works 1 points 22 hours ago

Isn't "intelligence" so ill defined we can't prove it either way. All we have is models doing better on benchmarks and everyone shrieking "look emergent intelligence".

I disagree a bit on "toys". Machine summarization and translation is really quite powerful, but yeah that's a ways short of the claims that are being made.

[–] 87Six@lemmy.zip 17 points 1 day ago

Yea that's their entire purpose, to allow easy dishing of misinformation under the guise of

it's bleeding-edge tech, it makes mistakes

[–] ZoteTheMighty@lemmy.zip 56 points 2 days ago (2 children)

This is why I think GPT 4 will be the best "most human-like" model we'll ever get. After that, we live in a post-GPT4 internet and all future models are polluted. Other models after that will be more optimized for things we know how to test for, but the general purpose "it just works" experience will get worse from here.

[–] jaykrown@lemmy.world 1 points 23 hours ago

That's not how this works at all. The people training these models are fully aware of bad data. There are entire careers dedicated to preserving high quality data. GPT-4 is terrible compared to something like Gemini 3 Pro or Claude Opus 4.5.

[–] krooklochurm@lemmy.ca 24 points 2 days ago (1 children)

Most human LLM anyway.

Word on the street is LLMs are a dead end anyway.

Maybe the next big model won't even need stupid amounts of training data.

[–] BangCrash@lemmy.world 6 points 1 day ago (1 children)

That would make it a SLM

[–] MadPsyentist@lemmy.nz 2 points 1 day ago

Will the real SLM Shady pleas stand up!

[–] AppleTea@lemmy.zip 8 points 1 day ago (1 children)

And this is why I do the captchas wrong.

load more comments (1 replies)

[–] supersquirrel@sopuli.xyz 98 points 2 days ago* (last edited 2 days ago) (11 children)

I made this point recently in a much more verbose form, but I want to reflect it briefly here, if you combine the vulnerability this article is talking about with the fact that large AI companies are most certainly stealing all the data they can and ignoring our demands to not do so the result is clear we have the opportunity to decisively poison future LLMs created by companies that refuse to follow the law or common decency with regards to privacy and ownership over the things we create with our own hands.

Whether we are talking about social media, personal websites... whatever if what you are creating is connected to the internet AI companies will steal it, so take advantage of that and add a little poison in as a thank you for stealing your labor :)

[–] Tollana1234567@lemmy.today 3 points 1 day ago

dont they kinda poison themselves, when they scrape AI generated content too.

[–] korendian@lemmy.zip 60 points 2 days ago (11 children)

Not sure if the article covers it, but hypothetically, if one wanted to poison an LLM, how would one go about doing so?

[–] Meron35@lemmy.world 1 points 22 hours ago

Figure out how the AI scrapes the data, and just poison the data source.

For example, YouTube summariser AI bots work by harvesting the subtitle tracks of your video.

So, if you upload a video with the default track set to gibberish/poison, when you ask an AI to summarise it it will read/harvest the gibberish.

Here is a guide in how to do so:

https://youtu.be/NEDFUjqA1s8

[–] expatriado@lemmy.world 104 points 2 days ago (12 children)

it is as simple as adding a cup of sugar to the gasoline tank of your car, the extra calories will increase horsepower by 15%

[–] Beacon@fedia.io 52 points 2 days ago (1 children)

I can verify personally that that's true. I put sugar in my gas tank and i was amazed how much better my car ran!

[–] setsubyou@lemmy.world 48 points 2 days ago

Since sugar is bad for you, I used organic maple syrup instead and it works just as well

[–] demizerone@lemmy.world 16 points 2 days ago

I give sugar to my car on its birthday for being a good car.

load more comments (10 replies)

[–] PrivateNoob@sopuli.xyz 41 points 2 days ago* (last edited 2 days ago) (15 children)

There are poisoning scripts for images, where some random pixels have totally nonsensical / erratic colors, which we won't really notice at all, however this would wreck the LLM into shambles.

However i don't know how to poison a text well which would significantly ruin the original article for human readers.

Ngl poisoning art should be widely advertised imo towards independent artists.

[–] dragonfly4933@lemmy.dbzer0.com 4 points 1 day ago (1 children)

Attempt to detect if the connecting machine is a bot
If it's a bot, serve up a nearly identical artifact, except it is subtly wrong in a catastrophic way. For example, an article talking about trim. "To trim a file system on Linux, use the blkdiscard command to trim the file system on the specified device." This might be effective because the statement is completely correct (valid command and it does "trim"/discard) in this case, but will actually delete all data on the specified device.
If the artifact is about a very specific or uncommon topic, this will be much more effective because your poisoned artifact will have less non poisoned artifacts to compete with.

An issue I see with a lot of scripts which attempt to automate the generation of garbage is that it would be easy to identify and block. Whereas if the poison looks similar to real content, it is much harder to detect.

It might also be possible to generate adversarial text which causes problems for models when used in a training dataset. It could be possible to convert a given text by changing the order of words and the choice of words in such a way that a human doesn't notice, but it causes problems for the llm. This could be related to the problem where llms sometimes just generate garbage in a loop.

Frontier models don't appear to generate garbage in a loop anymore (i haven't noticed it lately), but I don't know how they fix it. It could still be a problem, but they might have a way to detect it and start over with a new seed or give the context a kick. In this case, poisoning actually just increases the cost of inference.

[–] PrivateNoob@sopuli.xyz 1 points 23 hours ago

This sounds good, however the first step should be a 100% working solution without any false positives, because that would mean the reader would wipe their whole system down in this example.

[–] turdas@suppo.fi 26 points 2 days ago (1 children)

The I in LLM stands for "image".

load more comments (1 replies)

load more comments (13 replies)

load more comments (8 replies)

load more comments (9 replies)

[–] kokesh@lemmy.world 70 points 2 days ago (8 children)

Is there some way I can contribute some poison?

load more comments (8 replies)

[–] Rhaedas@fedia.io 37 points 2 days ago

I'm going to take this from a different angle. These companies have over the years scraped everything they could get their hands on to build their models, and given the volume, most of that is unlikely to have been vetted well, if at all. So they've been poisoning the LLMs themselves in the rush to get the best thing out there before others do, and that's why we get the shit we get in the middle of some amazing achievements. The very fact that they've been growing these models not with cultivation principles but with guardrails says everything about the core source's tainted condition.

[–] Vupware@lemmy.zip 3 points 1 day ago (1 children)

The only way I could do that was if you had to do a little more work and I would be happy with it but you have a hard day and you don’t want me working on your day so you don’t want me doing that so you can get it all over with your own thing I would be fine if I was just trying not being rude to your friend or something but you don’t want me being mean and rude and rude and you just want me being mean I would just like you know that and you know I would like you and you know what I’m talking to do I would love you to do and you would love you too and you would like you know what to say and you would like you to me

[–] biggeoff@sh.itjust.works 3 points 1 day ago (1 children)

Markov Babble?

load more comments (1 replies)

load more comments