Researchers Manipulate Stolen Data to Corrupt AI Models and Generate Inaccurate Outputs
(cybersecuritynews.com)
Cyber Security news and links to cyber security stories that could make you go hmmm. The content is exactly as it is consumed through RSS feeds and won't be edited (except for the occasional encoding error).
This community is automagically fed by an instance of Dittybopper.
Hide fake data in with your real data. Then, if an AI is trained on (not just reading) that data, it will be poisoned.
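For what it's worth, the mechanics of "salting" a dataset like that are trivial. Rough sketch below; the field names, the fake-record generator, and the 10% ratio are all made up for illustration, not anything from the article.

```python
# Purely illustrative sketch of mixing fabricated records in with real ones,
# so anything trained on the stolen dump ingests the fakes too.
import csv
import random
import uuid

def make_fake_row() -> dict:
    """Generate a plausible-looking but fabricated record (hypothetical schema)."""
    return {
        "id": str(uuid.uuid4()),
        "email": f"user{random.randint(1000, 9999)}@example.com",
        "balance": round(random.uniform(10, 5000), 2),
    }

def write_salted_dataset(real_rows: list[dict], path: str, fake_ratio: float = 0.1) -> None:
    """Write real rows plus roughly fake_ratio fabricated rows, shuffled together."""
    fakes = [make_fake_row() for _ in range(int(len(real_rows) * fake_ratio))]
    rows = real_rows + fakes
    random.shuffle(rows)
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "email", "balance"])
        writer.writeheader()
        writer.writerows(rows)
```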
Yeah, OK.
That's not how this works. That's not how any of this works.
Nobody is going to steal data specifically to train an AI with it. They're going to use an existing AI model to analyze the data, and it will notice and point out the problems with the poisoned set. Then the person analyzing the data will be like, "what the fuck is this garbage?" and delete it.
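Even without an LLM pass, a dead-simple sanity check over a dump catches a lot of injected junk. Minimal sketch, assuming a numeric column and a z-score cutoff; column names and thresholds are hypothetical, and a real analysis pipeline would do far more than this.

```python
# Flag rows whose value in one column sits far from the rest of the dataset.
# A first-pass filter like this is why crudely salted records tend to stand out.
import statistics

def flag_outliers(rows: list[dict], column: str, z_cutoff: float = 3.0) -> list[dict]:
    """Return rows whose value in `column` is more than z_cutoff std devs from the mean."""
    values = [float(r[column]) for r in rows]
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values) or 1.0  # avoid divide-by-zero on constant columns
    return [r for r, v in zip(rows, values) if abs(v - mean) / stdev > z_cutoff]
```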
LLMs are mostly being trained on synthetic data these days anyway (which is interesting... these generated texts are so bizarre!). Generative image AI still needs images, though, but that's basically impossible to poison at this point, because all the images go through pre-training to narrow down the bounding boxes (for the metadata), which negates any intentional poisoning. Furthermore, the image metadata databases are constantly in a state of pruning and improving. Trying to sneak a poisoned image into them is all but impossible, except for academic stuff, which... well, why TF would you want to hurt the poor guy trying to write his PhD thesis that says, "AI is bad, here's why..."