this post was submitted on 29 Dec 2025
364 points (98.4% liked)

Technology


A talk from the hacker conference 39C3 on how AI-generated content was identified via a simple ISBN checksum calculator (in English).

top 50 comments
[–] ChillCapybara@discuss.tchncs.de 140 points 1 week ago (27 children)

TL;DW:

He wrote a checksum verifier for ISBNs and discovered AI-generated content on Wikipedia with hallucinated sources. He used Claude to write the checksum verifier, and the irony is not lost on him. He tracked down those who submitted the fake articles and determined that many are doing so out of a misplaced desire to help, without understanding the limitations and pitfalls of using LLM-generated content without verification.
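
For reference, the check in question is just the standard ISBN check-digit arithmetic. A minimal sketch in Python (not the speaker's actual verifier, which per the talk was written with Claude) might look like this:

```python
def is_valid_isbn10(isbn: str) -> bool:
    """ISBN-10: weights 10..1, weighted sum must be divisible by 11 ('X' counts as 10)."""
    s = isbn.replace("-", "").replace(" ", "").upper()
    if len(s) != 10 or not s[:9].isdigit() or s[9] not in "0123456789X":
        return False
    digits = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
    return sum(w * d for w, d in zip(range(10, 0, -1), digits)) % 11 == 0


def is_valid_isbn13(isbn: str) -> bool:
    """ISBN-13: alternating weights 1,3,1,3,..., weighted sum must be divisible by 10."""
    s = isbn.replace("-", "").replace(" ", "")
    if len(s) != 13 or not s.isdigit():
        return False
    return sum((1 if i % 2 == 0 else 3) * int(d) for i, d in enumerate(s)) % 10 == 0


# Example with a known-valid ISBN-13; changing any single digit makes the check fail.
print(is_valid_isbn13("978-0-306-40615-7"))  # True
print(is_valid_isbn13("978-0-306-40615-8"))  # False
```

A fabricated 13-digit number only passes by luck (roughly a one-in-ten chance), which is what makes this such a cheap filter for made-up citations.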

[–] kadu@scribe.disroot.org 52 points 1 week ago (3 children)

WHY poison Wikipédia with AI? I don't get it. What satisfaction does a person get from adding text they didn't write to Wikipédia? It's not like you get a gold medal, your neighbor sees you as a hero, or you get a discount at McDonald's. There's no glamour in being an editor, so why even bother if you're not going to write quality content out of passion?

Please leave Wikipédia alone. You don't have to shove your AI bullshit everywhere; leave at least one corner of this godforsaken internet alone. Why must AI bros touch everything?

[–] MimicJar@lemmy.world 65 points 1 week ago (1 children)

It's possibly people trying to help who don't understand AI hallucinations.

For example, a Wikipedia article might say, "John Smith spent a year at Oxford University before moving to London.[Citation Needed]" So the article already contains the information, but lacks a proper citation.

Someone comes along and says, "Aha! AI can solve this," then asks the AI, "Did John Smith spend a year at Oxford before moving to London? Please provide citations." The AI returns, "Yes, of course he did, according to the book John Smith: Biography of a Man, ISBN 123456789."

So someone adds that as a citation and now Wikipedia has been improved.

Or... has it? The ISBN 123456789 is invalid. No book could possibly have that number. If the ISBN is invalid, then the book it supposedly identifies probably doesn't exist, and the citation is worthless.

So the satisfaction is that someone who couldn't previously help Wikipedia now thinks they can. At face value that's a good thing: someone who wants to help Wikipedia. The problem is that they think they're helping when they're actually doing harm.
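
To make that concrete, a minimal sketch (assuming Python): the hypothetical number above fails before any checksum even comes into play, because real ISBNs are exactly 10 or 13 digits long.

```python
# The hypothetical citation from the example above, not a real book.
isbn = "123456789"
digits = isbn.replace("-", "")

# Real ISBNs have exactly 10 or 13 digits; anything else can be flagged immediately,
# and even a well-formed length still has to survive the check-digit test.
if len(digits) not in (10, 13):
    print(f"{isbn}: cannot be a valid ISBN ({len(digits)} digits)")
```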

[–] Appoxo@lemmy.dbzer0.com 9 points 1 week ago (1 children)

I thought Wikipedia mods were overly zealous about checking submissions from new users.
Seemingly not. Kinda worrisome.

[–] addie@feddit.uk 16 points 1 week ago (1 children)

The problem is that the volume of slop available completely overwhelms all efforts at quality control. Zealotry only goes so far at turning back the tsunami of shite.

[–] Appoxo@lemmy.dbzer0.com 4 points 1 week ago (1 children)

I'd assume Wikipedia implemented a whitelist rule for trusted editors, and new accounts' edits are held in limbo until they're confirmed, no?

If Wikipedia actually allows for live edits to the page without any further checks, it seems very naive or very stupid to me.

[–] maxwellfire@lemmy.world 5 points 1 week ago

Almost all Wikipedia pages allow not only live edits but anonymous ones as well. It worked remarkably well until the hallucination machines arrived.

[–] Michal@programming.dev 14 points 1 week ago

People like to help; they just don't know that LLMs generate bullshit. That should answer the "why".

[–] jballs@sh.itjust.works 3 points 1 week ago

He talks about the reasons why people are doing this at 21:30 in the video.

[–] fubarx@lemmy.world 51 points 1 week ago

He notes that LLM vendors have been training their models on Wikipedia content. But if the content contains incorrect information and citations, you get the sort of circular (incorrect) reference that leads to misinformation.

One irony, he says, is that LLM vendors are now willing to pay for training data unpolluted by the hallucinated output their own products generate.

[–] Buckshot@programming.dev 32 points 1 week ago

Sort of related: I was fact-checking some of the content on my own website that had been provided by someone who later turned out to be less than reliable.

There was one claim I was completely unable to find a source for, and I suspected it was an AI hallucination. I turned to ChatGPT to try to find a source, and it produced nearly the exact same sentence and cited my website as the source, thus completing the hallucination cycle.

I just deleted it all from my site and started over.

[–] errer@lemmy.world 24 points 1 week ago (2 children)

You could just verify all ISBNs are valid on each edit, which would have some value in finding typos, but not doing that leaves this honeypot that could be used to identify AI slop accounts. Clever. Though maybe they’re wise to it now.

[–] Thorry@feddit.org 31 points 1 week ago* (last edited 1 week ago) (1 children)

He also points out that there are many ISBNs that are "wrong" but are actually correct in the real world. This is because publishers don't always understand the checksum and just increment the ISBN when publishing a new book. In many library systems there is a checkbox next to the ISBN entry field where you can say something like "I understand this ISBN is wrong, but it is correct in the real world".

So just flagging wrong ISBNs would lead to a lot of false positives and would need specific structures to deal with that.
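
A rough idea of what such a structure might look like, assuming Python and a hypothetical per-entry override flag analogous to the library-catalogue checkbox described above (field and function names are made up here):

```python
from dataclasses import dataclass


@dataclass
class CitedIsbn:
    isbn: str                # ISBN-13 as printed in the cited book
    checksum_override: bool  # "the checksum fails, but this is what's really printed"


def isbn13_checksum_ok(isbn: str) -> bool:
    """ISBN-13 check digit: alternating 1/3 weights, sum must be divisible by 10."""
    s = isbn.replace("-", "")
    if len(s) != 13 or not s.isdigit():
        return False
    return sum((1 if i % 2 == 0 else 3) * int(d) for i, d in enumerate(s)) % 10 == 0


def flag_for_review(entry: CitedIsbn) -> bool:
    """Flag an entry only if the checksum fails and no human has vouched for it."""
    return not isbn13_checksum_ok(entry.isbn) and not entry.checksum_override
```

That way a failing checksum lands in a review queue instead of being auto-rejected, which is roughly the distinction the library-system checkbox exists for.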

[–] hereiamagain@sh.itjust.works 18 points 1 week ago* (last edited 1 week ago) (1 children)

That's extremely frustrating. Like, it's literally your job to get that number correct.

People frustrate me

[–] anomnom@sh.itjust.works 2 points 1 week ago (1 children)

I think most of the contributors aren’t going to get wife to it, because they have no idea that what they’re doing is wrong.

[–] timt@lemmy.world 2 points 1 week ago

*wise to it

[–] daannii@lemmy.world 9 points 1 week ago

A cylindrical human centipede is a really good analogy for how LLMs work.

People go to libraries asking for books that don't even exist, and some think the libraries are hiding the books from them, but the ISBNs are just hallucinations... 😅

[–] Akasazh@lemmy.world 6 points 1 week ago (1 children)

Can someone tl;dw me? I'm curious as to how a comment gets tied to an ISBN.

[–] mjr@infosec.pub 20 points 1 week ago (1 children)

Not watched yet, but I suspect the AI edits use hallucinated citations with ISBNs that don't even pass a checksum test. AI may improve on this if someone trains the models to handle ISBNs better, but it's cool if this sort of test weeds some slop out for now.

[–] Akasazh@lemmy.world -5 points 1 week ago (1 children)

Can someone tl;dw me? I'm curious as to how a comment gets tied to an ISBN.

!commentmitosis@lemmy.dbzer0.com
