[–] FauxLiving@lemmy.world 0 points 6 hours ago

> as I said, the text has a 0% error rate about the contents of the text, which is what the LLM is summarising, and to which it adds its own error rate. Then you read that and add your error rate.

Error rates that you simultaneously haven't defined and have declared too high to be usable.

These tools clearly work, much like a search engine clearly works. They make errors (try finding a page of clean search results), but we use them anyway.

You could make the same argument about search. If you issued a query to Google and compared the results generated by its machine learning systems against a human who read the entire Internet specifically to answer your query, you would probably find that, in the end (after a few decades), the human's results were more responsive to your query, while the Google results become random nonsense once you get to page 3 or 4.

By any measure, the Google results are worse than what a human would choose. This is why you have to 'learn' to search and to issue queries in a specific way; otherwise you get errors and bad results.

The problem with the accurate human results is throughput: all of the people on the planet, working full-time 365 days a year, could not service a single minute's worth of the queries that Google's machine learning algorithms serve up 24/7.

Could you read 3 books and find the answer you want? Or craft some regular-expression search to find it? Sure, but you can't do it faster than it takes to run a RAG search and run inference over 10 million tokens' worth of text.

The whole point of search is that reading every document every time you want to find something is a waste of effort; summarization lets you survey larger volumes of data more accurately and narrow in on what you're looking for.

You never trust the output of the model, just as you don't cite Google's results page or Wikipedia, because they are there to point you to information, not to provide it. A RAG system gives you the citations for the data, so once the summarization indicates that it has found what you're looking for, you can read the source for yourself.
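To make that concrete, here is a minimal sketch of the retrieve-then-summarize pattern with citation tracking. The corpus, the keyword scoring, and the `summarize()` stub are placeholders for illustration (a real system would use embeddings and an actual LLM call); the point is that the doc IDs are attached by ordinary code, outside the model:

```python
# Minimal retrieve-then-summarize sketch. Corpus, scoring, and the
# summarize() stub are illustrative placeholders, not a real RAG stack.

corpus = {
    "doc-001": "RAG systems retrieve source passages before generation.",
    "doc-002": "Citations are attached by the retrieval layer, not the model.",
    "doc-003": "Summaries point the reader back to the original text.",
}

def retrieve(query: str, k: int = 2) -> list[tuple[str, str]]:
    """Rank documents by naive keyword overlap and return (doc_id, text)
    pairs. The IDs are tracked here, outside the model, so the citations
    cannot be hallucinated."""
    words = set(query.lower().split())
    ranked = sorted(
        corpus.items(),
        key=lambda item: len(words & set(item[1].lower().split())),
        reverse=True,
    )
    return ranked[:k]

def summarize(passages: list[tuple[str, str]]) -> str:
    """Stand-in for the LLM call; emits text with [doc-id] markers."""
    return " ".join(f"{text} [{doc_id}]" for doc_id, text in passages)

hits = retrieve("how are citations generated")
print(summarize(hits))                  # summary with [doc-id] citations
print([doc_id for doc_id, _ in hits])   # verifiable source list
```

However good or bad the summary text is, the citation list comes from the retrieval step, so you can always go read the underlying passages yourself.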


> the question is can we make a system that has an error rate that is close to or lower than a person's
>
> can we???

Yes.

Here is a peer-reviewed article published in Nature Medicine: https://pmc.ncbi.nlm.nih.gov/articles/PMC11479659/

The relevant section from the abstract:

> A clinical reader study with 10 physicians evaluated summary completeness, correctness and conciseness; in most cases, summaries from our best-adapted LLMs were deemed either equivalent (45%) or superior (36%) compared with summaries from medical experts.

Another peer-reviewed article, published in npj Digital Medicine: https://www.nature.com/articles/s41746-025-01670-7

> Our clinical error metrics were derived from 18 experimental configurations involving LLMs for clinical note generation, consisting of 12,999 clinician-annotated sentences. We observed a 1.47% hallucination rate and a 3.45% omission rate. By refining prompts and workflows, we successfully reduced major errors below previously reported human note-taking rates, highlighting the framework's potential for safer clinical documentation.
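For clarity, this is roughly how sentence-level error rates like those are computed. The annotations and label names below are invented for illustration; they are not the paper's actual data or schema:

```python
# Toy computation of sentence-level error rates. The annotations and
# label names are made up for illustration; they are not from the paper.

annotations = [
    {"sentence": "Patient denies chest pain.", "label": "ok"},
    {"sentence": "Started on 40 mg lisinopril.", "label": "hallucination"},
    {"sentence": "Follow-up in two weeks.", "label": "ok"},
    {"sentence": "Penicillin allergy not carried over.", "label": "omission"},
]

def rate(label: str) -> float:
    """Fraction of annotated sentences carrying the given error label."""
    return sum(a["label"] == label for a in annotations) / len(annotations)

print(f"hallucination rate: {rate('hallucination'):.2%}")  # 25.00% on toy data
print(f"omission rate:      {rate('omission'):.2%}")       # 25.00% on toy data
```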


> why… would I want that? I read novels because I like reading novels? I also think that on summaries LLMs are especially bad, since there is no distinction between "important" and "unimportant" in the architecture. The point of a summary is to only get the important points, so it clashes.

"Novel" is given as a human-scale unit of text, because you may not know what 10 million tokens means in terms of actual length. I'm clearly not talking about fictional novels read for entertainment.
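For a back-of-envelope sense of scale (the words-per-token ratio and novel length below are rules of thumb, not exact figures):

```python
# Rough conversion of 10 million tokens into human-scale units.
# Both constants are approximations, not exact figures.

tokens = 10_000_000
words_per_token = 0.75    # common rule of thumb for English text
words_per_novel = 90_000  # a typical mid-length novel

words = tokens * words_per_token
print(f"{words:,.0f} words")                     # 7,500,000 words
print(f"~{words / words_per_novel:.0f} novels")  # ~83 novels
```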

Meanwhile, an LLM can produce such a summary in seconds, with citations generated and tracked by non-AI systems, at an error rate comparable to a human's (assuming the human was given a few months to work on the problem).

> I still have not seen any evidence for this, and it still does not address the point that the summary would be pretty much unreadable

https://lemmy.world/post/43275879/22220800

This is an example of a commercial tool that returns both non-LLM-generated citations and an accurate summary of the article's contents as they relate to the question.