this post was submitted on 07 Jul 2025
917 points (98.1% liked)

Technology

[–] lmagitem@lemmy.zip 2 points 2 days ago

Color me surprised

[–] dylanmorgan@slrpnk.net 2 points 2 days ago (1 children)

Claude why did you make me an appointment with a gynecologist? I need an appointment with my neurologist, I’m a man and I have Parkinson’s.

[–] NuXCOM_90Percent@lemmy.zip 2 points 2 days ago

While I do hope this leads to a pushback on "I just put all our corporate secrets into ChatGPT":

In the before times, people got their answers from Stack Overflow... or fricking YouTube. And those are also wrong VERY VERY VERY often, which is one of the biggest problems: the illegally scraped training data is from humans, and humans are stupid.

[–] lemmy_outta_here@lemmy.world 2 points 2 days ago

Rookie numbers! Let’s pump them up!

To match their tech bro hypers, they should be wrong at least 90% of the time.

[–] atticus88th@lemmy.world 1 points 2 days ago

this study was written with the assistance of an AI agent.
[–] esc27@lemmy.world 0 points 2 days ago* (last edited 2 days ago)

30% might be high. I've worked with two different agent creation platforms. Both require a huge amount of manual correction to work anywhere near accurately. I'm really not sure what the LLM actually provides other than some natural language processing.

Before human correction, the agents I've tested were right 20% of the time, wrong 30%, and failed entirely 50%. To fix them, a human has to sit behind the curtain and manually review conversations and program custom interactions for every failure.
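
For scale, a minimal back-of-the-envelope sketch in Python using the rates above (a hypothetical illustration of the split, not output from either platform):

```python
# Hypothetical illustration using the rates reported above
# (20% right, 30% wrong, 50% failed outright).
RATES = {"right": 0.20, "wrong": 0.30, "failed": 0.50}

def triage(total_conversations: int) -> dict:
    """Split a batch of conversations by outcome and count how many
    would need a human to review and hand-program a fix."""
    counts = {outcome: round(total_conversations * share)
              for outcome, share in RATES.items()}
    counts["needs_manual_fix"] = counts["wrong"] + counts["failed"]
    return counts

print(triage(1000))
# {'right': 200, 'wrong': 300, 'failed': 500, 'needs_manual_fix': 800}
```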

In theory, once it is fully set up and all the edge cases are fixed, it will provide 24/7 support in a convenient chat format. But that takes a lot more man-hours than the hype suggests...

Weirdly, ChatGPT does a better job than a purpose-built, purchased agent.

[–] MagicShel@lemmy.zip -4 points 2 days ago

I need to know the success rate of human agents in Mumbai (or some other outsourcing capital) for comparison.

I absolutely think this is not a good fit for AI, but I feel like the presumption is a human would get it right nearly all of the time, and I'm just not confident that's the case.

[–] Melvin_Ferd@lemmy.world -1 points 2 days ago

How often do tech journalists get things wrong?
