this post was submitted on 07 Jul 2025

872 points (98.2% liked)

AI agents wrong ~70% of time: Carnegie Mellon study (www.theregister.com)

submitted 1 day ago by eli001@lemmy.world to c/technology@lemmy.world

174 comments

(page 3) 50 comments

[–] Affidavit@lemmy.world 6 points 1 day ago (2 children)

"...for multi-step tasks"

[+] surph_ninja@lemmy.world -7 points 14 hours ago* (last edited 13 hours ago) (15 children)

This is the same kind of short-sighted dismissal I see a lot in the religion vs science argument. When they hinge their pro-religion stance on the things science can’t explain, they’re defending an ever diminishing territory as science grows to explain more things. It’s a stupid strategy with an expiration date on your position.

All of the anti-AI positions that hinge on the low quality or reliability of the output are defending an increasingly diminished stance as the AIs are further refined. And I simply don’t believe that the majority of the people making this argument actually care about the quality of the output. Even when it gets to the point of producing better output than humans across the board, these folks are still going to oppose it regardless. Why not just openly oppose it in general, instead of pinning your position to an argument that grows increasingly irrelevant by the day?

DeepSeek exposed the same issue with the anti-AI people dedicated to the environmental argument. We were shown proof that there’s significant progress in the development of efficient models, and it still didn’t change any of their minds. Because most of them don’t actually care about the environmental impacts. It’s just an anti-AI talking point that resonated with them.

The more baseless these anti-AI stances get, the more it seems to me that it’s a lot of people afraid of change and afraid of the fundamental economic shifts this will require, but they’re embarrassed or unable to articulate that stance. And it doesn’t help that the luddites haven’t been able to predict a single development. Just constantly flailing to craft a new argument to criticize the current models and tech. People are learning not to take these folks seriously.

[–] brsrklf@jlai.lu 23 points 1 day ago (1 children)

In one case, when an agent couldn't find the right person to consult on RocketChat (an open-source Slack alternative for internal communication), it decided "to create a shortcut solution by renaming another user to the name of the intended user."

Ah ah, what the fuck.

This is so stupid it's funny, but now imagine what kind of other "creative solutions" they might find.

[–] kameecoding@lemmy.world 2 points 23 hours ago (4 children)

For me as a software developer the accuracy is more in the 95%+ range.

On one hand, the built-in Copilot chat widget in IntelliJ basically replaces a lot of my Google queries.

On the other hand, it is rather fucking good at executing rewrites that are a fucking chore to do manually but can easily be done by Copilot.

Imagine you have a script that initializes your DB with some test data. You have an INSERT INTO statement with lots of columns and rows, so:

INSERT INTO (column1, ..., column n) VALUES row 1, row 2, ..., row n

Adding a new column with test data for each row is a PITA, but Copilot handles it without issue.
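
To make the chore concrete, here is a minimal sketch of what "adding a column to every row" of such a seed script involves. It assumes Python with the standard-library `sqlite3` module; the `users` table, the column names, the sample values, and the helper names `add_column` and `seed` are all made up for illustration and are not from the comment.

```python
import sqlite3


def add_column(columns, rows, new_column, new_values):
    """Return new (columns, rows) with one extra value appended to every row."""
    if len(new_values) != len(rows):
        raise ValueError("need exactly one new value per row")
    new_columns = columns + [new_column]
    new_rows = [row + (value,) for row, value in zip(rows, new_values)]
    return new_columns, new_rows


def seed(conn, table, columns, rows):
    """Create the table and insert every row with one parameterized statement."""
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f"CREATE TABLE {table} ({col_list})")
    conn.executemany(
        f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})", rows
    )


# Hypothetical seed data: two columns, two rows.
columns = ["id", "name"]
rows = [(1, "alice"), (2, "bob")]

# The "PITA" step: every existing row needs a value for the new column.
columns, rows = add_column(
    columns, rows, "email", ["a@example.com", "b@example.com"]
)

conn = sqlite3.connect(":memory:")
seed(conn, "users", columns, rows)
result = conn.execute("SELECT id, name, email FROM users ORDER BY id").fetchall()
```

Keeping the seed data as plain tuples rather than one long SQL string is only one possible design; it makes the per-row edit mechanical whether a person or a tool does it.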

Similarly, when writing unit tests you do a lot of edge-case testing, which is a bunch of almost identical tests with maybe one variable changing. At most you write one of those tests, then Copilot will auto-generate the rest after you name the next unit test. It's pretty good at guessing what you want to do in that test, at least with my naming scheme.
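
The "almost identical tests with one variable changing" pattern can be sketched as follows, assuming Python's standard `unittest` module. The `clamp` function under test and the case names are hypothetical stand-ins, not anything from the comment.

```python
import unittest


def clamp(value, low, high):
    """Hypothetical function under test: restrict value to [low, high]."""
    return max(low, min(value, high))


# Each edge case differs only in the input and the expected result.
EDGE_CASES = [
    ("below_range", -5, 0),
    ("at_lower_bound", 0, 0),
    ("inside_range", 5, 5),
    ("at_upper_bound", 10, 10),
    ("above_range", 15, 10),
]


class ClampEdgeCases(unittest.TestCase):
    def test_edge_cases(self):
        for name, value, expected in EDGE_CASES:
            # subTest reports each failing case separately instead of
            # stopping at the first one.
            with self.subTest(name):
                self.assertEqual(clamp(value, 0, 10), expected)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ClampEdgeCases)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

Writing the cases as a table is an alternative to generating one test method per case; either way, the reviewer still has to check that each expected value is right.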

So yeah, it's way overrated for many-many things, but for programming it's a pretty awesome productivity tool.

[–] Nalivai@discuss.tchncs.de 1 points 18 hours ago (1 children)

Keep doing what you do. Your company will pay me handsomely to throw out all your bullshit and write working code you can trust when you're done. If your company wants to have a product in the future that is.

[–] kameecoding@lemmy.world 1 points 18 hours ago* (last edited 18 hours ago) (7 children)

Lmao, okay buddy, based on how many interviews I have sat in on, the chances that you are a worse programmer than me are much higher than you being better than me.

Being a pompous ass dismissive of new tooling makes your chances even worse 😕

[–] PotentialProblem@sh.itjust.works 1 points 17 hours ago (3 children)

I’ve been in the industry awhile and your assessment is dead on.

As long as you’re not blindly committing the code, it’s a huge time saver for a number of mundane tasks.

It’s especially fantastic for writing throwaway tooling. Need data massaged a specific way? Ez pz. Need a script to execute an api call on each entry in a spreadsheet? No problem.

The guy above you is a nutter. Not sure if people haven’t tried leveraging LLMs or what. It has a ton of faults, but it really does speed up the mundane work. Also, clearly the person is either brand new to the field or doesn’t even work in it. Otherwise they would have seen the barely functional shite that actual humans churn out.

Part of me wonders if code organization is going to start optimizing for interpretation by these models rather than humans.

[–] NarrativeBear@lemmy.world 23 points 1 day ago (3 children)

The ones being implemented into emergency call centers are better though? Right?

[–] TeddE@lemmy.world 23 points 1 day ago

Yes! We've gotten them up to 94% wrong at the behest of insurance agencies.

[–] Ulrich@feddit.org 12 points 1 day ago (4 children)

I called my local HVAC company recently. They switched to an AI operator. All I wanted was to schedule someone to come out and look at my system. It could not schedule an appointment. Like if you can't perform the simplest of tasks, what are you even doing? Other than acting obnoxiously excited to receive a phone call?

[–] Tollana1234567@lemmy.today 1 points 1 day ago

I wonder how the evil Palantir uses its AI.

[–] floofloof@lemmy.ca 18 points 1 day ago* (last edited 1 day ago)

"Gartner estimates only about 130 of the thousands of agentic AI vendors are real."

This whole industry is so full of hype and scams, the bubble surely has to burst at some point soon.

[–] fossilesque@mander.xyz 9 points 1 day ago (1 children)

For some reason, agents work better when the prompt says that the accuracy of the work is a matter of life or death. I've made a little script that gives me BibTeX for a folder of PDFs, and this is how I got it to be usable.

[–] HertzDentalBar@lemmy.blahaj.zone 3 points 1 day ago (1 children)

Did you make it? Or did you prompt it? They ain't quite the same.

[–] fossilesque@mander.xyz 2 points 22 hours ago* (last edited 22 hours ago)

It calls ollama with a prompt. It's a bit complex because it also renames, moves, and sorts stuff.

[–] lepinkainen@lemmy.world 10 points 1 day ago (7 children)

Wrong 70% doing what?

I’ve used LLMs as a Stack Overflow / MSDN replacement for over a year and if they fucked up 7/10 questions I’d stop.

Same with code: any free model can easily generate simple scripts and utilities with maybe a 10% error rate, definitely not 70%.

[–] FenderStratocaster@lemmy.world 9 points 1 day ago

I tried to order food at a Taco Bell drive-through the other day and they had an AI thing taking your order. I was so frustrated that I couldn't order something that was on the menu that I just drove to the window instead. The guy that worked there was more interested in lecturing me on how I need to order. I just said forget it and drove off.

If you want to use AI, I'm not going to use your services or products unless I'm forced to. Looking at you Xfinity.

[–] kinsnik@lemmy.world 8 points 1 day ago

I haven't used AI agents yet, but my job is kinda pushing for them. But I have used the Google one that creates audio podcasts, just to play around, since my coworkers were using it to "learn" new things. I fed it some of my own writing and created the podcast. It was fun; it was an audio overview of what I wrote. About 80% was cool analysis, but 20% was straight out of nowhere bullshit (which I know because I wrote the original texts that the audio was talking about). I can't believe that people are using this for subjects that they have no knowledge of. It is a fun toy for a few minutes (which is not worth the cost to the environment anyway).

[–] lmagitem@lemmy.zip 2 points 1 day ago

Color me surprised
