this post was submitted on 17 Mar 2026
615 points (98.0% liked)

Programming


Excerpt:

"Even within the coding, it's not working well," said Smiley. "I'll give you an example. Code can look right and pass the unit tests and still be wrong. The way you measure that is typically in benchmark tests. So a lot of these companies haven't engaged in a proper feedback loop to see what the impact of AI coding is on the outcomes they care about. Lines of code, number of [pull requests], these are liabilities. These are not measures of engineering excellence."

Measures of engineering excellence, said Smiley, include metrics like deployment frequency, lead time to production, change failure rate, mean time to restore, and incident severity. And we need a new set of metrics, he insists, to measure how AI affects engineering performance.
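The metrics Smiley names (often called DORA metrics) can be computed mechanically from deployment records. A minimal sketch of two of them, with invented log data:

```python
from datetime import date

# Hypothetical deployment log: (deploy date, whether it caused an incident).
deploys = [
    (date(2026, 3, 1), False),
    (date(2026, 3, 3), True),
    (date(2026, 3, 5), False),
    (date(2026, 3, 9), False),
]

# Change failure rate: fraction of deployments that triggered an incident.
change_failure_rate = sum(failed for _, failed in deploys) / len(deploys)

# Deployment frequency: deploys per day over the observed window.
window_days = (deploys[-1][0] - deploys[0][0]).days + 1
deploys_per_day = len(deploys) / window_days

print(change_failure_rate)        # 0.25
print(round(deploys_per_day, 2))  # 0.44
```

The point is that these are outcome measures tied to what ships, unlike lines of code or PR counts.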

"We don't know what those are yet," he said.

One metric that might be helpful, he said, is measuring tokens burned to get to an approved pull request – a formally accepted change in software. That's the kind of thing that needs to be assessed to determine whether AI helps an organization's engineering practice.

To underscore the consequences of not having that kind of data, Smiley pointed to a recent attempt to rewrite SQLite in Rust using AI.

"It passed all the unit tests, the shape of the code looks right," he said. "It's 3.7x more lines of code that performs 2,000 times worse than the actual SQLite. Two thousand times worse for a database is a non-viable product. It's a dumpster fire. Throw it away. All that money you spent on it is worthless."

All the optimism about using AI for coding, Smiley argues, comes from measuring the wrong things.

"Coding works if you measure lines of code and pull requests," he said. "Coding does not work if you measure quality and team performance. There's no evidence to suggest that that's moving in a positive direction."

top 50 comments
[–] tomiant@piefed.social 1 points 1 hour ago

REPENT! The end is nigh!

...no but seriously 

[–] melsaskca@lemmy.ca 20 points 13 hours ago

Businesses were failing even before AI. If I cannot eventually speak to a human on a telephone then the whole human layer is gone and I no longer want to do business with that entity.

[–] olafurp@lemmy.world -3 points 6 hours ago (3 children)

I got a hot take on this. People are treating AI as a fire and forget tool when they really should be treating it like a junior dev.

Now here's what I think: it's a force multiplier. Let's assume each dev has a profile of...

2x feature progress, 2x tech debt removed, 1x tech debt added.

Net tech-debt-adjusted productivity: 3x.

Multiply by an AI factor of 2 and you have a 6x engineer.

Now for another, more common case: 1x features, net tech debt -1.5x = -0.5x, which comes out as a -1x engineer.

The latter engineer will be as fast as the prior in cranking out features without AI but will make the code base worse way faster.

Now imagine that the latter engineer really leans into AI, gets really good at cranking out features, gets commended for it, and continues. He'll end up just creating bad code at an alarming pace until the code base becomes brittle and unwieldy. This is what I'm guessing is going to happen over the next few years. More experienced devs will see a massive benefit, but more junior devs will need to be reined in a lot.

Going forward, architecture and separation of concerns will become more important so we can throw away garbage and rewrite it way faster.
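The commenter's back-of-envelope model can be sketched directly (the function name and all the numbers are just the commenter's hypotheticals, not measured values):

```python
def adjusted_productivity(features, debt_removed, debt_added, ai_multiplier=1.0):
    """Net tech-debt-adjusted productivity under the commenter's toy model."""
    return (features + debt_removed - debt_added) * ai_multiplier

# Strong engineer: 2x features + 2x debt removed - 1x debt added = 3x net,
# doubled by AI to 6x.
print(adjusted_productivity(2, 2, 1, ai_multiplier=2))    # 6

# Common case: 1x features with 1.5x net debt added = -0.5x net,
# doubled by AI to a -1x engineer.
print(adjusted_productivity(1, 0, 1.5, ai_multiplier=2))  # -1.0
```

The asymmetry is the point: the multiplier scales the damage of a net-negative engineer just as readily as the output of a net-positive one.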

[–] Buddahriffic@lemmy.world 3 points 2 hours ago

It's not even a junior dev. It might "understand" a wider and deeper set of things than a junior dev does, but at least junior devs might have a sense of coherency to everything they build.

I use gen AI at work (because they want me to) and holy shit is it "deceptive". In quotes because it has no intent at all, but it is just good enough to make it seem like it mostly did what was asked, but you look closer and you'll see it isn't following any kind of paradigms, it's still just predicting text.

The amount of context it can include in those predictions is impressive, don't get me wrong, but it has zero actual problem solving capability. What it appears to "solve" is just pattern matching the current problem to a previous one. Same thing with analysis, brainstorming, whatever activity can be labelled as "intelligent".

Hallucinations are just cases where it matches a pattern that isn't based on truth (either mispredicting or predicting a lie). But it also goes the other way: it misses patterns that are there, which is horrible for programming if you care at all about efficiency and accuracy.

It'll do things like write a great helper function that it uses once and never again, maybe even writing a second copy of it the next time it needs it. Or it'll forget instructions (in a context window of 200k, a few lines can easily get drowned out).

Code quality is going to suffer as AI gets adopted more and more. And I believe the problem is fundamental to the way LLMs work. The LLM-based patches I've seen so far aren't going to fix it.

Also, as much as it's nice to not have to write a whole lot of code, my software dev skills aren't being used very well. It's like I'm babysitting an expert programmer with Alzheimer's who thinks they're still in their prime and doesn't realize they've forgotten what they did 5 minutes ago. But my company pays them big money, gets upset if we don't use their expertise, and probably intends to use my AI chat logs to train my replacement, because everything I know can be parsed out of those conversations.

[–] forrgott@lemmy.zip 4 points 5 hours ago

Or maybe don't try and drive a screw in with a hammer?

It's just not good for 99% of the shit it's marketed for. Sorry.

[–] TheReturnOfPEB@reddthat.com 3 points 5 hours ago

WALL OF TEXT that says inadvertently that junior devs should be treated like machines not people.

[–] python@lemmy.world 26 points 17 hours ago (5 children)

Recently had to call out a coworker for vibecoding all her unit tests. How did I know they were vibe coded? None of the tests had an assertion, so they literally couldn't fail.
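The failure mode described here is easy to illustrate. A hypothetical sketch in Python's unittest (names invented): a test with no assertion exercises the code but can never fail, so a runner happily reports it green:

```python
import unittest

class ParserTests(unittest.TestCase):
    def test_parse_vibecoded(self):
        # Runs the code but asserts nothing: this "test" can never fail.
        result = int("42")

    def test_parse_real(self):
        # An actual check that fails if the behavior regresses.
        self.assertEqual(int("42"), 42)

# A test runner reports both as passing, even though the first verifies nothing.
```

This is also why "all tests pass" is a weak signal on its own: the tests have to be capable of failing in the first place.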

[–] ch00f@lemmy.world 17 points 16 hours ago (1 children)

Vibe coding guy wrote unit tests for our embedded project. Of course, the hardware peripherals aren’t available for unit tests on the dev machine/build server, so you sometimes have to write mock versions (like an “adc” function that just returns predetermined values in the format of the real analog-digital converter).

Claude wrote the tests and mock hardware so well that it forgot to include any actual code from the project. The test cases were just testing the mock hardware.
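A minimal hypothetical sketch of that failure mode (all names and values invented): the mock stands in for the hardware, the project code exists, yet the generated test only round-trips the mock itself:

```python
# Hypothetical mock ADC: returns canned readings in the same format
# the real analog-to-digital converter would produce.
class MockAdc:
    def __init__(self, readings):
        self._readings = list(readings)

    def read(self):
        return self._readings.pop(0)

# Project code the test *should* exercise (a made-up voltage scaler).
def raw_to_millivolts(raw, vref_mv=3300, bits=12):
    return raw * vref_mv // (2 ** bits - 1)

# ...but the generated test only verifies the mock's own behavior:
def test_adc():
    adc = MockAdc([1024, 2048])
    assert adc.read() == 1024  # passes without ever calling raw_to_millivolts

test_adc()
```

The test suite is green, and the code under test has zero coverage.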

[–] 87Six@lemmy.zip 8 points 16 hours ago

Not realizing that should be an instant firing. The dev didn't even glance at the unit tests...

[–] urandom@lemmy.world 3 points 13 hours ago (2 children)

That's weird. I've made it write a few tests once, and it pretty much made them in the style of other tests in the repo. And they did have assertions.

[–] clif@lemmy.world 1 points 6 hours ago* (last edited 6 hours ago)

My company is pushing LLM code assistants REALLY hard (like, you WILL use it but we're supposedly not flagging you for termination if you don't... yet). My experience is the same as yours - unit tests are one of the places where it actually seems to do pretty good. It's definitely not 100%, but in general it's not bad and does seem to save some time in this particular area.

That said, I did just remove a test that it created that verified that IMPORTED_CONSTANT is equal to localUnitTestConstantWithSameHardcodedValueAsImportedConstant. It passed ; )

[–] rumba@lemmy.zip 3 points 12 hours ago (1 children)

Trust with verification. I've had it do everything right; I've had it do things so incredibly stupid that even a cursory glance at the code would be more than enough to /clear and start over.

Claude Code is capable of producing code and unit tests, but it doesn't always get it right. It's smart enough that it will keep trying until it gets the result, but if you start running low on context it'll start getting worse at it.

I wouldn't have it contribute a lot of code AND unit tests in the same session. New session: read this code and make unit tests. New session: read these unit tests, give me advice on any problems or edge cases that might be missed.

To be fair, if you're not reading what it's doing and guiding it, you're fucking up.

I think it's better as a second set of eyes than a software architect.

[–] urandom@lemmy.world 2 points 10 hours ago

I think it’s better as a second set of eyes than a software architect.

A rubber ducky that talks back is also a good analogy for me.

I wouldn’t have it contribute a lot of code

Yeah, I tried that once, for a tedious refactoring. It would've been faster if I did it myself tbh. Telling it to do small tedious things, and keeping the interesting things for yourself (cause why would you deprive yourself of that ...) is currently where I stand with this tool

[–] fallaciousBasis@lemmy.world 6 points 17 hours ago

Hahaha 🤣

[–] nutsack@lemmy.dbzer0.com 5 points 17 hours ago

if you reject her pull requests, does she fix it? is there a way for management to see when an employee is pushing bad commits more frequently than usual?

[–] Not_mikey@lemmy.dbzer0.com 26 points 18 hours ago* (last edited 18 hours ago) (1 children)

Guy selling an AI coding platform says other AI coding platforms suck.

This just reads like a sales pitch rather than journalism. It doesn't cite any studies, just some anecdotes about what he hears "in the industry".

Half of it is:

You're measuring the wrong metrics for productivity, you should be using these new metrics that my AI coding platform does better on.

I know the AI hate is strong here, but just because a company isn't pushing AI in the typical way doesn't mean they aren't trying to hype whatever they're selling beyond reason. Hardly any tech CEO can be trusted, including this guy, because they're always trying to act like they can predict and make the future when they probably can't.

[–] yabbadabaddon@lemmy.zip 8 points 18 hours ago

My take exactly. Especially the bits about unit tests. If you cannot rely on your unit tests as a first assessment of your code quality, your unit tests are trash.

And not every company runs GitHub. The metrics he's talking about are DevOps metrics, not development metrics. For example, in my work, nobody gives a fuck about mean time to production. We have a planning schedule, and we need the OK from our customers before we can update our product.

[–] garbage_world@lemmy.world 6 points 15 hours ago (2 children)

I find this hard to believe, unless it's talking about 100% vibecoding

[–] Jakeroxs@sh.itjust.works 2 points 11 hours ago

Yeah it is, it brings up a lot of good points that often don't get talked about by the anti-AI folks (the sky is falling/AI is horrible) and extreme pro-AI folks ("we're going to replace all the workers with AI")

You absolutely have to know what the AI is doing at least somewhat to be able to call it out when it's clearly wrong/heading down a completely incorrect path.

[–] 87Six@lemmy.zip 6 points 15 hours ago (1 children)

recent attempt to rewrite SQLite in Rust using AI

I think it is talking 100% vibe code. And yea it's pretty useful if you don't abuse it

[–] rumba@lemmy.zip 4 points 12 hours ago (1 children)

Yeah, it's really good at short bursts of complicated things. Give me a curl statement to post this file as a snippet into Slack; give me a connector bot between Ollama and Meshtastic; it'll give you serviceable, but not perfect, code.

When you get to bigger, more complicated things, it needs a lot of instruction, guard rails and architecture. You're not going to just "Give me SQLite but in Rust, GO" and have a good time.

I've seen some people architect some crazy shit. You do this big, long, drawn-out project: tell it to use a small control orchestrator, set up many agents and have each agent do part of the work, have it create full unit tests, be demanding about best practices, run post-hoc security checks, ouroboros it and let it go.

But it's expensive, and we're still getting venture capital tokens for less than cost, and you'll still have hard-to-find edge cases. Someone may eventually work out a fairly generic way to set it up to do medium scale projects cleanly, but it's not now and there are definite limits to what it can handle. And as always, you'll never be able to trust that it's making a safe app.

[–] 87Six@lemmy.zip 2 points 11 hours ago* (last edited 11 hours ago)

Yea I find that I need to instruct it comparably to a junior to do any good work...And our junior standard - trust me - is very very low.

I usually spam the planning mode and check every nook of the plan to make sure it's right before the AI even touches the code.

I still can't tell if it's faster or not compared to just doing things myself...And as long as we aren't allocated time to compare end to end with 2 separate devs of similar skill there's no point even trying to guess imho. Though I'm not optimistic. I may just be wasting time.

And yea, the true cost per token is probably double what it is today, if not more...

[–] magiccupcake@lemmy.world 30 points 21 hours ago

I love this bit especially

Insurers, he said, are already lobbying state-level insurance regulators to win a carve-out in business insurance liability policies so they are not obligated to cover AI-related workflows. "That kills the whole system," Deeks said. Smiley added: "The question here is if it's all so great, why are the insurance underwriters going to great lengths to prohibit coverage for these things? They're generally pretty good at risk profiling."

[–] toad@sh.itjust.works 2 points 13 hours ago

It IS working well for what it is - a word processor that's super expensive to run. The problem is that idiots thought the world was gonna end and that we were gonna have flying cars going around.

[–] Thorry@feddit.org 103 points 1 day ago (3 children)

Yeah these newer systems are crazy. The agent spawns a dozen subagents that all do some figuring out on the code base and the user request. Then those results get collated, then passed along to a new set of subagents that make the actual changes. Then there are agents that check stuff and tell the subagents to redo stuff or make changes. And then it gets a final check like unit tests, compilation etc. And then it's marked as done for the user. The amount of tokens this burns is crazy, but it gets them better results in the benchmarks, so it gets marketed as an improvement. In reality it's still fucking up all the damned time.

Coding with AI is like coding with a junior dev, who didn't pay attention in school, is high right now, doesn't learn and only listens half of the time. It fools people into thinking it's better, because it shits out code super fast. But the cognitive load is actually higher, because checking the code is much harder than coming up with it yourself. It's slower by far. If you are actually going faster, the quality is lacking.

[–] merc@sh.itjust.works 4 points 7 hours ago

checking the code is much harder than coming up with it yourself

That's always been true. But, at least in the past when you were checking the code written by a junior dev, the kinds of mistakes they'd make were easy to spot and easy to predict.

LLMs are created in such a way that they produce code that genuinely looks perfect at first. It's stuff that's designed to blend in and look plausible. In the past you could look at something and say "oh, this is just reversing a linked list". Now, you have to go through line by line trying to see if the thing that looks 100% plausible actually contains a tiny twist that breaks everything.

[–] chunkystyles@sopuli.xyz 17 points 22 hours ago (1 children)

This is very different from my experience, but I've purposely lagged behind in adoption and I often do things the slow way because I like programming and I don't want to get too lazy and dependent.

I just recently started using Claude Code CLI. With how I use it: asking it specific questions and often telling it exactly what files and lines to analyze, it feels more like taking to an extremely knowledgeable programmer who has very narrow context and often makes short-sighted decisions.

I find it super helpful in troubleshooting. But it also feels like a trap, because I can feel it gaining my trust and I know better than to trust it.

[–] jimmux@programming.dev 46 points 1 day ago (1 children)

We never figured out good software productivity metrics, and now we're supposed to come up with AI effectiveness metrics? Good luck with that.

[–] Senal@programming.dev 14 points 20 hours ago (1 children)

Sure we did.

"Lines Of Code" is a good one, more code = more work so it must be good.

I recently had a run-in with another good one: PRs/dev/month.

Not only is that one good for overall productivity, it's a way to weed out those unproductive devs who check in less often.

This one was so good, management decided to add it to the company wide catchup slides in a section espousing how the new AI driven systems brought this number up enough to be above other companies.

That means other companies are using it as well, so it must be good.

[–] SaharaMaleikuhm@feddit.org 12 points 19 hours ago (2 children)

Why is it always the dumbest people who become managers?

[–] DickFiasco@sh.itjust.works 76 points 1 day ago (7 children)

AI is a solution in search of a problem. Why else would there be consultants to "help shepherd organizations towards an AI strategy"? Companies are looking to use AI out of fear of missing out, not because they need it.

[–] ultimate_worrier@lemmy.dbzer0.com 36 points 1 day ago* (last edited 1 day ago)

Exactly. I’ve heard the phrase “falling behind” from many in upper management.

[–] Avicenna@programming.dev 3 points 15 hours ago (1 children)

"Codestrap founders"

https://www.codestrap.com/

Let me guess they will spearhead the correct way to use AI?

[–] CubitOom@infosec.pub 59 points 1 day ago

Generative models, which many people call "AI", have a much higher catastrophic failure rate than we have been led to believe. They cannot actually be used to replace humans, just as an inanimate object can't replace a parent.

Jobs aren't threatened by generative models. Jobs are threatened by a credit crunch due to high interest rates and a lack of lenders being able to adapt.

"AI" is a ruse, a useful excuse that makes people want to invest, keeps investors and economists OK with record job loss, and leaves the general public more susceptible to data harvesting and surveillance.

[–] raven@lemmy.org 4 points 17 hours ago (3 children)

I once saw someone send ChatGPT and Gemini Pro into a constant loop by asking "Is the seahorse emoji real?". The responses just looped endlessly. I've heard that the "Mandela Effect" theory in this case isn't true: they say the emoji existed on Microsoft's MSN Messenger and in early versions of Skype. I don't know how much of that is true. But it was fun seeing artificial intelligence being bamboozled by real intelligence. The guy was proving that AI is just a tool, not a permanent replacement for actual people.

[–] luciole@beehaw.org 39 points 1 day ago (3 children)

This is all fine and dandy but the whole article is based on an interview with "Dorian Smiley, co-founder and CTO of AI advisory service Codestrap". Codestrap is a Palantir service provider, and as you'd expect Smiley is a Palantir shill.

The article hits different considering it's more or less a world devourer zealot taking a jab at competing world devourers. The reporter is an unsuspecting proxy at best.
