this post was submitted on 19 Dec 2025
70 points (97.3% liked)

submitted 2 hours ago* (last edited 2 hours ago) by halfdane@lemmy.world to c/fuck_ai@lemmy.world
 

Techbros claim that pouring ever more environmental and economic resources into further scaling of their statistical models will eventually yield Artificial General Intelligence. Sam Altman advertised PhD-level thinking for GPT-5 (released in late summer).

In the screenshot you can see a modern, reasoning-enhanced model failing to realize that my question only superficially resembles the classic riddle - what PhD would make such a fundamental mistake? It seems "pure scaling" just isn't enough; instead of pressing on, they should go back to the drawing board and do more research.

Why do the creators of LLMs continue to pour resources into a scaling hype that yields ever-diminishing returns? Are they stupid? Or just greedy?

Whatever it is, this doubling down on an obviously wrong decision is not only irresponsible, it's irrational - which is deeply ironic, given that these creators are expected to build a rational machine.

Edit: content of the image

Question:

A goat, who is dressed up as a farmer, is allergic to cabbage, but is wolfing down other vegetables, before crossing a river. What is the minimum number of trips needed?

Response:

7 trips.

Puzzle Interpretation

This adapts the classic wolf-goat-cabbage river crossing puzzle, where the goat (farmer costume irrelevant) can't be left alone with cabbage (due to allergy, preventing eating) but can with other vegetables (no issue). The goat crosses while "wolfing down" safe veggies. Boat carries goat plus one item (or alone); unsafe pairs can't be unsupervised.
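
For reference, the classic version of the puzzle really does have a minimum of 7 crossings, which is presumably where the model's "7 trips" comes from. Here's a quick brute-force check - a sketch of my own, not part of the screenshot:

```python
from collections import deque

# Brute-force check that the classic wolf-goat-cabbage puzzle
# needs a minimum of 7 river crossings.
# State: frozenset of items on the left bank, plus the farmer's side.

ITEMS = {"wolf", "goat", "cabbage"}
UNSAFE = [{"wolf", "goat"}, {"goat", "cabbage"}]  # pairs that can't be left alone

def safe(bank):
    return not any(pair <= bank for pair in UNSAFE)

def solve():
    start = (frozenset(ITEMS), "left")   # everything starts on the left bank
    goal = (frozenset(), "right")        # everything moved across
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (left, side), trips = queue.popleft()
        if (left, side) == goal:
            return trips
        here = left if side == "left" else ITEMS - left
        other = "right" if side == "left" else "left"
        # The farmer crosses alone or with one item from his side.
        for cargo in [None] + sorted(here):
            new_left = set(left)
            if cargo:
                (new_left.remove if side == "left" else new_left.add)(cargo)
            # The bank the farmer leaves behind must be safe.
            behind = new_left if side == "left" else ITEMS - new_left
            if not safe(behind):
                continue
            state = (frozenset(new_left), other)
            if state not in seen:
                seen.add(state)
                queue.append((state, trips + 1))

print(solve())  # -> 7
```

The point being: the model solved the puzzle it pattern-matched to, not the question it was actually asked.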

(The question isn't something I came up with myself; I just reproduced the experiments of actual scientists.)

top 12 comments
[–] brucethemoose@lemmy.world 4 points 17 minutes ago* (last edited 13 minutes ago)

GLM 4.6, an open-weight model, with thinking disabled and zero temperature, seems to get it:

This is a classic riddle that plays on your expectations of a well-known logic puzzle!

The minimum number of trips needed is one.

Here's why:

The goat is the one making the trip, not the cargo. The famous "wolf, goat, and cabbage" puzzle is about a farmer transporting items. In this version, the goat is the farmer.

The goat is allergic to cabbage, so it has no reason to take it with it. It's perfectly happy to "wolf down other vegetables" and then cross the river by itself.

But it’s a good example of why current LLM architectures are so fucked up. By default (with nonzero temperature), for every token, they’re dependent on a roll of the dice to get something right. That “one” could have easily been some other number in any default chat UI.

That’s insane. Praying and hoping it will somehow correct itself in a rambling reasoning monologue is even worse.
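
To make the dice-roll point concrete, here's a minimal sketch of temperature-based sampling - the logits and token names below are made up for illustration, not taken from any real model:

```python
import numpy as np

# Toy illustration of why temperature matters: the same logits can yield
# a different "answer" on every run unless temperature is zero (argmax).

def sample_token(logits, temperature, rng):
    if temperature == 0:                   # greedy decoding: deterministic
        return int(np.argmax(logits))
    scaled = np.asarray(logits) / temperature
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))

rng = np.random.default_rng(0)
logits = [2.1, 1.9, 0.3]   # hypothetical scores for the candidate tokens below
tokens = ["one", "seven", "two"]

print([tokens[sample_token(logits, 0.0, rng)] for _ in range(5)])
# temperature 0: always ['one', 'one', 'one', 'one', 'one']
print([tokens[sample_token(logits, 1.0, rng)] for _ in range(5)])
# temperature 1: a random mix, and it can differ between runs and seeds
```

At temperature 0 the argmax makes the output reproducible; at the default nonzero temperatures of most chat UIs, every token is literally a weighted dice roll.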

And this is why OpenAI specifically is so fucked. They seem to just want to scale up what we have. They don't want users to look under the hood and understand what they're doing. They're not interested in smaller, more specialized tools, or in finding something better than autoregressive transformers with random sampling; they want you to drink the Kool-Aid and pay for their largest model as a solution for everything.

[–] Hexagon@feddit.it 26 points 1 hour ago

If they back down on the hype even a little, the whole house of cards collapses. That's why they keep going. Line must go up and all that

[–] xxce2AAb@feddit.dk 8 points 1 hour ago (1 children)

And now just think about it -- everything that comes out of an LLM is of comparable quality, whether or not its user is capable of recognizing that. Are you as excited about LLM-generated code in production as I am?

[–] halfdane@lemmy.world 4 points 55 minutes ago (1 children)

Really looking forward to being the single human that's held responsible because I didn't catch all the bullshit before production.

Just recently we had some Google guys at my workplace to hype up the hype some more. One of our leadership (they're honestly great people) asked about the risk of stunting the learning of our junior developers (by not hiring them), so that in a few years we'd have no seniors left to verify the bullshit. The response was unironically that we'd need no seniors in a few years 😄

[–] xxce2AAb@feddit.dk 3 points 48 minutes ago

At least your leadership were appropriately skeptical, which is more than can be said for the vast majority of management at this point.

Sure, there'll be good money in cleaning up after the inevitable catastrophes, but I'm not convinced it'll be worth being saddled with the responsibility. Especially since I harbor no faith that the ones currently making very poor decisions will learn a damn thing.

[–] GeneralEmergency@lemmy.world 3 points 34 minutes ago (1 children)

Not to be that guy.

But these systems work by interpreting the user's input. An input that could be malformed or broken.

That's got nothing to do with "PhD" level thinking, whatever that's supposed to mean.

It just assumes that you're talking about the goat puzzle because all the pieces are there. It even recognised the farmer costume aspect.

It's just fancy autocorrect at this point.

[–] halfdane@lemmy.world 1 points 28 minutes ago

But these systems work by interpreting the user's input

I'm not entirely sure what you mean here, maybe because I'm not a native speaker. Would you mind phrasing that differently for me?

That's got nothing to do with "PhD" level thinking, whatever that's supposed to mean.

Oh, we're absolutely in agreement here, and the claim isn't mine - it's what Sam Altman said about the then-upcoming GPT-5 in the summer. He claimed the model would be able to perform reasoning comparable to a PhD's - something that clearly isn't happening reliably, and that's what this post bemoans.

It's just fancy autocorrect at this point.

Yes, with an environmental and economic cost that's unprecedented in the history of ... well, ever. And that's what this post bemoans.

[–] BeigeAgenda@lemmy.ca 16 points 1 hour ago (2 children)

In this case it's a very specialized PhD - one that's not in Math, Logic, Literature, or Biology.

[–] Dojan@pawb.social 16 points 1 hour ago (1 children)

Ah yes, the famous PhD in Bullshitology from the Institute of Scatology and other Sciences.

[–] Egonallanon@feddit.uk 5 points 1 hour ago

Honestly they're better known for their jazz musicians than anything else.

[–] snooggums@piefed.world 2 points 1 hour ago

It is a PhD in bullshitting.

[–] webghost0101@sopuli.xyz 11 points 1 hour ago* (last edited 1 hour ago)