Techbros claim that pouring ever more environmental and economic resources into further scaling of their statistical models will eventually yield Artificial General Intelligence. Sam Altman advertised PhD-level thinking for GPT-5 (released in late summer).
In the screenshot you can see a modern, reasoning-enhanced model failing to realize that my question only superficially resembles a riddle - what PhD would make such a fundamental mistake? It seems like "pure scaling" just isn't enough, and instead of pressing on, they should go back to the drawing board and do more research.
Why do the creators of LLMs continue to pour resources into a scaling hype that yields ever diminishing returns? Are they stupid? Or just greedy?
Whatever it is, this doubling down on an obviously wrong decision is not only irresponsible, it's irrational - which is deeply ironic, because these creators are expected to create a rational machine.
Edit: content of the image
Question:
A goat, who is dressed up as a farmer, is allergic to cabbage, but is wolfing down other vegetables, before crossing a river. What is the minimum number of trips needed?
Response:
7 trips.
# Puzzle Interpretation
This adapts the classic wolf-goat-cabbage river crossing puzzle, where the goat (farmer costume irrelevant) can't be left alone with cabbage (due to allergy, preventing eating) but can with other vegetables (no issue). The goat crosses while "wolfing down" safe veggies. Boat carries goat plus one item (or alone); unsafe pairs can't be unsupervised.
(The question isn't something I came up with myself; I just reproduced the experiments of actual scientists.)
GLM 4.6, an open-weight model, with thinking disabled and zero temperature, seems to get it:
But it’s a good example of why current LLM architectures are so fucked up. By default (with nonzero temperature), every single token depends on a roll of the dice to come out right. That “one” could easily have been some other number in any default chat UI.
That’s insane. Praying and hoping it will somehow correct itself in a rambling reasoning monologue is even worse.
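To make the dice-roll point concrete, here's a minimal sketch (plain NumPy with a toy three-word vocabulary, not any vendor's actual decoding code) of what temperature sampling does versus greedy decoding at temperature 0:

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, rng=None):
    """Pick a token id from raw logits. temperature=0.0 means greedy/argmax."""
    rng = rng or np.random.default_rng()
    if temperature == 0.0:
        return int(np.argmax(logits))           # deterministic: same pick every run
    scaled = np.array(logits, dtype=float) / temperature  # lower temp sharpens, higher flattens
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()                        # softmax over the (toy) vocabulary
    return int(rng.choice(len(probs), p=probs)) # the dice roll

# Toy vocabulary standing in for possible answer tokens; logits are made up.
vocab = ["one", "three", "seven"]
logits = [2.0, 1.2, 1.0]  # "one" is favored, but not overwhelmingly

print("greedy: ", vocab[sample_next_token(logits, temperature=0.0)])
for _ in range(5):
    print("sampled:", vocab[sample_next_token(logits, temperature=1.0)])
```

With temperature 0 you get "one" every time; with the default-ish temperature 1.0, "three" or "seven" still shows up a noticeable fraction of runs, even though "one" has the highest logit. That's the dice roll happening at every single token of a normal chat response.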
And this is why OpenAI specifically is so fucked. They seem to just want to scale up what we already have. They don’t want users to look under the hood and understand what they’re doing. They’re not interested in smaller, more specialized tools or in finding something better than autoregressive transformers with random sampling; they want you to drink the Kool-Aid and pay for their largest model as the solution for everything.