this post was submitted on 07 Apr 2025
38 points (100.0% liked)

TechTakes


"Notably, O3-MINI, despite being one of the best reasoning models, frequently skipped essential proof steps by labeling them as 'trivial', even when their validity was crucial."

[–] [email protected] 25 points 6 days ago* (last edited 6 days ago) (3 children)

You didn't link to the study; you linked to the press release for the study. This and this are the papers linked in the blog post.

Note that the papers haven't been published anywhere other than on Anthropic's online journal. Also, what the papers are doing is essentially tea leaf reading. They take a look at the swill of tokens, point at some clusters, and say, "there's a dog!" or "that's a bird!" or "bitcoin is going up this year!" It's all rubbish, dawg.

[–] [email protected] 18 points 6 days ago (1 children)

To be fair, the typesetting of the papers is quite pleasant and the pictures are nice.

[–] [email protected] 10 points 6 days ago

they gotta make up for all those scary cave-wall pictures somehow

[–] [email protected] 9 points 6 days ago

It's an anti-fun version of listening to Dark Side of the Moon while watching The Wizard of Oz.