overview for BigMuffN69

AI coders think they’re 20% faster — but they’re actually 19% slower in c/techtakes@awful.systems

[–] BigMuffN69@awful.systems 10 points 4 months ago* (last edited 4 months ago)

Yeah, METR was the group that made the infamous AI IS DOUBLING EVERY 4-7 MONTHS GRAPH where the measurement was 50% success at SWE tasks based on the time it took a human to complete it. Extremely arbitrary success rate, very suspicious imo. They are fanatics trying to pinpoint when the robo god recursive self improvement loop starts.

Stubsack: Stubsack: weekly thread for sneers not worth an entire post, week ending 13th July 2025 in c/techtakes@awful.systems

[–] BigMuffN69@awful.systems 7 points 4 months ago

One more comment, idk if y'all remember that forecast that came out in April(? iirc ?) where the thesis was the "time an AI can operate autonomously is doubling every 4-7 months." AI-2027 authors were like "this is the smoking gun, it shows why our model is correct!!"

They used some really sketchy metric where they asked SWEs to do a task and measured the time it took, then had the models do the same tasks and said that the model's performance was the human task length at which it succeeded 50% of the time (wtf?), and then they drew an exponential curve through it. My gut feeling is that the reason they chose 50% is because other values totally ruin the exponential curve, but I digress.
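For anyone who wants to see how much the threshold choice moves the headline number, here is a minimal sketch. It assumes success probability follows a logistic curve in log2 of the human completion time; that functional form, the function name `time_horizon`, and the parameter values are all made up for illustration and are not taken from METR's data.

```python
import math


def time_horizon(intercept: float, slope: float, threshold: float) -> float:
    """Human task length (minutes) at which modeled success equals `threshold`.

    Assumed (hypothetical) model:
        P(success) = sigmoid(intercept + slope * log2(minutes))
    with slope < 0, so longer tasks are harder.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    if slope >= 0:
        raise ValueError("slope must be negative (longer tasks are harder)")
    # Invert the sigmoid: intercept + slope * log2(minutes) = logit(threshold)
    logit = math.log(threshold / (1.0 - threshold))
    return 2.0 ** ((logit - intercept) / slope)


# Made-up parameters, purely illustrative.
INTERCEPT, SLOPE = 3.0, -0.5
for p in (0.5, 0.8, 0.95):
    print(f"{p:.0%} horizon: {time_horizon(INTERCEPT, SLOPE, p):.1f} min")
```

With these made-up numbers the 50% horizon is 64 minutes while the 80% horizon is under 10. This only shows that the reported horizon depends heavily on the threshold; it says nothing about whether the trend over time stays exponential at other thresholds.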

Anyways they just did the metrics for Claude 4, the first FrOnTiEr model that came out since they made their chart and... drum roll no improvement... in fact it performed worse than O3, which was first announced last December (note: instead of using the date O3 was announced in 2024, they used the date it was released months later, so on their chart it makes 'line go up'. A valid choice I guess, but a choice nonetheless.)

This world is a circus tent, and there still aint enough room for all these fucking clowns.

Stubsack: Stubsack: weekly thread for sneers not worth an entire post, week ending 13th July 2025 in c/techtakes@awful.systems

[–] BigMuffN69@awful.systems 11 points 4 months ago

https://www.wired.com/story/openworm-worm-simulator-biology-code/

Really interesting piece about how difficult it actually is to simulate "simple" biological structures in silicon.

Stubsack: Stubsack: weekly thread for sneers not worth an entire post, week ending 13th July 2025 in c/techtakes@awful.systems

[–] BigMuffN69@awful.systems 11 points 4 months ago (2 children)

It's kind of telling that it's only been a couple months since that fan fic was published and there is already so much defensive posturing from the LW/EA community. I swear the people who were sharing it when it dropped and tacitly endorsing it as the vision of the future from certified prophet Daniel K are now like, "oh it's directionally correct, but too aggressive." Note that we are over halfway through 2025 and the earliest prediction of agents entering the work force is already fucked. So if you are a 'super forecaster' (guru) you can do some sleight of hand now to come out against the model, knowing the first goal post was already missed and the tower of conditional probabilities that rests on it is already breaking.

Funniest part is that even one of the authors seems to be panicking too, as even they can tell they are losing the crowd, and is falling back on this "It's not the most likely future, it's just the most probable." A truly meaningless statement if your goal is to guide policy, since events with arbitrarily low probability density can still be the "most probable" given enough different outcomes.
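To put numbers on that, here is a sketch with a made-up distribution where one scenario is the single most probable outcome and still almost certainly won't happen. The function name `mode_probability`, the scenario counts, and the weights are all hypothetical.

```python
from fractions import Fraction


def mode_probability(n_outcomes: int, boost: int = 2) -> Fraction:
    """Probability of the single most likely outcome when one outcome
    gets `boost` times the weight of each of the other n_outcomes - 1."""
    if n_outcomes < 1 or boost < 1:
        raise ValueError("n_outcomes and boost must both be at least 1")
    total_weight = boost + (n_outcomes - 1)
    return Fraction(boost, total_weight)


# The favored scenario is always "the most probable", yet its
# probability shrinks toward zero as the number of scenarios grows.
for n in (10, 1_000, 1_000_000):
    p = mode_probability(n)
    print(f"{n} outcomes: most probable scenario has p = {float(p):.6f}")
```

Being the mode only means beating each alternative individually. It carries no information about whether the scenario is likely in absolute terms.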

Also, there's literally mass brain uploading in AI-2027. This strikes me as physically impossible in any meaningful way in the sense that the compute to model all molecular interactions in a brain would take a really, really, really big computer. But I understand if your religious beliefs and cultural convictions necessitate big snake 🐍 to upload you, then I will refrain from passing judgement.

Stubsack: Stubsack: weekly thread for sneers not worth an entire post, week ending 13th July 2025 in c/techtakes@awful.systems

[–] BigMuffN69@awful.systems 13 points 4 months ago (2 children)

Bummer, I wasn't on the invite list to the hottest SF wedding of 2025.

Update your mental models of Claude lads.

Because if the wife stuff isn't true, what else could Claude be lying about? The vending machine business?? The blackmail??? Being bad at Pokemon????

Stubsack: Stubsack: weekly thread for sneers not worth an entire post, week ending 6th July 2025 in c/techtakes@awful.systems

[–] BigMuffN69@awful.systems 9 points 4 months ago* (last edited 4 months ago) (1 children)

Bruh, there's a part where he laments that he had a hard time getting into meditation because he was paranoid that it was a form of wire heading. Beyond parody. The whole profile is 🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩

Stubsack: Stubsack: weekly thread for sneers not worth an entire post, week ending 6th July 2025 in c/techtakes@awful.systems

[–] BigMuffN69@awful.systems 11 points 4 months ago (1 children)

To be clear, I strongly disagree with the claim. I haven't seen any evidence that "reasoning" models actually address any of the core blocking issues- especially reliably working within a given set of constraints/being dependable enough to perform symbolic algorithms/or any serious solution to confabulations. I'm just not going to waste my time with curve pointers who want to die on the hill of NeW sCaLiNG pArAdIgM. They are just too deep in the kool-aid at this point.

Stubsack: Stubsack: weekly thread for sneers not worth an entire post, week ending 6th July 2025 in c/techtakes@awful.systems

[–] BigMuffN69@awful.systems 7 points 4 months ago* (last edited 4 months ago) (1 children)

gross. You'd think the guy running the site directly insulting him would make him realize maybe lw simply aint it

Stubsack: Stubsack: weekly thread for sneers not worth an entire post, week ending 6th July 2025 in c/techtakes@awful.systems

[–] BigMuffN69@awful.systems 16 points 4 months ago* (last edited 4 months ago)

One thing I have wondered about. The rats always have that graphic where the gap between the IQ of Einstein and the village idiot is almost imperceptible next to the IQ of the super robo god. If that's the case, why the hell do we only want our best and brightest doing "alignment research"? The village idiot should be almost just as good!

Stubsack: Stubsack: weekly thread for sneers not worth an entire post, week ending 6th July 2025 in c/techtakes@awful.systems

[–] BigMuffN69@awful.systems 16 points 4 months ago* (last edited 4 months ago) (10 children)

Actually burst a blood vessel last weekend raging. Gary Marcus was bragging about his prediction record in 2024 being flawless

Gary continuing to have the largest ego in the world. Stay tuned for his upcoming book "I am God" when 2027 comes around and we are all still alive. Imo some of these are kind of vague and I wouldn't argue with someone who said reasoning models are a substantial advance, but my God the LW crew fucking lost their minds. Habryka wrote a goddamn essay about how Gary was a fucking moron and is a threat to humanity for underplaying the awesome power of super-duper intelligence and a worse forecaster than the big brain rationalist. To be clear Habryka's objections are overall- extremely fucking nitpicking totally missing the point dogshit in my pov (feel free to judge for yourself)

https://xcancel.com/ohabryka/status/1939017731799687518#m

But what really made me want to drive a drill to the brain was the LW brigade rallying around the claim that AI companies are profitable. Are these people straight up smoking crack? OAI and Anthropic do not make a profit, full stop. In fact they are setting billions of VC money on fire?! (Strangely, some LWers in the comments seemed genuinely surprised that this was the case when shown the data; just how unaware are these people?) Oliver tries and fails to do Olympic level mental gymnastics by saying TSMC and NVIDIA are making money, so therefore AI is extremely profitable. In the same way, I presume gambling is extremely profitable for degenerates like me because the casino letting me play is making money. I rank the people of LW as minimally truth seeking and big dumb out of 10. Also weird fun little fact: in Daniel K's predictions from 2022, he said by 2023 AI companies would be so incredibly profitable that they would be easily recouping their training cost. So I guess monopoly money that you can't see in any earnings report is the official party line now?

Stubsack: Stubsack: weekly thread for sneers not worth an entire post, week ending 6th July 2025 in c/techtakes@awful.systems

[–] BigMuffN69@awful.systems 11 points 4 months ago

An interesting takedown of "superforecasting" from Ben Recht, a 3-part series on his substack where he accuses so-called super forecasters of abusing scoring rewards over actually being precogs. First (and least technical) part linked below...

https://www.argmin.net/p/in-defense-of-defensive-forecasting

"The term Defensive Forecasting was coined by Vladimir Vovk, Akimichi Takemura, and Glenn Shafer in a brilliant 2005 paper, crystallizing a general view of decision making that dates back to Abraham Wald. Wald envisions decision making as a game. The two players are the decision maker and Nature, who are in a heated duel. The decision maker wants to choose actions that yield good outcomes no matter what the adversarial Nature chooses to do. Forecasting is a simplified version of this game, where the decisions made have no particular impact and the goal is simply to guess which move Nature will play. Importantly, the forecaster’s goal is not to never be wrong, but instead to be less wrong than everyone else.*

*Yes, I see what I did there."