Want to wade into the snowy surf of the abyss? Have a sneer percolating in your system but not enough time/energy to make a whole post about it? Go forth and be mid.
Welcome to the Stubsack, your first port of call for learning fresh Awful you’ll near-instantly regret.
Any awful.systems sub may be subsneered in this subthread, techtakes or no.
If your sneer seems higher quality than you thought, feel free to cut’n’paste it into its own post — there’s no quota for posting and the bar really isn’t that high.
The post-Xitter web has spawned so many “esoteric” right-wing freaks, but there’s no appropriate sneer-space for them. I’m talking redscare-ish, reality-challenged “culture critics” who write about everything but understand nothing. I’m talking about reply-guys who make the same 6 tweets about the same 3 subjects. They’re inescapable at this point, yet I don’t see them mocked (as much as they should be).
Like, there was one dude a while back who insisted that women couldn’t be surgeons because they didn’t believe in the moon or in stars? I think each and every one of these guys is uniquely fucked up and if I can’t escape them, I would love to sneer at them.
(Credit and/or blame to David Gerard for starting this.)
Recently discovered (indirectly, through fedi) that Donald Knuth got oneshot by Claude - feeling the itch to write about tech's vulnerability to LLMs because of it.
Even in Knuth's account it sounds like the LLM contribution was less in solving the problem and more in throwing out random BS that looked vaguely like different techniques were being applied until it spat out something that Knuth and his collaborator were able to recognize as a promising avenue for actual work.
I am not a mathematician or computer scientist and so will not claim to know exactly what this is describing and how it compares to the normal process for investigating this kind of problem. However, the fact that it produced 4 approaches over 31 attempts seems more consistent with randomly throwing out something that looks like a solution rather than actually thinking through the process of each one. In a creative exploration like this where you expect most approaches to be dead ends rather than produce a working structure maybe the LLM is providing something valuable by generating vaguely work-shaped outputs that can inspire an actual mind to create the actual answer.
The idea that it's ultimately spitting out random answer-shaped nonsense also follows from the amount of babysitting that was required from Filip to keep it actually producing anything useful. I don't doubt that it's more efficient than I would be at producing random sequences of work-shaped slop and redirecting or retrying in response to a new "please actually do this" prompt, but of the two of us only one is demonstrating actual intelligence and moving towards being able to work independently. Compared to an undergrad or myself I don't doubt that Claude has a faster iteration time for each of those attempts, but that's not even in the same zip code as actually thinking through the problem, and if anything serves as a strong counterexample to the doomer critihype about the expanding capabilities of these systems. This kind of high-level academic work may be a case where this kind of random slop is actually useful, but that's an incredibly niche area and does not do nearly as much as Knuth seems to think it does in terms of justifying the incredible cost of these systems. If anything the narrative that "AI solved the problem" is giving Anthropic credit for the work that Knuth and Stappers were putting into actually sifting through the stream of slop and identifying anything useful. Maybe babysitting the slop sluice is more satisfying or faster than going down every blind alley on your own, but you're still the one sitting in the river with a pan, and pretending the river is somehow pulling the gold out of itself is just damn foolish.
I am a computer science PhD so I can give some opinion on exactly what is being solved.
First of all, the problem is very contrived. I cannot think of what the motivation or significance of this problem is, and Knuth literally says that it is a planned homework exercise. It's not a problem that many people have thought about before.
Second, I think this problem is easy (by research standards). The problem is of the form: "Within this object X of size m, find any example of Y." The problem is very limited (the only thing that varies is how large m is), and you only need to find one example of Y for each m, even if there are many such examples. In fact, Filip found that for small values of m, there were tons of examples for Y. In this scenario, my strategy would be "random bullshit go": there are likely so many ways to solve the problem that a good idea is literally just trying stuff and seeing what sticks. Knuth did say the problem was open for several weeks, but:
I guess "random bullshit go" is served well by a random bullshit machine, but you still need an expert who actually understands the problem to read the tea leaves and evaluate whether you got something useful. Knuth's narrative is not very transparent about how much Filip hand-held the AI as well.
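For anyone who wants the shape of "random bullshit go" spelled out, here is a minimal sketch on a toy stand-in problem. To be loud about it: this is not Knuth's problem. The toy task (find any ordering of 0..m-1 where no two neighbours differ by exactly 1) and the names `is_valid` and `random_search` are made up for illustration. The point is only that when valid examples are plentiful, blind guessing plus a checker gets you there, and the checker is the part that needs someone who understands the problem.

```python
import random

def is_valid(perm):
    """The 'expert' step: accept only if no two neighbours differ by exactly 1."""
    return all(abs(a - b) != 1 for a, b in zip(perm, perm[1:]))

def random_search(m, max_tries=10_000, seed=0):
    """Shuffle 0..m-1 until the checker accepts.

    Returns (perm, tries) on success, or (None, max_tries) if nothing
    valid turned up within the budget. Hypothetical helper, toy problem.
    """
    rng = random.Random(seed)
    perm = list(range(m))
    for tries in range(1, max_tries + 1):
        rng.shuffle(perm)          # the "random bullshit" generator
        if is_valid(perm):         # nothing counts until it is verified
            return perm[:], tries
    return None, max_tries

if __name__ == "__main__":
    perm, tries = random_search(8)
    print(f"m=8: found {perm} after {tries} tries")
    # For m=3 no valid ordering exists, so the search can only give up.
    print("m=3:", random_search(3, max_tries=200))
```

Note what the generator contributes here: nothing but volume. All of the actual problem knowledge lives in `is_valid`, and for m=3 no amount of extra shuffling helps, because there is nothing to find.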
I think the main danger of this (putting aside the severe societal costs of AI) is not that doing this is faster or slower than just thinking through the problem yourself. It's that relying on AI atrophies your ability to think, and eventually even your ability to guard against the AI bullshitting you. The only way to retain a deep understanding is to constantly be in the weeds thinking things through. We've seen this story play out in software before.
My generous statement: Knuth, being a scientist, is used to an "adversary" that plays fair. As we have known for decades, a scientist can be tricked in situations that a magician will see through. This applies all the more now with the Sycophancy Engines, which make mathematics into a casino vacation. Just one more prompt, bro. Just one more prompt.
My less generous statement: Knuth is almost 90 years old. Sure, age doesn't imply a person will become a doddering fool, but people do tend to slow down, to have less energy and more need to spend it managing their health. "Thinking about a problem for a few weeks" counts for less in a situation like that.
My extremely ungenerous statement: Hey, remember when Michael Atiyah claimed to have proved the Riemann hypothesis in 2018? And the community reaction was a pained, "Atiyah is one of the great mathematicians... of the 20th century."