scruiser

joined 2 years ago
[–] scruiser@awful.systems 4 points 1 month ago

He hasn't missed an opportunity to ominously play up genAI capabilities (I remember him doing so as far back as AI Dungeon), so it would be a real break from form for him to finally admit how garbage their output is.

[–] scruiser@awful.systems 6 points 1 month ago

Loose Mission Impossible Spoilers

The latest Mission Impossible movie features a rogue AI as one of the main antagonists, but the AI's main powers are lies, fake news, and manipulation; it only gets as far as it does because people let fear make them manipulable, and it relies on human agents to do much of its work. So in terms of the doomerism narrative, I think the movie could actually be taken as opposing the conventional doomer story, in favor of a calm, moderate, internationally coordinated response (the entire plot could have been derailed by governments agreeing to mutual nuclear disarmament before the AI subverted them) against AIs that ultimately have only moderate power.

Adding to the post-LLM-hype predictions: I think after the LLM bubble pops, "Terminator"-style rogue AI movie plots won't go away, but they'll take on a different spin. Rogue AIs' strengths will be narrower, their weaknesses more comical and absurd, and idiotic human actions more of a factor. For weaknesses, it will be less "failed to comprehend love" or "cleverly constructed logic bomb breaks its reasoning" and more "forgets what it was doing after getting drawn into too long a conversation". For human actions, it will be less "its makers failed to anticipate a completely unprecedented sequence of bootstrapping and self-improvement" and more "its makers disabled every safety and granted it every resource it asked for in the process of trying to make an extra dollar a little bit faster".

[–] scruiser@awful.systems 13 points 1 month ago* (last edited 1 month ago) (2 children)

He's set up a community primed to think the scientific establishment's focus on falsifiability and peer review is fundamentally worse than "Bayesian" methods, and that you don't need credentials or even conventional education or experience to have revolutionary good ideas, and he's strengthened the already existing myth of lone geniuses pushing science forward (as opposed to systematic progress). Attracting cranks was an inevitable outcome. In fact, Eliezer occasionally praises cranks when he isn't able to grasp their sheer crankiness (for instance, GeneSmith's ideas are total nonsense to anyone whose familiarity with genetics goes beyond skimming relevant-sounding scientific publications and garbage pop-sci journalism, but Eliezer commented favorably). The only thing that has changed is ChatGPT and its clones glazing cranks, making them even more deluded. And of course, someone (cough, Eliezer) was hyping up this kind of model as far back as GPT-2, so it's only to be expected that cranks would think LLMs were capable of providing legitimate, useful feedback.

> Not a fan of yud but getting daily emails from delulus would drive me to wish for the basilisk

He's deliberately cultivated an audience willing to hear cranks out, so this is exactly what he deserves.

[–] scruiser@awful.systems 12 points 1 month ago

This connection hadn't occurred to me before, but the Starship Troopers scenes (in the book) where they claim to have mathematically rigorous proofs about various moral statements, actions, or societal constructs remind me of how Eliezer has a decision theory in mind with all sorts of counterintuitive claims (it's mathematically valid to never, ever give in to any blackmail or threats or anything adjacent to them), but he hasn't actually written his decision theory out in rigorous, well-defined terms that could pass peer review or be used to figure out anything beyond some pre-selected toy problems.

[–] scruiser@awful.systems 10 points 1 month ago

There are parts of the field with major problems, like the sorts of studies that get done on 20 student volunteers and then turned into a pop-psychology factoid that gets tossed around and over-generalized while the original study fails to replicate, but there are also parts that are actually good science.

[–] scruiser@awful.systems 7 points 1 month ago (1 children)

I wouldn't say even that part works so well, given that Mt. Moon is such a major challenge even with all those features.

[–] scruiser@awful.systems 15 points 1 month ago

Every AI winter, the label "AI" becomes unwanted and people switch to other terms (expert systems, machine learning, etc.)... and I've come around to thinking this is a good thing: it forces people to specify what they actually mean, instead of using a nebulous label with many science fiction connotations that lumps decent approaches and paradigms together with complete garbage and everything in between.

[–] scruiser@awful.systems 13 points 1 month ago (1 children)

No, I think BlueMonday is being reasonable. The article has some quotes from scientists with actually relevant expertise, but it uncritically mixes them with LLM hype and speculation in a typical both-sides framing that gives lay readers the (false) impression that the two sides are equal. This sort of journalism may appear balanced, but it has contributed to all kinds of controversies (from global warming to Intelligent Design to medical pseudoscience) in which the viewpoints of cranks, uninformed busybodies, autodidacts of questionable ability, and deliberate fraudsters get presented as equal to genuinely educated, researched viewpoints.

[–] scruiser@awful.systems 21 points 1 month ago (3 children)

A new "LLM plays Pokemon" run has started, with o3 this time. It plays moderately faster, and the Twitch display UI is a little cleaner, so it is less tedious to watch. But in terms of actual ability, o3 has so far made many of the exact same errors as Claude and Gemini, including: completely making things up/seeing things that aren't on the screen (items in Viridian Forest); confused attempts at navigation (it went back and forth on whether the exit to Viridian Forest was in the NE or NW corner); repeating its mistakes (both the item and navigation issues I mentioned); confusing details from other generations of Pokemon (Nidoran learns Double Kick at level 12 in FireRed and LeafGreen, but not in the original Blue/Yellow); and signs of being prone to completely batshit tangents (it briefly got derailed about sneaking through the trees in Viridian Forest, i.e. moving through completely impassable tiles).

I don't know how anyone can watch any of these attempts at LLMs playing Pokemon and think (viable) LLM agents are just around the corner... well, actually I do know: hopium, cope, cognitive bias, and deliberate deception. The whole LLM-plays-Pokemon thing is turning into less of a test of LLMs and more entertainment and advertising for the models, and the scaffolds are extensive enough, and different enough from each other, that they really aren't showing the models' raw capabilities (which are even worse than I complained about) or comparing them meaningfully.

[–] scruiser@awful.systems 10 points 1 month ago (1 children)

Is that supposed to be an advertisement in favor of AI? (As opposed to stealth satire?) Seeing it makes me want to get off my computer and touch grass.

[–] scruiser@awful.systems 3 points 1 month ago

Wow, that is some skilled modeling. You should become a superforecaster and write ~~prophecies~~ AI timelines; they are quite popular on LessWrong.

[–] scruiser@awful.systems 10 points 1 month ago (2 children)

To elaborate on the other answers about AlphaEvolve: the LLM is only one component of AlphaEvolve, serving as the generator of random mutations in the evolutionary process. The LLM promoters like to emphasize the involvement of LLMs, but separated from the evolutionary algorithm guiding the process through repeated generations, an LLM is about as likely to write good code as a dose of radiation is to spontaneously mutate you to breathe underwater.

And the evolutionary aspect requires a lot of compute. They don't specify the population size or the number of generations in their whitepaper, but it might be hundreds or thousands of attempted solutions repeated for dozens or hundreds of generations. That means running the LLM for thousands or tens of thousands of attempted solutions, and testing each one against the evaluation function, to generate a single piece of optimized code. This isn't an approach that is remotely affordable or even feasible for ordinary software development, even if you reworked your entire process into something like test-driven development on steroids in order to write enough tests to use in the evaluation function (and you would probably get stuck on that step, because it outright isn't possible for most practical real-world software).
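To make the shape of the loop concrete, here's a minimal Python sketch of the kind of process I mean. The parameter values and the `llm_mutate`/`evaluate` stubs are my own guesses for illustration, not anything from the whitepaper:

```python
import random

# Hypothetical values -- the whitepaper doesn't disclose the real ones.
POP_SIZE = 100        # candidate programs per generation
N_GENERATIONS = 50    # generations of selection
N_PARENTS = 20        # survivors that seed the next generation

def llm_mutate(program: str) -> str:
    """Stand-in for the LLM call: given a candidate program, return
    a mutated variant. This is the LLM's entire role in the loop."""
    raise NotImplementedError("call an LLM of your choice here")

def evaluate(program: str) -> float:
    """Stand-in for the evaluation function: run the candidate
    against tests/benchmarks and return a fitness score."""
    raise NotImplementedError("run tests or benchmarks here")

def evolve(seed_program: str) -> str:
    population = [seed_program]
    for _ in range(N_GENERATIONS):
        # Every candidate gets a full evaluation run each generation.
        scored = sorted(population, key=evaluate, reverse=True)
        parents = scored[:N_PARENTS]
        # The LLM only proposes variants; selection pressure from
        # evaluate() is what actually drives the code to improve.
        population = parents + [
            llm_mutate(random.choice(parents))
            for _ in range(POP_SIZE - N_PARENTS)
        ]
    return max(population, key=evaluate)
```

Even with these made-up numbers, that's on the order of POP_SIZE × N_GENERATIONS ≈ 5,000 LLM calls plus 5,000 evaluation runs to produce one optimized output, which is where the compute cost comes from.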

AlphaEvolve's successes are all on very specific, well-defined, and constrained problems: finding particular algorithms, as opposed to general software development.
