It's quite plausibly real. Gemini can def get into shitposty basins and has historically had fairly inconsistent coherence across samples.
Took a lot of scrolling to find an intelligent comment on the article about how outputting words isn't necessarily intelligence.
Appreciate you doing the good work I'm too exhausted with Lemmy to do.
(And for those that want more research in line with what the user above is talking about, I strongly encourage checking out the Othello-GPT line of research and replication, starting with this write-up from the original study authors here.)
He's been wrong about it so far and really derailed Meta's efforts.
This is almost certainly a "you can resign or we are going to fire you" kind of situation. There's no way, with the setbacks and how badly he's been wrong about transformers over the past 2 years, that he isn't finally being pushed out.
They demonstrated (and poorly named) an ontological attractor state in the Claude model card, one that is commonly reported in other models.
You linked to the entire system card paper. Can you be more specific? And what would a better name have been?
Actually, OAI found in a paper the other month that a lot of the blame for confabulations can be laid at the feet of how reinforcement learning is being done.
All the labs basically reward the models for getting things right. That's it.
Notably, they are not rewarded for saying "I don't know" when they don't know.
So it's like the SAT where the better strategy is always to make a guess even if you don't know.
The problem is that this is not a test process but a learning process.
So setting up the reward mechanisms like that for reinforcement learning means they produce models that are prone to bullshit when they don't know things.
TL;DR: The labs suck at RL, and it's important to keep in mind there are only a handful of teams with the compute access for training SotA LLMs, with a lot of incestuous team composition, so whatever they do poorly tends to get done poorly across the industry as a whole until new blood goes "wait, this is dumb, why are we doing it like this?"
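To make the reward-shaping point above concrete, here's a toy expected-value sketch. The function, the 0/1 rewards, and the 0.3 abstention credit are my own illustration, not the paper's actual grader:

```python
# Toy illustration of the reward-shaping argument (not any lab's actual grader).
# Under correctness-only grading, guessing always has expected reward >= abstaining,
# so RL pushes the model toward confident bullshit whenever it's unsure.

def expected_reward(p_correct: float, abstain: bool, abstain_reward: float) -> float:
    """Expected reward for one question given the model's chance of being right."""
    if abstain:
        return abstain_reward                             # reward for saying "I don't know"
    return p_correct * 1.0 + (1 - p_correct) * 0.0        # 1 if right, 0 if wrong

p = 0.2  # model is only 20% sure of the answer

# Correctness-only grading: abstaining pays 0, so guessing (0.2) always wins.
print(expected_reward(p, abstain=False, abstain_reward=0.0))  # 0.2
print(expected_reward(p, abstain=True,  abstain_reward=0.0))  # 0.0

# Hypothetical grading with partial credit for abstaining:
# now "I don't know" beats guessing whenever p_correct < 0.3.
print(expected_reward(p, abstain=False, abstain_reward=0.3))  # 0.2
print(expected_reward(p, abstain=True,  abstain_reward=0.3))  # 0.3
```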
It's more like they are sophisticated world-modeling programs that build a world model (or an approximate "bag of heuristics") of the state of the context provided and the kind of environment that produced it, and then synthesize that world model into extending the context one token at a time.
But the models have been found to be predicting further ahead than one token at a time, and they have all sorts of wild internal mechanisms for how they model text context, like building full board states for predicting board game moves in Othello-GPT or the number comparison helixes in Haiku 3.5.
The popular reductive "next token" rhetoric is pretty outdated at this point, and is kind of like saying that what a calculator does is just take numbers corresponding to button presses and display different numbers on a screen. While technically correct, it glosses over a lot of important complexity between the two steps, and that absence leads to an overall misleading explanation.
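For a sense of how the Othello-GPT board-state claim actually gets tested, here's a rough sketch of a linear probe. Random tensors stand in for the real activations and board labels, and the sizes are made up; the actual studies probe activations from a GPT trained on Othello move sequences:

```python
import torch
import torch.nn as nn

# Sketch of a linear probe in the spirit of the Othello-GPT work.
# Synthetic tensors stand in for real transformer activations and board labels.
d_model, n_squares, n_states = 512, 64, 3    # hidden size, board squares, {empty, mine, theirs}
acts   = torch.randn(10_000, d_model)                        # residual-stream activations per move
labels = torch.randint(0, n_states, (10_000, n_squares))     # true board state at each position

probe = nn.Linear(d_model, n_squares * n_states)
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(200):
    logits = probe(acts).view(-1, n_squares, n_states)   # (batch, square, state)
    loss = loss_fn(logits.permute(0, 2, 1), labels)      # cross-entropy over the state dimension
    opt.zero_grad(); loss.backward(); opt.step()

# With real activations, high probe accuracy (vs. probing a randomly initialized model)
# is the evidence that a board-state world model exists inside the network.
acc = (probe(acts).view(-1, n_squares, n_states).argmax(-1) == labels).float().mean()
print(f"probe accuracy: {acc:.2%}")
```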
They don't have the same quirks in some cases, but do in others.
Part of the shared quirks are due to architecture similarities.
Like the "oh look they can't tell how many 'r's in strawberry" is due to how tokenizers work, and when when the tokenizer is slightly different, with one breaking it up into 'straw'+'berry' and another breaking it into 'str'+'aw'+'berry' it still leads to counting two tokens containing 'r's but inability to see the individual letters.
In other cases, it's because models that have been released influence other models through their presence in updated training sets. Noticed how a lot of comments these days were written by ChatGPT ("it's not X — it's Y")? Well, the volume of those comments has an impact on transformers being trained with data that includes them.
So the state of LLMs is this kind of flux between the idiosyncrasies each model develops, which in turn end up in a training melting pot and sometimes pass on to new models and other times don't. Usually it's related to what's adaptive to the training filters, but not always; often what gets picked up is something piggybacking on what was adaptive (like if o3 was better at passing tests than 4o, maybe gpt-5 picks up other o3 tendencies unrelated to passing tests).
Though to me the differences are even more interesting than the similarities.
I'm a proponent and I definitely don't think it's impossible to make a probable case beyond a reasonable doubt.
And there are implications around it being the case which do change up how we might approach truth seeking.
Also, if you exist in a dream but don't exist outside of it, there are pretty significant philosophical stakes in the nature and scope of the dream. We've been too brainwashed by Plato's influence and the idea that "original = good" and "copy = bad."
There are a lot of things that can only exist by way of copies and can't exist for the original (e.g. closure recursion), so it's a weird remnant philosophical obsession.
All that said, I do get that it's a fairly uncomfortable notion for a lot of people.
They also identify the particular junction that seems the most likely to be an artifact of simulation if we're in one.
A game like No Man's Sky generates billions of planets using procedural generation with a continuous seed function that gets converted into discrete voxels for tracking stateful interactions.
The researchers are claiming that the complexity at the junction where our universe's seemingly continuous gravitational behavior meets continuous probabilities being converted to discrete values when interacted with in stateful ways is incompatible with being simulated.
But they completely overlook that said complexity may itself be a byproduct of simulation, in line with independently emerging approaches to how we simulate worlds ourselves.
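To make the analogy concrete, here's a toy version of that continuous-seed-to-discrete-voxel pattern. This is my own sketch, not No Man's Sky's actual engine:

```python
import math

# Toy version of the continuous-seed -> discrete-voxel pattern (not NMS's real engine).
# The "world" is a smooth deterministic function defined everywhere; discrete, stateful
# values only come into existence at coordinates something actually interacts with.

SEED = 42

def density(x: float, y: float, z: float) -> float:
    """Continuous, deterministic field describing the whole world."""
    return math.sin(SEED + 0.7 * x) * math.cos(SEED + 0.3 * y) + math.sin(0.5 * z)

voxels: dict[tuple[int, int, int], int] = {}  # discrete state, created lazily

def interact(x: float, y: float, z: float) -> int:
    """Collapse the continuous field into a stored discrete voxel on first contact."""
    key = (int(x), int(y), int(z))
    if key not in voxels:
        voxels[key] = 1 if density(*key) > 0 else 0   # quantize only when observed
    return voxels[key]

print(interact(10.3, -4.8, 2.1))   # a voxel materializes here...
print(len(voxels))                  # ...while the rest of the world stays a formula
```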
Yes, just like Minecraft worlds are so antiquated given how they contain diamonds in deep layers that must have taken a billion years to form.
What a simulated world presents as its local timescale says nothing about the actual non-local run time.
It's quite possible to create a world that appears to be billions of years old but only booted up seconds ago.
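In the same toy-world spirit, the apparent age can just be a value derived from the seed the moment something is first observed. Everything below is a made-up illustration:

```python
import hashlib, time

# Toy sketch: a rock layer's "geological age" is derived from the seed on first access,
# so the billion-year history the world implies never has to actually be simulated.
SEED = b"universe-7"

def apparent_age_years(x: int, depth: int) -> float:
    """Deterministic pseudo-age for a stratum, computed on demand."""
    h = hashlib.sha256(SEED + f"{x},{depth}".encode()).digest()
    return 1e9 * (1 + depth) * (h[0] / 255)  # "billions of years" conjured from one hash byte

t0 = time.time()
print(f"apparent layer age: {apparent_age_years(12, 5):.3e} years")
print(f"actual compute time: {time.time() - t0:.5f} s")
```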
Have you bothered looking for evidence?
What makes you so sure that there's no evidence for it?
For example, a common trope we see in the simulated worlds we create are Easter eggs. Are you sure nothing like that exists in our own universe?
Not even that. It was placeholder textures; only the "newspaper clippings" were forgotten and left in the final game, and that was fixed in an update shortly after launch.
None of it was ever intended for the final product; it was just there as lorem-ipsum-equivalent shit.