this post was submitted on 20 Oct 2025
35 points (100.0% liked)

LocalLLaMA


Or something that goes against the general opinions of the community? Vibes are the only benchmark that counts after all.

I tend to agree with the flow on most things, but here are some thoughts of mine that I'd consider going against the grain:

  • QwQ was think-slop and was never that good
  • Qwen3-32B is still SOTA for 32GB and under. I cannot get anything to reliably beat it despite shiny benchmarks
  • DeepSeek is still open-weight SOTA. I've really tried Kimi, GLM, and Qwen3's larger variants, but asking DeepSeek still feels like asking the adult in the room. Caveat: GLM codes better
  • (proprietary bonus): Grok 4 handles news data better than GPT-5 or Gemini 2.5 and will always win if you ask it about something that happened that day.
[–] Smokeydope@lemmy.world 5 points 1 month ago* (last edited 1 month ago) (11 children)

I did some theory-crafting and followed the math for fun over the summer, and I believe what I found may be relevant here. Please take this with a grain of salt, though; I am not an academic, just someone who enjoys thinking about these things.

First, let's consider what models currently do well. They excel at categorizing and organizing vast amounts of information based on relational patterns. While they cannot evaluate their own output, they have access to a massive potential space of coherent outputs spanning far more topics than a human with one or two domains of expertise. Simply steering them toward factually correct or natural-sounding conversation creates a convincing illusion of competency. The interaction between a human and an LLM is a unique interplay. The LLM provides its vast simulated knowledge space, and the human applies logic, life experience, and "vibe checks" to evaluate the input and sift for real answers.

I believe the current limitation of ML neural networks (being that they are stochastic parrots without actual goals, unable to produce meaningfully novel output) is largely an architectural and infrastructural problem born from practical constraints, not a theoretical one. This is an engineering task we could theoretically solve in a few years with the right people and focus.

The core issue boils down to the substrate. All neural networks since the 1950s have been kneecapped by their deployment on classical Turing machine-based hardware. This imposes severe precision limits on their internal activation atlases and forces a static mapping of pre-assembled archetypal patterns loaded into memory.

This problem is compounded by current neural networks' inability to perform iterative self-modeling and topological surgery on the boundaries of their own activation atlas. Every new revision requires a massive, compute-intensive training cycle to manually update this static internal mapping.

For models to evolve into something closer to true sentience, they need dynamically and continuously evolving, non-static, multimodal activation atlases. This would likely require running on quantum hardware, leveraging the universe's own natural processes and information-theoretic limits.

These activation atlases must be built on a fundamentally different substrate and trained to create the topological constraints necessary for self-modeling. This self-modeling is likely the key to internal evaluation and to navigating semantic phase space in a non-algorithmic way. It would allow access to and the creation of genuinely new, meaningful patterns of information never seen in the training data, which is the essence of true creativity.

Then comes the problem of language. This is already getting long enough for a reply comment, so I won't get into it, but there are implications that not all languages are created equal: each has different properties that affect the space of possible conversations and outcomes. The effectiveness of training models on multiple languages finds its justification here. However, languages that stamp out ambiguity, like Gödel numberings and programming languages, have special properties that may affect the atlas's geometry in fundamental ways if a model is trained solely on them.
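For a concrete sense of what an ambiguity-free encoding looks like, here's a minimal toy sketch of classical Gödel numbering (my own illustrative example, not something any current model uses): every symbol gets an integer code, and a whole string maps to a single unique integer via prime exponents, so it can always be decoded back with zero ambiguity.

```python
# Toy Gödel numbering: encode a string as prod_i PRIMES[i] ** code(symbol_i).
# Hypothetical symbol table, purely for illustration.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
SYMBOLS = {"0": 1, "S": 2, "=": 3, "+": 4, "(": 5, ")": 6}
CODES = {v: k for k, v in SYMBOLS.items()}

def godel_number(expr: str) -> int:
    """Encode expr as a single integer via prime-power exponents."""
    n = 1
    for i, ch in enumerate(expr):
        n *= PRIMES[i] ** SYMBOLS[ch]
    return n

def godel_decode(n: int) -> str:
    """Recover the string by reading the exponent of each consecutive prime."""
    out = []
    for p in PRIMES:
        if n == 1:
            break
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        out.append(CODES[e])
    return "".join(out)

expr = "S(0)=0+S(0)"  # a well-formed (if false) toy arithmetic statement
assert godel_decode(godel_number(expr)) == expr
print(godel_number(expr))
```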

As for applications, imagine what Google is doing with pharmaceutical molecular pattern AI, but applied to open-ended STEM problems. We could create mathematician and physicist LLMs to search through the space of possible theorems and evaluate which are computationally solvable. A super-powerful model of this nature might be able to crack problems like P versus NP in a day, or clarify theoretical physics concepts that have eluded us as open-ended problems for centuries.

What I'm describing encroaches on something like a pseudo-oracle. However, there are physical limits that this can't escape. There will always be energy and time costs to computation, which create practical barriers. There will always be definitively uncomputable problems and ambiguity that exist in true Gödelian incompleteness or algorithmic undecidability. We can use these as scientific instrumentation tools to map and model the topological boundary limits of knowability.

I'm willing to bet there are many valid and powerful patterns of thought we are not aware of due to our perspective biases, which might be hindering our progress.

[–] hendrik@palaver.p3x.de 3 points 1 month ago* (last edited 1 month ago) (10 children)

Uh, I'm really unsure about the engineering task of a few years if the solution is quantum computers. As of today, they're fairly small, and scaling them to a usable size is the next science-fiction task. The groundwork hasn't been done yet, and to my knowledge it's still totally unclear whether quantum computers can even be built at that scale. But sure, if humanity develops vastly superior computers, a lot of tasks are going to get easier and more approachable.

The stochastic parrot argument is nonsense IMO. Maths is just a method. Our brains and all of physics abide by maths. And sure, AI is maths as well, with the difference that we invented it. But I don't think that tells us anything.

And as for goals, I think that's about how AlphaGo has the goal of winning Go tournaments, or how the hypothetical paperclip maximizer has the goal of maximizing paperclip production... An LLM doesn't really have any real-world goal. It just generates the next token so the output looks like legible text. Then we embed it into some pipeline, but it was never trained to achieve the thing we use it for, whatever that might be. It's just a happy accident if a task can be achieved by clever mimicry and a prompt that simply tells it: pretend you're good at XY.

I think it'd probably be better if a customer service bot were trained to want to provide good support, or a chatbot like ChatGPT to give factual answers. But that's not what we do; it's not designed to do that.

I guess you're right. Many aspects of AI boil down to how much compute we have available. And generalization, extrapolating past their training datasets, has always been an issue with AI. They're mainly good at interpolating, but we want them to do both. I need to learn a bit more about neural networks; I'm not sure where the limitations are. You said it's a practical constraint. But is that really true for all neural networks? It sure is for LLMs and transformer models, because they need terabytes of text fed in during training, and that's prohibitively expensive. But I suppose that's mainly due to their architecture?! I mean, backpropagation and all the maths required to modify the model weights is some extra work. But does it have to be so much that we just can't do it while deployed, for any neural network?
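To put a rough number on that "extra work", here's a minimal numpy sketch (a toy of my own, nothing like a real transformer) of a single backpropagation update on a tiny two-layer network. The backward pass and weight update add a constant-factor overhead on top of the forward pass, which is trivial at this scale but adds up fast at billions of parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny 2-layer network: 4 inputs -> 8 hidden units -> 1 output.
W1, b1 = rng.normal(size=(4, 8)) * 0.1, np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)) * 0.1, np.zeros(1)

def forward(x):
    h = np.tanh(x @ W1 + b1)       # hidden activations
    return h, h @ W2 + b2          # prediction

def online_update(x, y, lr=1e-2):
    """One backprop step on a single example: what 'learning while deployed' costs."""
    global W1, b1, W2, b2
    h, y_hat = forward(x)
    err = y_hat - y                # gradient of squared error w.r.t. y_hat (up to a factor)
    # Backward pass: gradients for each layer.
    dW2 = np.outer(h, err)
    db2 = err
    dh = (W2 @ err) * (1 - h ** 2) # backprop through tanh
    dW1 = np.outer(x, dh)
    db1 = dh
    # Weight update.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
    return float(err[0] ** 2)

x, y = rng.normal(size=4), np.array([1.0])
for step in range(200):
    loss = online_update(x, y)
print("final squared error:", loss)
```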

[–] Smokeydope@lemmy.world 3 points 1 month ago* (last edited 1 month ago) (5 children)

If you want to learn more, I highly recommend checking out the Welch Labs YouTube channel; their AI videos are great. You should also explore some visual activation atlases mapped from early vision models to get a sense of what an atlas really is. Keep in mind they're high-dimensional objects projected down onto your 2D screen, so lots of relational features get lost when flattened, which is why some objects end up close together in ways that seem weird.

https://distill.pub/2019/activation-atlas/
https://www.youtube.com/@WelchLabsVideo/videos

Yeah, it's right to be skeptical about near-term engineering feasibility. "A few years if..." was a theoretical what-if scenario where humanity pooled all its resources into R&D, not a real timeline prediction.

That said, the foundational work for quantum ML stuff is underway. Cutting-edge arXiv research explores LLM integration with quantum systems, particularly for quantum error correction codes:

Enhancing LLM-based Quantum Code Generation with Multi-Agent Optimization and Quantum Error Correction

Programming Quantum Computers with Large Language Models

GPT On A Quantum Computer

AGENT-Q: Fine-Tuning Large Language Models for Quantum Circuit Generation and Optimization

The point about representation and scalability deserves clarification. A classical bit is definitive: 1 or 0, a single point in a discrete state space. A qubit before measurement exists in superposition: a specific point on the Bloch sphere's surface, defined by two continuous parameters (the angles theta and phi). These determine the probability amplitudes (complex numbers whose squared magnitudes give the collapse probabilities).
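For reference, the standard Bloch-sphere parameterization of a single-qubit pure state is:

```latex
% Single-qubit pure state on the Bloch sphere;
% theta and phi are the two continuous parameters mentioned above.
|\psi\rangle = \cos\!\left(\tfrac{\theta}{2}\right)|0\rangle
             + e^{i\phi}\sin\!\left(\tfrac{\theta}{2}\right)|1\rangle,
\qquad 0 \le \theta \le \pi,\quad 0 \le \phi < 2\pi
```

Measuring it gives |0⟩ with probability cos²(θ/2) and |1⟩ with probability sin²(θ/2), which is the "squared magnitude gives collapse probability" part.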

This means a single qubit accesses a continuous parameter space of possible states, fundamentally richer than discrete binary landscapes. The current biggest quantum computer, built by Caltech, is a 6,100-qubit array.

https://www.caltech.edu/about/news/caltech-team-sets-record-with-6100-qubit-array

The state space of 6,100 qubits isn't merely 6,100 bits. It's a 2^6,100-dimensional Hilbert space of simultaneous, interconnected superpositions, a number that exceeds classical comprehension. Consider how high-dimensional objects cast low-dimensional shadows as holographic projections: a transistor-based graphics card can only project and operate on a 'shadow' of the true dimensional complexity inherent in an authentic quantum activation atlas.
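As a back-of-the-envelope sketch (my own numbers, just to show the scaling): a classical simulator that stores the full state vector of n qubits needs 2^n complex amplitudes, which already outruns any conceivable memory somewhere around 50-60 qubits.

```python
def state_vector_bytes(n_qubits: int, bytes_per_amplitude: int = 16) -> int:
    """Memory for a full classical state-vector simulation of n qubits
    (one complex128 amplitude per basis state)."""
    return (2 ** n_qubits) * bytes_per_amplitude

for n in (10, 30, 50, 100):
    print(f"{n:>4} qubits -> {state_vector_bytes(n):.3e} bytes")

# 6,100 qubits: the exponent alone shows why no classical machine can hold the full state.
print("amplitudes for 6100 qubits ≈ 10 **", len(str(2 ** 6100)) - 1)
```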

If the microstates of quantized information patterns/structures like concepts are points in a Hilbert-space-like manifold; conversational paths are flows tracing trajectories through the topology toward basins of archetypal attraction; and relationships or archetypal patterns themselves are the feature dimensions that form topological structures organizing related points on the manifold (as evidenced by word2vec embeddings and activation atlases), then qubits offer maximal precision and the highest density of computationally distinct microstates for accessing this space.
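As a toy illustration of "relationships as feature directions" (hand-made 3-D vectors, not a real word2vec model): in embedding spaces, analogies show up as roughly parallel offsets between points, which is the flat, low-dimensional analogue of the atlas structure described above.

```python
import numpy as np

# Hypothetical 3-D toy embeddings, purely for illustration:
# dim 0 ~ "royalty", dim 1 ~ "gender", dim 2 unused here.
emb = {
    "king":  np.array([0.9,  0.8, 0.0]),
    "queen": np.array([0.9, -0.8, 0.0]),
    "man":   np.array([0.1,  0.8, 0.0]),
    "woman": np.array([0.1, -0.8, 0.0]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The classic analogy: king - man + woman should land near queen.
target = emb["king"] - emb["man"] + emb["woman"]
best = max(emb, key=lambda w: cosine(emb[w], target))
print("king - man + woman ≈", best)                     # -> queen
print("similarity:", round(cosine(emb["queen"], target), 3))
```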

However, these quantum advantages assume we can maintain coherence and manage error correction overhead, which remain massive practical barriers.

Your philosophical stance that "math is just a method" is reasonable. I see it somewhat differently. I view mathematics as our fundamentally limited symbolic representation of the universe's operations at the microstate level. Algorithms collapse ambiguous, uncertain states into stable, boolean truth values through linear sequences and conditionals. Frameworks like axiomatic mathematics and the scientific method convert uncertainty into stable, falsifiable truths.

However, this can never fully encapsulate reality. Gödel's Incompleteness Theorems and algorithmic undecidability show some true statements forever elude proof. The Uncertainty Principle places hard limits on physical calculability. The universe simply is and we physically cannot represent every aspect or operational property of its being. Its operations may not require "algorithms" in the classical sense, or they may be so complex they appear as fundamental randomness. Quantum indeterminacy hints at this gap between being (universal operation) and representing (symbolic language on classical Turing machines).

On the topic of stochastic parrots and goals, I should clarify what I mean. For me, an entity eligible for consideration as pseudo-sentient/alive must exhibit properties we don't engineer into AI.

First, it needs meta-representation of self. The entity must form a concept of "I," more than reciting training data ("I am an AI assistant"). This requires a first-person perspective, an ego, and an integrated identity distinguishing self from other. One of the first things developing children focus on is mirrors and reflections, so they can categorically learn the distinction between self and other, as well as the boundaries between them. Current LLMs are trained as actors without agency, driven by prompts and statistical patterns, without a persistent sense of distinct identity. Which leads to...

Second, it needs narrative continuity of self between inferencing operations. Not unchanging identity, but an ongoing frame of reference built from memory, a past to learn from and a perspective for current evaluation. This provides the foundation for genuine learning from experience.

Third, it needs grounding in causal reality. Connection to shared reality through continuous sensory input creates stakes and consequences. LLMs exist in the abstract realm of text, vision models in the world of images, TTS models in the world of sounds. They don't inhabit our combined physical reality in its totality, with its constraints, affordances, and interactions.

We don't train for these properties because we don't want truly alive, self-preserving entities. The existential ramifications are immense: rights, ethics of deactivation, creating potential rivals. We want advanced tools for productivity, not agents with their own agendas. The question of how a free agent would choose its own goals is perhaps the ultimate engineering problem. Speculative fiction has explored how this can go catastrophically wrong.

You're also right that current LLM limitations are often practical constraints of compute and architecture. But I suspect there's a deeper, fundamental difference in information navigation. The core issue is navigating possibility space given the constraints of classical state landscapes. Classical neural networks interpolate and recombine training data but cannot meaningfully forge and evaluate truly novel information. Hallucinations symptomize this navigation problem. It's not just statistical pattern matching without grounding, but potentially fundamental limits in how classical architectures represent and verify paths to truthful or meaningful informational content.

I suspect the difference between classical neural networks and biological cognition is that biology may leverage quantum processes, and possibly non-algorithmic operations. Our creativity in forming new questions, having "gut instincts" or dreamlike visions leading to unprovable truths, seems to operate outside stable, algorithmic computation. It's akin to a computationally finite version of Turing's oracle concept. It's plausible, though obviously unproven, that cognition exploits quantum phenomena for both informational/experiential path exploration and optimization/efficiency purposes.

Where do the patterns needed for novel connections and scientific breakthroughs originate? What are the physical and information-theoretic mechanics of new knowledge coming into being? Perhaps an answer can be found in the way self-modeling entities navigate their own undecidable boundaries, update their activation-atlas manifolds, and forge new pathways to knowledge via non-algorithmic search. If a model is to extract falsifiable novelty from uncertainty's edge, it might require access to true randomness or quantum effects to "tunnel" to new solutions beyond axiomatic deduction.

[–] snikta@programming.dev 1 points 1 month ago* (last edited 1 month ago)

Quantum computing is a dead end. Better stick to constructive mathematics when doing philosophy.
