this post was submitted on 19 Nov 2025
143 points (96.7% liked)

Technology

76917 readers
3295 users here now

This is a most excellent place for technology news and articles.


Our Rules


  1. Follow the lemmy.world rules.
  2. Only tech related news or articles.
  3. Be excellent to each other!
  4. Mod approved content bots can post up to 10 articles per day.
  5. Threads asking for personal tech support may be deleted.
  6. Politics threads may be removed.
  7. No memes allowed as posts, OK to post as comments.
  8. Only approved bots from the list below, this includes using AI responses and summaries. To ask if your bot can be added please contact a mod.
  9. Check for duplicates before posting, duplicates may be removed
  10. Accounts 7 days and younger will have their posts automatically removed.

Approved Bots


founded 2 years ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
[–] just_another_person@lemmy.world 5 points 14 hours ago

From your own linked paper:

To design a neural long-term memory module, we need a model that can encode the abstraction of the past history into its parameters. An example of this can be LLMs that are shown to be memorizing their training data [98, 96, 61]. Therefore, a simple idea is to train a neural network and expect it to memorize its training data. Memorization, however, has almost always been known as an undesirable phenomena in neural networks as it limits the model generalization [7], causes privacy concerns [98], and so results in poor performance at test time. Moreover, the memorization of the training data might not be helpful at test time, in which the data might be out-of-distribution. We argue that, we need an online meta-model that learns how to memorize/forget the data at test time. In this setup, the model is learning a function that is capable of memorization, but it is not overfitting to the training data, resulting in a better generalization at test time.

Literally what I just said. This is specifically addressing the problem I mentioned, and goes on further to exacting specificity on why it does not exist in production tools for the general public (it'll never make money, and it's slow, honestly). In fact, there is a minor argument later on that developing a separate supporting system negates even referring to the outcome as an LLM, and the supported referenced papers linked at the bottom dig even deeper into the exact thing I mentioned on the limitations of said models used in this way.