1

5

Neural Graffiti is an experiment in adding a "Spray Layer" to a transformer model, which injects a memory trace into the final stages of inference without finetuning or retraining (github.com)

submitted 2 days ago by [email protected] to c/[email protected]

0 comments fedilink

2

-4

Breaking GPT-5 News! (lemmy.world)

submitted 1 week ago by [email protected] to c/[email protected]

2 comments fedilink

cross-posted from: https://lemmy.world/post/27657674

3

I want to open source a dataset but I'm not sure what license to use (lemmy.world)

submitted 1 week ago* (last edited 1 week ago) by [email protected] to c/[email protected]

6 comments fedilink

Hello!

Some time ago I made a map generator (pixel art, with the largest maps at 300x200 pixels). I generated 1500 maps for each of 3 map sizes to practice training a model, and I'm thinking of releasing that dataset as open source.

Is that really something that people want/appreciate or not really? I'm a bit lost on how to proceed and what license to use. Does it make sense to use an MIT License? Or which one do you recommend?

Thanks!

4

6

Why do LLMs make stuff up? New research peers under the hood. (arstechnica.com)

submitted 1 week ago by [email protected] to c/[email protected]

0 comments fedilink

5

8

MLOps tips I gathered recently (www.readyforagents.com)

submitted 3 weeks ago by [email protected] to c/[email protected]

0 comments fedilink

Hi all,

I've been experimenting with building and deploying ML and LLM projects for a while now, and honestly, it’s been a journey.

Training the models always felt more straightforward, but deploying them smoothly into production turned out to be a whole new beast.

I had a really good conversation with Dean Pleban (CEO @ DAGsHub), who shared some great practical insights based on his own experience helping teams go from experiments to real-world production.

Here is what he shared with me, along with what I experienced myself:

Data matters way more than I thought. Initially, I focused a lot on model architectures and less on the quality of my data pipelines. Production performance heavily depends on robust data handling—things like proper data versioning, monitoring, and governance can save you a lot of headaches. This becomes way more important when your toy-project becomes a collaborative project with others.

LLMs need their own rules. Working with large language models introduced challenges I wasn't fully prepared for—like hallucinations, biases, and the resource demands. Dean suggested frameworks like RAES (Robustness, Alignment, Efficiency, Safety) to help tackle these issues, and it’s something I’m actively trying out now. He also mentioned "LLM as a judge" which seems to be a concept that is getting a lot of attention recently.

Some practical tips Dean shared with me:

Save chain of thought output (the output text in reasoning models) - you never know when you might need it. This sometimes requires using the verbose parameter.

Log experiments thoroughly (parameters, hyper-parameters, models used, data-versioning...).

Start with a Jupyter notebook, but move to production-grade tooling (all tools mentioned in the guide below 👇🏻)
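To make the logging tip a bit more concrete, here is a minimal sketch using only the Python standard library. The function names (`dataset_fingerprint`, `log_experiment`), the record fields, and the JSON Lines layout are hypothetical choices for illustration; they are not taken from the guide or from any specific tool, and a dedicated experiment tracker would handle this more robustly.

```python
import hashlib
import json
import time


def dataset_fingerprint(path):
    """Return a SHA-256 hex digest of a file's bytes, used as a cheap data version."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large datasets do not need to fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def log_experiment(log_path, model, params, data_path, prompt=None, reasoning=None):
    """Append one experiment record as a JSON line and return the record."""
    record = {
        "timestamp": time.time(),
        "model": model,
        "params": params,  # parameters and hyper-parameters for this run
        "data_sha256": dataset_fingerprint(data_path),
        "prompt": prompt,
        "reasoning": reasoning,  # raw chain-of-thought text, if the model exposes it
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

Each call appends one line, so the log file can later be loaded line by line and filtered by model, parameters, or data hash.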

To help myself (and hopefully others) visualize and internalize these lessons, I created an interactive guide that breaks down how successful ML/LLM projects are structured. If you're curious, you can explore it here:

https://www.readyforagents.com/resources/llm-projects-structure

I'd genuinely appreciate hearing about your experiences too. What are your favorite MLOps tools? I think that dataset versioning, and especially versioning LLM experiments (data, model, prompt, parameters...), is still not fully solved.

6

10

DeepSeek open source DeepEP – library for MoE training and Inference (github.com)

submitted 1 month ago by [email protected] to c/[email protected]

0 comments fedilink

7

4

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning (transformer-circuits.pub)

submitted 1 month ago by [email protected] to c/[email protected]

0 comments fedilink

8

4

Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet (transformer-circuits.pub)

submitted 1 month ago by [email protected] to c/[email protected]

0 comments fedilink

9

6

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (arxiv.org)

submitted 2 months ago by [email protected] to c/[email protected]

0 comments fedilink

10

7

Neurosymbolic AI -- Why, What, and How (arxiv.org)

submitted 3 months ago by [email protected] to c/[email protected]

0 comments fedilink

Neurosymbolic AI is a hybrid approach aiming to bridge the gap between neural networks' ability to learn patterns and symbolic AI's capacity for logical reasoning and explainability.

This approach may offer the best of both worlds, combining robust learning from data with clear, understandable reasoning based on knowledge. It has the potential to outperform systems relying solely on either neural networks or symbolic logic and to provide clear explanations for its decisions.

The approach involves encoding structured symbolic knowledge into a format that can be integrated with neural networks and then mapping information from neural patterns back to structured symbolic representations.

11

2

Classical Sorting Algorithms as a Model of Morphogenesis: self-sorting arrays reveal unexpected competencies in a minimal model of basal intelligence (arxiv.org)

submitted 3 months ago by [email protected] to c/[email protected]

0 comments fedilink

12

4

Genie 2: A large-scale foundation world model (deepmind.google)

submitted 4 months ago by [email protected] to c/[email protected]

1 comments fedilink

13

4

A good primer on what to expect running local LLMs (nullprogram.com)

submitted 4 months ago by [email protected] to c/[email protected]

4 comments fedilink

14

9

A community statement supporting the Open Source Definition (OSD) (osd.fyi)

submitted 5 months ago* (last edited 5 months ago) by [email protected] to c/[email protected]

1 comments fedilink

Declaration

We, the undersigned members of the Open Source community, assert that Open Source is defined solely by the Open Source Definition (OSD) version 1.9.

Any amendments or new definitions shall only be recognized if declared by clear community consensus through a transparent process to be determined.

15

7

How ‘Embeddings’ Encode What Words Mean (www.quantamagazine.org)

submitted 6 months ago by [email protected] to c/[email protected]

0 comments fedilink

16

3

New AI model “learns” how to simulate Super Mario Bros. from video footage (arstechnica.com)

submitted 7 months ago by [email protected] to c/[email protected]

0 comments fedilink

17

12

Reflection 70B holds its own against even the top closed-source models (Claude 3.5 Sonnet, GPT-4o) (huggingface.co)

submitted 7 months ago by [email protected] to c/[email protected]

0 comments fedilink

"Reflection 70B holds its own against even the top closed-source models (Claude 3.5 Sonnet, GPT-4o).

It’s the top LLM in (at least) MMLU, MATH, IFEval, GSM8K.

Beats GPT-4o on every benchmark tested.

It clobbers Llama 3.1 405B. It’s not even close.

The technique that drives Reflection 70B is simple, but very powerful.

Current LLMs have a tendency to hallucinate, and can’t recognize when they do so.

Reflection-Tuning enables LLMs to recognize their mistakes, and then correct them before committing to an answer.

Additionally, we separate planning into a separate step, improving CoT potency and keeping the outputs simple and concise for end users.

Important to note: We have checked for decontamination against all benchmarks mentioned using @lmsysorg’s LLM Decontaminator.

The weights of our 70B model are available today on @huggingface here: https://huggingface.co/mattshumer/Reflection-70B

@hyperbolic_labs API available later today.

Next week, we will release the weights of Reflection-405B, along with a short report going into more detail on our process and findings.

Most importantly, a huge shoutout to @csahil28 and @GlaiveAI.

I’ve been noodling on this idea for months, and finally decided to pull the trigger a few weeks ago. I reached out to Sahil and the data was generated within hours.

If you’re training models, check Glaive out.

This model is quite fun to use and insanely powerful.

Please check it out — with the right prompting, it’s an absolute beast for many use-cases.

Demo here: https://reflection-playground-production.up.railway.app/

405B is coming next week, and we expect it to outperform Sonnet and GPT-4o by a wide margin.

But this is just the start. I have a few more tricks up my sleeve.

I’ll continue to work with @csahil28 to release even better LLMs that make this one look like a toy.

Stay tuned."

https://x.com/mattshumer_/status/1831767014341538166

18

9

It’s Not Intelligent If It Always Halts: A Critical Perspective on Current Approaches to AGI (www.lifeiscomputation.com)

submitted 7 months ago by [email protected] to c/[email protected]

6 comments fedilink

19

6

The Difference Between Speaking and Thinking (www.theatlantic.com)

submitted 7 months ago by [email protected] to c/[email protected]

0 comments fedilink

https://archive.is/SXZMe

20

5

Diffusion Models Are Real-Time Game Engines (gamengen.github.io)

submitted 7 months ago by [email protected] to c/[email protected]

0 comments fedilink

21

4

Liger Kernel is a collection of Triton kernels designed specifically for LLM training. It can effectively increase multi-GPU training throughput by 20% and reduce memory usage by 60%. (github.com)

submitted 7 months ago by [email protected] to c/[email protected]

0 comments fedilink

22

1

Transformer Explainer (poloclub.github.io)

submitted 7 months ago by [email protected] to c/[email protected]

0 comments fedilink

23

4

Alibaba claims no. 1 spot in AI math models with Qwen2-Math (venturebeat.com)

submitted 8 months ago by [email protected] to c/[email protected]

0 comments fedilink

24

5

How to convert a positionally encoded predicted embedding from a decoder to its matching token? (infosec.pub)

submitted 8 months ago by [email protected] to c/[email protected]

2 comments fedilink

When training a transformer on positionally encoded embeddings, should the tgt output embeddings also be positionally encoded? If so, wouldn't the predicted/decoded embeddings also be positionally encoded?
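For context, in the standard Transformer setup positional encodings are added only to the embeddings fed into the encoder and decoder. The decoder's output vectors are not compared against embeddings; they are projected to vocabulary logits, and the training targets are plain token ids. The sketch below illustrates that flow in pure Python. The toy vocabulary, the embedding values, and the function names are made up for this example, and tying the output projection to the embedding table is one common choice rather than a requirement.

```python
import math

# Toy vocabulary and embedding table (made-up values for illustration only).
VOCAB = ["<bos>", "hello", "world", "<eos>"]
EMBED = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]


def positional_encoding(pos, dim):
    """Sinusoidal encoding for one position: sin on even indices, cos on odd ones."""
    return [
        math.sin(pos / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(pos / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]


def decoder_input(token_ids):
    """Embed the shifted target tokens and add positional encodings (input side only)."""
    dim = len(EMBED[0])
    return [
        [e + p for e, p in zip(EMBED[tok], positional_encoding(pos, dim))]
        for pos, tok in enumerate(token_ids)
    ]


def to_token(hidden):
    """Project a decoder output vector to vocabulary logits and pick the argmax."""
    # Tied weights: reuse the embedding rows as the output projection.
    logits = [sum(h * w for h, w in zip(hidden, row)) for row in EMBED]
    best = max(range(len(logits)), key=logits.__getitem__)
    return VOCAB[best]
```

Here `to_token` stands in for the final linear layer plus argmax. Because the loss is computed between those logits and the target token ids, there is no positionally encoded embedding on the output side to undo.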

25

10

New Open-Source AI Image Generator Beats Midjourney, SD3 and Auraflow (decrypt.co)

submitted 8 months ago by [email protected] to c/[email protected]

1 comments fedilink
