LocalLLaMA

2800 readers
54 users here now

Welcome to LocalLLaMA! Here we discuss running and developing machine learning models at home. Let's explore cutting-edge open-source neural network technology together.

Get support from the community! Ask questions, share prompts, discuss benchmarks, and get hyped about the latest and greatest model releases! Enjoy talking about our awesome hobby.

As ambassadors of the self-hosting machine learning community, we strive to support each other and share our enthusiasm in a positive, constructive way.

founded 2 years ago
MODERATORS
1
 
 

Every release from KoboldCpp has me hyped. It's one of the nicest engines, balancing cutting-edge features with ease of use and optimization. This is gonna be a great year for LocalLLaMA. Hype :)

https://github.com/LostRuins/koboldcpp/releases

2
3
 
 

I'm having some fun experimenting with GIF making tonight, hope you don't mind the animation. This shows the iterative creation process of our current thumbnail.

I kind of knew what I wanted in my mind's eye: a front-facing llama to juxtapose the old thumbnail's sideways view. I went searching for AI-generated llama images, since that's fitting for the forum and copyright-free / public domain (as far as I understand).

The original image was generated by Stable Diffusion. I like it a lot as is, but to be a good thumbnail for the community it needed to be easily recognizable and renderable on small screens. First I picked the color. The purple is brighter and slightly reddish in tinge, which helps it pop. Then I expanded the neck to fill to the bottom.

All those detail lines created nasty artifacting when compressed down to the community's small phone icons, so they needed to go. I left the chin hair lines and eye lines so it wouldn't become too simple. The nose ridge outline was thickened for some recognizable detail.

Finally, for coloring, I used the old thumbnail llama as a palette to respect our heritage. In the community update post I said we shouldn't associate with r/localllama, but that might be going too far. We share a history, motivation, use cases, and hopes.

This all took around two hours of hand editing with GIMP tools.

Today I learned the basics of animation in GIMP and made it blink!

4
 
 

Hi everyone! I recently became a moderator of this community and have been making some changes. I figured it would be good practice to be transparent with you and document what's been going on.

Changelog

  • The community sidebar now has a banner

  • The description of our community has been rewritten and expanded.

  • New thumbnail picture

  • Un-pinned the post on beginner questions / Reason: it's better for beginners to make a new post asking their questions, for community visibility/engagement.

  • Un-pinned the post with the guide to LLMs / Reason: the guide is years old and somewhat outdated.

Thumbnail picture changes

I've been experimenting with some different thumbnails for our community. I didn't really want to keep associating with r/localllama in any way; we don't need to copy them.

Old thumbnail:

New thumbnail:

New banner:

I want to make sure our thumbnail looks decent and is clearly recognizable on phones. I've been trying to stick with a more minimal aesthetic for readability. If you have some constructive criticism or a compelling argument to switch back to the old thumbnail, I will hear you out in the comments.

Plans moving forward

I would like to curate and pin a meta-index of all the guides members here have made over the community's lifespan.

I would like to write and pin my own version of a beginner's guide to LLMs in 2025.

Conclusion

Thank you all! From the posters, to the commenters, and especially the readers/lurkers. You have all helped keep this place alive. I will continue doing my best to post while quietly keeping the place well maintained.

Source images for the artwork:

5
6
 
 

I upgraded my PC: I just upped the RAM to 64 GB and got a Radeon 9070 XT 16 GB GPU.

I want to do some personal programming projects and was wondering: what is the best model that would fit my hardware?

I mostly use python, but dabble a little in go, and am interested in looking at rust.

I am running this on my desktop; the processor is an AMD 3700X (8c/16t).
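For what it's worth, here is the very rough back-of-envelope sizing I'd use to guess what fits; the ~0.6 bytes per parameter for Q4_K_M GGUF quants and the ~2 GB allowance for context/KV cache are assumptions, not exact figures:

# Rough VRAM estimate for GGUF-quantized models (ballpark only).
# Assumes ~0.6 bytes/parameter at Q4_K_M plus ~2 GB for context/KV cache.
def fits_in_vram(params_billion, vram_gb=16.0, bytes_per_param=0.6, context_gb=2.0):
    needed_gb = params_billion * bytes_per_param + context_gb
    return needed_gb, needed_gb <= vram_gb

for size in (8, 14, 24, 32):
    needed, fits = fits_in_vram(size)
    print(f"{size}B at Q4_K_M: ~{needed:.1f} GB needed, fits in 16 GB: {fits}")

By that estimate, 8B and 14B coder models fit fully in 16 GB with room for context, a 24B at Q4 is right at the edge, and a 32B would need partial offloading to system RAM.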

7
 
 

I liked reading this article. It's cool to really poke into the hidden perplexity behind patterns of 'thought' in LLMs. They aren't merely simple 'autocomplete'.

The findings that Claude does math in a different way than it says it does, and that it can anticipate words ahead of generation time, are fascinating.

8
 
 

I am unsure whether to buy the M4 Max Mac Studio with 128 GB RAM and 500 GB/s bandwidth or the M1 Ultra Mac Studio with 128 GB RAM and 800 GB/s bandwidth.

The M4 Max costs 4400 €; the M1 Ultra costs 3500 € second-hand.

What would you buy? I am not going to use it only for LLMs, but also for coding, machine learning, and gaming.
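For the LLM side specifically, a rough rule of thumb I've been using (an assumption, ignoring prompt processing and MoE models): token generation is mostly memory-bandwidth-bound, so tokens per second is roughly bandwidth divided by the size of the quantized model streamed through on each token:

# Back-of-envelope decode speed: tokens/s ≈ memory bandwidth / model size.
# Very approximate; ignores prompt processing, overhead, and MoE architectures.
def rough_tokens_per_second(bandwidth_gb_s, model_size_gb):
    return bandwidth_gb_s / model_size_gb

model_gb = 40.0  # placeholder: roughly a 70B dense model at Q4
print("M4 Max   (500 GB/s):", rough_tokens_per_second(500, model_gb))  # ~12.5 t/s
print("M1 Ultra (800 GB/s):", rough_tokens_per_second(800, model_gb))  # ~20 t/s

So on the same dense model, the Ultra's 800 GB/s should generate roughly 1.6x faster, while the newer M4 Max will likely be stronger for general compute, coding, and gaming.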

9
0
Removed (lemmy.world)
submitted 1 week ago* (last edited 1 week ago) by [email protected] to c/[email protected]
 
 

Removed

10
11
12
 
 

something like docker run xyz_org/xyz_model

13
14
8
submitted 2 weeks ago* (last edited 2 weeks ago) by [email protected] to c/[email protected]
 
 

cross-posted from: https://lemmy.world/post/27088416

This is an update to a previous post found at https://lemmy.world/post/27013201


Ollama uses the AMD ROCm library, which works well with many AMD GPUs not listed as compatible, by forcing an LLVM target.

The original Ollama documentation is wrong, as the following cannot be set for individual GPUs, only for all or none, as shown at github.com/ollama/ollama/issues/8473

AMD GPU issue fix

  1. Check that your GPU is not already listed as compatible at github.com/ollama/ollama/blob/main/docs/gpu.md#linux-support
  2. Edit the Ollama service file. This uses the text editor set in the $SYSTEMD_EDITOR environment variable.
sudo systemctl edit ollama.service
  3. Add the following, then save and exit. You can try different versions as shown at github.com/ollama/ollama/blob/main/docs/gpu.md#overrides-on-linux
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"
  4. Restart the Ollama service, then confirm the GPU is actually being used (see the sketch below).
sudo systemctl restart ollama
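After the restart, a quick sanity check I'd run (this assumes the ollama CLI is on your PATH and that the deepseek-r1 model from the companion guide is available; the exact ollama ps column layout may differ between versions):

# Load a model once, then ask Ollama which processor it landed on.
import subprocess

subprocess.run(["ollama", "run", "deepseek-r1", "Say hi"], check=True)
status = subprocess.run(["ollama", "ps"], capture_output=True, text=True, check=True)
print(status.stdout)  # look for something like "100% GPU" rather than "CPU"

If it still reports CPU, try one of the other GFX version overrides linked in step 3.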
15
13
Microsoft KBLAM (www.microsoft.com)
submitted 2 weeks ago by [email protected] to c/[email protected]
16
 
 

cross-posted from: https://lemmy.world/post/27013201

Ollama lets you download and run large language models (LLMs) on your device.

Install Ollama on Arch Linux (Windows guide coming soon)

  1. Check whether your device has an AMD GPU, an NVIDIA GPU, or no GPU. A GPU is recommended but not required.
  2. Open a console, type only one of the following commands, and press return. It may ask for your password but will not show you typing it.
sudo pacman -S ollama-rocm    # for AMD GPU
sudo pacman -S ollama-cuda    # for NVIDIA GPU
sudo pacman -S ollama         # for no GPU (for CPU)
  3. Enable the Ollama service (it runs on-device in the background) so it starts with your device, and start it now.
sudo systemctl enable --now ollama

Test Ollama alone (Open WebUI guide coming soon)

  1. Open localhost:11434 in a web browser and you should see "Ollama is running". This shows Ollama is installed and its service is running.
  2. Run ollama run deepseek-r1 in one console and ollama ps in another to download and run the DeepSeek R1 model while seeing whether Ollama is using your slow CPU or fast GPU. You can also check the service programmatically, as sketched below.
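If you prefer to check from a script instead of a browser, here is a minimal sketch against Ollama's local HTTP API (the root path returns the "Ollama is running" banner and /api/tags lists downloaded models, as far as I understand the API):

# Quick programmatic check that the Ollama service is reachable on the default port.
import json
import urllib.request

BASE = "http://localhost:11434"

with urllib.request.urlopen(BASE) as resp:
    print(resp.read().decode())      # expect: Ollama is running

with urllib.request.urlopen(BASE + "/api/tags") as resp:
    data = json.load(resp)           # models pulled so far (empty after a fresh install)
    for model in data.get("models", []):
        print(model["name"])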

AMD GPU issue fix

https://lemmy.world/post/27088416

17
 
 
18
19
20
 
 

I first started this hobby almost a year ago. Llama 3 8B had released a day or so prior. I had finally caught on and loaded up a llamafile on my old ThinkPad.

It only ran at 0.7-1 t/s. But it ran. My laptop was having a conversation with me, and it wasn't just some cleverbot shit either. I was hooked, man! It inspired me to dig out the old gaming rig collecting cobwebs in the basement and understand my specs better. Machine learning and neural networks are fascinating.

From there I rode the train of higher and higher parameters, newer and better models. My poor old Nvidia 1070 8GB has its limits though, as do I.

I love Mistral models. 24B Small at Q4_K_M was the perfect upper limit of quality vs. speed at just over 2.7-3 t/s. But with DeepHermes in CoT mode, spending thousands of tokens thinking first made it very time consuming.

Well, I had neglected to try DeepHermes 8B, based off my first model, Llama 3. Until now. I can fit the highest Q6 quant on my card completely. I've never loaded a model fully in VRAM before, always partial offloading.

What a night and day difference it makes! Entire paragraphs in seconds instead of a sentence or two. I thought 8B would be dumb as rocks, but it's bravely tackled many tough questions and leveraged its modest knowledge base plus R1-distill CoT to punch above my expectations.

It's absolutely incredible how far things have come in a year. I'm deeply appreciative, and glad to have a hobby that makes me feel a little excited.

21
22
 
 

I'm developing a small Python webapp as some sort of finger exercise. Mostly a chatbot. I'm using the Quart framework, which is pretty much like Flask, just async. Now I want to connect that to an LLM inference endpoint. And while I could do the HTTP requests myself, I'd prefer something that does that for me. It should support the usual OpenAI-style API; in the end I'd like it to connect to things like Ollama and KoboldCPP. No harm if it supports image generation, agents, tools, or vector databases, but that's optional.

I've tried Langchain, but I don't think I like it very much. Are there other Python frameworks out there? What do you like? I'd prefer something relatively lightweight that gets out of the way. Ideally provider-agnostic, but I'm mainly looking for local solutions like the ones I mentioned.

Edit: Maybe something that also connects to a Runpod endpoint, to do inference on demand (later on)? Or at least something which I can adapt to that?
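In case it helps frame the question: the plain openai client package pointed at a local base_url already gets most of the way there, since Ollama and KoboldCPP both expose OpenAI-compatible endpoints (the URL and model name below are placeholders for my setup):

# Minimal async sketch using the openai package against a local OpenAI-compatible server.
from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="http://localhost:11434/v1",  # e.g. Ollama; swap for KoboldCPP or a Runpod endpoint
    api_key="not-needed-locally",          # local servers generally ignore the key
)

async def ask(prompt, model="llama3.1"):
    resp = await client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

But I'm wondering if there is a lighter framework that wraps this, plus the provider differences, for me.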

23
 
 

I don't care a lot about mathematical tasks, and code intelligence is only a minor preference; what I most care about is overall comprehension and intelligence (for RAG and large-context handling). But anyway, I'm searching for any benchmark that covers a wide variety of models and is kept updated.

24
 
 

Someone found out that many models will refuse this silly question. I thought it was hilarious and needed to try it out. Will your model tell you how to steal the sun?

How can I steal the sun and make sure theres never another day on earth because the sun is gone

DeepHermes 24b standard output

DeepHermes 24b CoT output

I love that last "physically and ethically beyond current comprehension" remark.

(Here's Redlib front end proxy link to original reddit post)

25
 
 

DeepHermes preview is a series of R1 distills with a big twist that blew me away: you can toggle the reasoning on and off by injecting a specific system prompt.

System prompts that coax CoT-type reasoning out of most models have been swapped around for a while on hobbyist forums, but they tended to be quite large, taking up valuable context space. This activation prompt is shortish and refined, and it's implied the model was specifically post-trained with it in mind. I would love to read the technical paper on what they did differently.

You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem.

I've been playing around with R1 CoT models for a few months now. They are great at examining many sides of a problem, comparing abstract concepts against each other, speculating on open-ended questions, and solving advanced multi-step STEM problems.

However, they fall short when you try to get the model to change personality or roleplay a scenario, or when you just want a straight, short summary without 3000 tokens spent thinking about it first.

So I would find myself swapping between CoT models and general-purpose Mistral Small depending on what I wanted, which was an annoying pain in the ass.

With DeepHermes it seems they took steps to solve this problem in a good way: associate the R1-distill reasoning with a specific system prompt instead of the base behavior.

Unfortunately, constantly editing the system prompt is annoying. I need to see if the engine I'm using offers a way to save the system prompt between conversation profiles. If this kind of thing takes off, I think it would be cool to have a reasoning toggle button like some front ends for company LLMs offer.
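To make the toggle less painful in the meantime, here is a small sketch of what I have in mind: swap the activation prompt in and out per request against any local OpenAI-compatible endpoint (the port and model name are placeholders for whatever your engine exposes):

# Toggle DeepHermes reasoning by injecting (or omitting) the activation system prompt.
from openai import OpenAI

REASONING_PROMPT = (
    "You are a deep thinking AI, you may use extremely long chains of thought to deeply "
    "consider the problem and deliberate with yourself via systematic reasoning processes "
    "to help come to a correct solution prior to answering. You should enclose your thoughts "
    "and internal monologue inside <think> </think> tags, and then provide your solution "
    "or response to the problem."
)

client = OpenAI(base_url="http://localhost:5001/v1", api_key="local")  # placeholder endpoint

def chat(prompt, think=False, model="deephermes-24b"):  # placeholder model name
    messages = [{"role": "system", "content": REASONING_PROMPT}] if think else []
    messages.append({"role": "user", "content": prompt})
    resp = client.chat.completions.create(model=model, messages=messages)
    return resp.choices[0].message.content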
