this post was submitted on 31 Jan 2025
15 points (89.5% liked)

LocalLLaMA


It should be good at conversations and creative writing, since it'll be for worldbuilding.

Best if uncensored, as I prefer that over censorship kicking in when I least want it.

I'm fine with those roleplaying models as long as they can actually give me ideas and talk to me logically.

[–] [email protected] 2 points 2 months ago* (last edited 2 months ago) (1 children)

Uh, that's not much VRAM. What kind of model sizes fit into a GPU like that? Does a 7B parameter model fit, quantized to 4-bit? With whatever context length you need?
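A rough back-of-the-envelope check of that question (the helper below is hypothetical, just making the arithmetic explicit; real runtimes add overhead on top):

```python
# Rough memory estimate for a quantized model: weights only, plus a guess for KV cache/overhead.
def estimate_vram_gb(params_billion: float, bits_per_weight: float, kv_cache_gb: float = 1.0) -> float:
    weights_gb = params_billion * 1e9 * bits_per_weight / 8 / 1024**3
    return weights_gb + kv_cache_gb

# A 7B model at 4-bit: ~3.3 GB of weights + ~1 GB of cache/overhead -> ~4.3 GB total.
print(f"{estimate_vram_gb(7, 4):.1f} GB")
```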

[–] [email protected] 2 points 2 months ago (1 children)

Yeah, Llama 3.1 7b works, although it spills over into system RAM a bit

It's not as slow as one might expect

[–] [email protected] 1 points 2 months ago* (last edited 2 months ago) (1 children)

Uh, I forgot Llama 3 has 8B parameters. What about something like L3-8B-Lunaris? Though, that's not the latest and greatest anymore, and it's tuned for roleplay. Maybe it's worth a try, but there are probably better ones out there.

I use Mistral-Nemo-Instruct-2407 for pretty much everything. I think it's a great all-rounder and can do anything from answering factual questions to dialogue to storywriting, and it's not censored at all. But it has 14B parameters unless I'm mistaken...

Does your worldbuilding have to be fast? Because if you're fine with it being very slow, you can just run it on the CPU, without any graphics card. I usually do that. It'll take a few minutes to ingest the prompt and come up with an output, but I don't really mind that for use cases like storywriting or creative worldbuilding. (Software would be something like llama.cpp, ollama, LocalAI, koboldcpp, ...; see the sketch after this comment.)

Otherwise I think you'd need to find a fine-tune of a <=8B parameter model that fits. There are enough of them out there. But I found writing prose or story arcs a bit more challenging than other tasks, and I believe worldbuilding might be, too. So I guess it's not as easy as finding a random roleplay or chatbot model.
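To make the CPU-only route above concrete, here is a minimal sketch using the llama-cpp-python bindings; the GGUF filename, context size, and prompt are assumptions for illustration, not details from the thread:

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Hypothetical path to a quantized GGUF of Mistral-Nemo-Instruct; any GGUF file works the same way.
llm = Llama(
    model_path="Mistral-Nemo-Instruct-2407-Q4_K_M.gguf",
    n_ctx=4096,      # context window; larger needs more RAM
    n_gpu_layers=0,  # 0 = run entirely on the CPU
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Sketch a coastal trading city for my fantasy world."}],
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])
```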

[–] [email protected] 2 points 2 months ago

I'll see, maybe I could work with 14B after all... Maybe...