Selfhosted
A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don't control.
Rules:
- Be civil: we're here to support and learn from one another. Insults won't be tolerated. Flame wars are frowned upon.
- No spam posting.
- Posts have to be centered around self-hosting. There are other communities for discussing hardware or home computing. If it's not obvious why your post topic revolves around self-hosting, please include details to make it clear.
- Don't duplicate the full text of your blog or GitHub here. Just post the link for folks to click.
- Submission headline should match the article title (don't cherry-pick information from the title to fit your agenda).
- No trolling.
- No low-effort posts. This is subjective and will largely be determined by community member reports.
Resources:
- selfh.st Newsletter and index of selfhosted software and apps
- awesome-selfhosted software
- awesome-sysadmin resources
- Self-Hosted Podcast from Jupiter Broadcasting
Any issues on the community? Report it using the report flag.
Questions? DM the mods!
When it comes to Nvidia GPUs, VRAM is the main thing to look for.
For consumer cards it is:
Entry level - RTX 5060 Ti 16GB VRAM with a price point around 500-550 euro
Mid - A used RTX 3090 24GB VRAM with a price point around 830 euro when I look at Swedish second-hand markets
High - RTX 5090 32GB VRAM with a price point around 3500 euro
After that you end up looking at the RTX Pro Blackwell cards:
Entry - RTX PRO 5000 Blackwell 48GB VRAM ~5300 euro
Mid - RTX PRO 6000 Blackwell 96GB VRAM ~10100 euro
It all depends on which models you want to run; you can definitely start playing around with Llama 3 8B and similar models on a 5060 Ti 16GB.
If you're looking at 24B-30B models you need the 24GB of VRAM an RTX 3090 offers, and you get a larger context window if you go for the RTX 5090.
If you're looking to run Llama 3 70B, then you need to step up to the RTX Pro level of VRAM.
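If it helps to put numbers on those tiers, here's my rough back-of-the-envelope sketch, not an exact formula: the bytes-per-parameter depends on the quantization you pick, and the overhead factor for KV cache and activations is a guess that grows with context length.

```python
# Back-of-the-envelope VRAM estimate: weights at the chosen quantization,
# plus a fudge factor for KV cache and activations. Both bytes_per_param
# and overhead are assumptions, not exact values.
def estimate_vram_gb(params_billion: float,
                     bytes_per_param: float = 0.5,  # ~Q4 quantization
                     overhead: float = 1.2) -> float:
    return params_billion * bytes_per_param * overhead

for name, size_b in [("Llama 3 8B", 8), ("30B-class model", 30),
                     ("Llama 3 70B", 70)]:
    print(f"{name}: ~{estimate_vram_gb(size_b):.0f} GB at Q4")
# -> roughly 5, 18 and 42 GB, which lines up with the
#    16 GB / 24 GB / 48 GB tiers above
```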
All of this is based on running with Nvidia cards; there are also other setups, such as Mac Studios with huge amounts of unified memory. They're slower but allow for much larger models at the same price point.
You could also run on AMD/Intel GPUs, but much of the software is built primarily for CUDA (and thus Nvidia GPUs), so it's more work and not always compatible.
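As a concrete illustration of what "not always compatible" means in practice, here's a minimal PyTorch sketch for checking which backend a given install can actually use. Note the assumption that you installed the right build: AMD support requires a ROCm build of PyTorch, which then exposes the GPU through the same torch.cuda API.

```python
import torch

# PyTorch's ROCm builds expose AMD GPUs through the same torch.cuda API,
# so this check covers both Nvidia (CUDA) and AMD (ROCm) installs.
if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"GPU backend available: {torch.cuda.get_device_name(0)}")
elif torch.backends.mps.is_available():
    # Apple Silicon unified memory (Metal Performance Shaders)
    device = torch.device("mps")
    print("Apple MPS backend available")
else:
    device = torch.device("cpu")
    print("No GPU backend found, falling back to CPU")
```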
I know you said no "monster rack" but I don't really know what you classify as a monster. :)
An ordinary gaming PC is also a good starter AI PC, so something like this lets you do both:
https://pcpartpicker.com/list/sFp4qd
Thank you for the detailed writeup. Are you aware of anything with a small footprint? I'm thinking DGX Spark size, maybe a bit bigger?
The memory bandwidth of the Spark is poor, and that's a huge detriment.
The problem with a smaller footprint is cooling and how audible the machine becomes.
One idea is to use a fiber-optic HDMI cable and a USB extender to hide the PC away in another room.
If you want a smaller footprint, then the keyword to search for is "unified memory"; it can be reasonably fast for 30B models and a slow-but-usable option for 70B ones (see the sketch below).
edit: an example of a unified-memory Apple Mac Studio can be found here at $5499 with 96GB RAM
https://www.apple.com/shop/buy-mac/mac-studio/m3-ultra-chip-32-core-cpu-80-core-gpu-96gb-memory-2tb-storage
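To see why bandwidth matters so much here: token generation is roughly memory-bandwidth-bound, since every generated token has to read all the weights once, so bandwidth divided by model size gives a ceiling on decode speed. A back-of-the-envelope sketch; the memory sizes and bandwidth figures are approximate public specs, so treat them as assumptions, and real throughput comes in lower.

```python
# Decode speed is roughly memory-bandwidth-bound: each generated token
# reads every weight once, so tokens/sec <= bandwidth / model size.
# Hardware figures below are approximate public specs (assumptions).
HARDWARE = {  # name: (memory in GB, bandwidth in GB/s)
    "RTX 3090":            (24, 936),
    "DGX Spark":           (128, 273),
    "M3 Ultra Mac Studio": (96, 819),
}
MODELS = {"30B @ Q4": 18, "70B @ Q4": 42}  # rough weight sizes in GB

for model, size_gb in MODELS.items():
    for hw, (mem_gb, bw) in HARDWARE.items():
        if size_gb <= mem_gb:
            print(f"{model} on {hw}: ceiling ~{bw / size_gb:.0f} tok/s")
        else:
            print(f"{model} on {hw}: does not fit")
```

This is why the Spark's bandwidth is the detriment mentioned above: a 70B model fits in its 128GB, but the ceiling works out to single-digit tokens per second, while the Mac Studio's much wider memory bus keeps it usable.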
If you're happy with 16GB, nothing beats the speed/cost of an AMD RX 9070 XT.
Wouldn't an AMD RX 9060 XT with 16GB VRAM be nice as well if you're hunting for good speed/cost options?
Probably. It's just not as fast as the 9070 XT. I'm using a 9070 XT myself, and the limitation for running LLMs is memory, not speed. If the model fits in memory, it runs fast enough to be practical.