this post was submitted on 29 Dec 2024

I'm currently getting a lot of timeout errors and delays processing the analysis. What GPU can I add to this? Please advise.

[–] grue@lemmy.world 6 points 1 year ago* (last edited 1 year ago) (1 children)

I'm glad you posted this because I need similar advice. I want a GPU for Jellyfin transcoding and running Ollama (for a local conversation agent for Home Assistant), splitting access to the single GPU between two VMs in Proxmox.

I would also prefer it to be AMD as a first choice or Intel as a second, because I'm still not a fan of Nvidia for their hostile attitude towards Linux and for proprietary CUDA.

(The sad thing is that I probably could have accomplished the transcoding part with just integrated graphics, but my AMD CPU isn't an APU.)

[–] BaroqueInMind@lemmy.one 3 points 1 year ago* (last edited 1 year ago) (2 children)

The problem with AMD graphics cards is that the performance CUDA, xformers, and PyTorch deliver on Nvidia cards blows away anything AMD has, by a wide margin.

I have no idea why AMD GPUs are so trash when it comes to anything involving generative AI/LLMs, DLSS, Jellyfin transcoding, or even ray tracing. I would recommend waiting until their upcoming GPU announcements.

[–] sith@lemmy.zip 3 points 1 year ago* (last edited 1 year ago) (1 children)

Is that still true though? My impression is that AMD works just fine for inference with ROCm and llama.cpp nowadays. And you get much more VRAM per dollar, which means you can stuff a bigger model in there. You might get fewer tokens per second compared with a similar Nvidia card, but I believe that shouldn't really be a problem for a home assistant. Even an Arc A770 should work with IPEX-LLM. Buy two Arc or Radeon cards with 16 GB VRAM each, and you can fit a Llama 3.2 11B or a Pixtral 12B without any quantization. Just make sure that ROCm supports that specific Radeon card if you go for team red.
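For anyone who wants to sanity-check the VRAM claim, here is a rough back-of-the-envelope sketch in Python. It assumes 2 bytes per parameter (FP16/BF16, i.e. no quantization), takes the parameter counts at face value from the model names, and counts only the weights; KV cache, activations, and runtime overhead come on top. The function name `weight_vram_gib` is just made up for illustration.

```python
def weight_vram_gib(params_billion: float, bytes_per_param: float = 2.0) -> float:
    """Approximate memory needed for the model weights alone, in GiB.

    Assumption: every parameter is stored at `bytes_per_param` bytes
    (2.0 for FP16/BF16). KV cache and activations are NOT included.
    """
    return params_billion * 1e9 * bytes_per_param / 2**30


# Nominal parameter counts, read off the model names (an assumption).
models = {"Llama 3.2 11B": 11, "Pixtral 12B": 12}

for name, params in models.items():
    need = weight_vram_gib(params)
    print(f"{name}: roughly {need:.1f} GiB for unquantized weights")
```

Both come out above what a single 16 GB card holds and below the 32 GB that two cards give you combined, so the two-card suggestion is plausible as long as the leftover headroom covers your context length.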

[–] OpossumOnKeyboard@lemmy.world 2 points 1 year ago

I’m also curious. I have also heard good things this past year about AMD and ROCm. Obviously not as good as Nvidia yet (or maybe ever), but given the price I’ve been considering trying it.

[–] ikidd@lemmy.world 1 points 1 year ago (1 children)

There's a CUDA emulator called ZLUDA that fixes a lot of that.

[–] BaroqueInMind@lemmy.one 1 points 1 year ago (1 children)

Development for that stopped almost a year ago. The performance difference was so large that no one used it, and even AMD themselves dropped all funding for that project.

[–] ikidd@lemmy.world 1 points 1 year ago (1 children)

It was started again and is close to where it was before it was dropped.

[–] brucethemoose@lemmy.world 2 points 1 year ago

It’s not functional yet.