Self hosting LLMs on a remote VPS (lemmy.dbzer0.com)

submitted 2 years ago by EmbarrassedDrum@lemmy.dbzer0.com to c/selfhosted@lemmy.world

18 comments

Hi all, I'd like to hear some suggestions on self-hosting LLMs on a remote server and accessing said LLM via a client app or a convenient website. I'd be glad to hear about either your own setups or products you've had a good impression of.

I've hosted Ollama before, but I don't think it's intended for remote use. On the other hand, I'm not really an expert, and maybe there are other things to do, like add-ons.

Thanks in advance!
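On the Ollama question above: Ollama serves an HTTP API (by default on 127.0.0.1:11434, with the bind address controlled by the `OLLAMA_HOST` environment variable), so a remote client is possible in principle. The sketch below is only an illustration. The base URL and the model name `llama3` are assumptions, and as far as I know Ollama does not authenticate requests itself, so the port would normally sit behind a VPN, an SSH tunnel, or an authenticating reverse proxy rather than being exposed directly.

```python
import json
import urllib.request


def build_generate_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for Ollama's /api/generate endpoint.

    base_url is wherever the server is reachable from the client,
    e.g. a tunnelled "http://127.0.0.1:11434" (assumed, not from the thread).
    """
    payload = {
        "model": model,    # assumed to be already pulled on the server
        "prompt": prompt,
        "stream": False,   # ask for one JSON object instead of a stream
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url: str, model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text from the JSON reply."""
    request = build_generate_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["response"]
```

With an SSH tunnel forwarding local port 11434 to the VPS, a call would look like `generate("http://127.0.0.1:11434", "llama3", "Say hello.")`.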

[–] just_another_person@lemmy.world -3 points 2 years ago (1 children)

Yes, but running an LLM isn't an on-demand workload, it's always on. You're paying for a 24/7 GPU instance if going that route over CPU.

[–] hendrik@palaver.p3x.de 7 points 2 years ago* (last edited 2 years ago) (1 children)

Well, there's both. I'm with runpod and they bill me for each second I run that cloud instance. I can have it running 24/7, or 30 minutes on demand, or just 20 seconds if I want to generate a single reply or image. Behind the curtains, it's Docker containers. And one of the services is an API that you can hook into. Upon request, it'll start a container, do the compute, and then, at your option, either shut down immediately (meaning you'd have paid like 2ct for that single request) or keep listening for more requests until an arbitrary timeout is reached. Other services offer similar things, and some ready-made services instead charge a fixed price per ingested or generated token.
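To put rough numbers on the per-second billing described above, here is a small arithmetic sketch. The hourly rate, request count, and request duration are all made-up placeholders, not runpod's actual prices, and the model ignores container start-up time and any idle timeout that would also be billed.

```python
SECONDS_PER_HOUR = 3600

# Hypothetical price in currency units per GPU-hour; not a real quote.
HYPOTHETICAL_HOURLY_RATE = 0.40


def on_demand_cost(busy_seconds: float, hourly_rate: float) -> float:
    """Cost when billed per second, only while the instance is running."""
    return busy_seconds * hourly_rate / SECONDS_PER_HOUR


def always_on_cost(hours: float, hourly_rate: float) -> float:
    """Cost of keeping the same instance running for the whole period."""
    return hours * hourly_rate


# Assumed usage: 50 requests per day, 20 seconds each, over 30 days.
busy_seconds = 50 * 20 * 30
monthly_on_demand = on_demand_cost(busy_seconds, HYPOTHETICAL_HOURLY_RATE)
monthly_always_on = always_on_cost(24 * 30, HYPOTHETICAL_HOURLY_RATE)

print(f"on demand: {monthly_on_demand:.2f}")
print(f"always on: {monthly_always_on:.2f}")
```

Under these assumptions the on-demand bill scales with the seconds actually used, while the always-on bill is the same whether or not anyone sends a prompt.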

[–] just_another_person@lemmy.world 1 points 2 years ago (1 children)

Runpod is a container service. OP asked about remote server.

[–] hendrik@palaver.p3x.de 3 points 2 years ago* (last edited 2 years ago) (1 children)

What's the difference regarding this task? You can rent it 24/7 as a crude webserver. Or run a Linux desktop inside. Pretty much everything you could do with other kinds of servers. I don't think the exact technology matters. It could be a VPS, virtualized with KVM, or a container. And for AI workloads, these containers have several advantages: you can spin them up within seconds, scale them, etc. I mean, you're right. This isn't a bare-metal server that you're renting. But I think it aligns well with OP's requirements?!

[–] just_another_person@lemmy.world -2 points 2 years ago (1 children)

Well I think the difference is what they asked about.

[–] DarkDarkHouse@lemmy.sdf.org 1 points 2 years ago

Running an LLM can certainly be an on-demand service. Apart from training, which I don’t think we are discussing, GPU compute is only used while responding to prompts.