Selfhosted

59923 readers

578 users here now

A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don't control.

Rules:

Be civil: we're here to support and learn from one another. Insults won't be tolerated. Flame wars are frowned upon.
No spam.
Posts here are to be centered around self-hosting. Please ensure it is clear in your post how it relates to self-hosting.
Don't duplicate the full text of your blog or git here. Just post the link for folks to click.
Submission headline should match the article title.
No trolling.

Resources:

selfh.st Newsletter and index of selfhosted software and apps
awesome-selfhosted software
awesome-sysadmin resources
Self-Hosted Podcast from Jupiter Broadcasting

Any issues on the community? Report it using the report flag.

Questions? DM the mods!

founded 3 years ago

MODERATORS

curbstickle@anarchist.nexus

curbstickle_lw@lemmy.world

Replaced $40/month in AI API subscriptions with self-hosted Ollama + n8n (discuss.tchncs.de)

submitted 2 months ago by quickbitesdev@discuss.tchncs.de to c/selfhosted@lemmy.world

31 comments fedilink hide all child comments

Quick post about a change I made that's worked out well.

I was using OpenAI API for automations in n8n — email summaries, content drafts, that kind of thing. Was spending ~$40/month.

Switched everything to Ollama running locally. The migration was pretty straightforward since n8n just hits an HTTP endpoint. Changed the URL from api.openai.com to localhost:11434 and updated the request format.

For most tasks (summarization, classification, drafting) the local models are good enough. Complex reasoning is worse but I don't need that for automation workflows.

Hardware: i7 with 16GB RAM, running Llama 3 8B. Plenty fast for async tasks.

you are viewing a single comment's thread
view the rest of the comments

[–] Valmond@lemmy.dbzer0.com 2 points 2 months ago (3 children)

What kind of hardware do you need to run those models?

[–] Jakeroxs@sh.itjust.works 5 points 2 months ago

Depends on how much quantization, but still fairly beefy, couldn't run it on my homelab with a 3080ti for example.

I generally use smaller 8-12b models and they're alright depending on the task.

[–] TheMightyCat@ani.social 5 points 2 months ago* (last edited 2 months ago)

I'm running 2x4090, the 35B fits very comfortable in that.

For large models like the 397B without a ton of money there are several ways, ive seen posts of people using arrays of used 3090s with good results.

The other option is CPU inference although with current RAM prices that is less cost effective.

I was looking at maybe an array of Milk-V JUPITER2 since vllm added riscv support which could be very cost effective.

[–] suicidaleggroll@lemmy.world 4 points 2 months ago* (last edited 2 months ago)

In general, you take the model size in billions of parameters (eg: 397B), divide it by 2 and add a bit for overhead, and that’s how much RAM/VRAM it takes to run it at a “normal” quantization level. For Qwen3.5-397B, that’s about 220 GB. Ideally that would be all VRAM for speed, but you can offload some or all of that to normal RAM on the CPU, you’ll just take a speed hit.

So for something like Qwen3.5-397B, it takes a pretty serious system, especially if you’re trying to do it all in VRAM.