Selfhosted
A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don't control.
Rules:
-
Be civil: we're here to support and learn from one another. Insults won't be tolerated. Flame wars are frowned upon.
-
No spam posting.
-
Posts have to be centered around self-hosting. There are other communities for discussing hardware or home computing. If it's not obvious why your post topic revolves around selfhosting, please include details to make it clear.
-
Don't duplicate the full text of your blog or github here. Just post the link for folks to click.
-
Submission headline should match the article title (don’t cherry-pick information from the title to fit your agenda).
-
No trolling.
-
No low-effort posts. This is subjective and will largely be determined by the community member reports.
Resources:
- selfh.st Newsletter and index of selfhosted software and apps
- awesome-selfhosted software
- awesome-sysadmin resources
- Self-Hosted Podcast from Jupiter Broadcasting
Any issues on the community? Report it using the report flag.
Questions? DM the mods!
view the rest of the comments
Does anyone have a recommendation for a local model that can run well on a 5070 12GB? It pretty much would only get used for help with homelabbing and simple scripts.
There's an argument to be had regarding a MoE versus a small dense model. I guess it depends on what exactly you need doing with it. I would be tempted to run a smaller dense model (like a Qwen 3-14B or a Qwen 3.5 9B) as at a reasonable quant, it might fit mostly or entirely on the GPU, thereby giving you excellent speeds.
PS: I'm actually in the process of designing an expert system (not a LLM) for pretty much the task you described. The intention is that you would still interact with it like a large language model, but the actual brains underneath it would be something more traditional.
MoEs can be very fast with hybrid inference. I run Xiaomi Mimo 2.5 (a 310B model, 116GB weights) on my single 3090 + 7800 CPU, and it outputs faster than I can read it.
It's also easier to fit long context, if you need that.
It's best to use the ik_llama.cpp fork for that, though. It gives a huge boost to hybrid MoE speeds.