this post was submitted on 14 Jan 2025


I’m doing a lot of coding, and what I would ideally like to have is a long-context model (128k tokens) that I can throw my whole codebase into.

I’ve been experimenting with Claude, for example, and what usually works well is to attach something like the whole architecture of a CRUD app along with the most recent docs of the framework I’m using. It’s okay for menial tasks. But I am very uncomfortable sending any kind of data to these providers.

Unfortunately I don’t have a lot of space, so I can’t build a proper desktop. My options are either renting a VPS or going for something small like a Mac Studio. I know speeds aren’t great, but I was wondering if using e.g. RAG for documentation could help me get decent speeds.

I’ve read that Macs become very slow, especially on larger contexts. I’m not fully convinced, but I could probably get a new one at 50% off as a business expense, so the Apple tax isn’t as much of an issue as the concern about speed.

Any ideas? Are there other mini PCs available that could have a better architecture? I tried researching but couldn’t find a lot.

Edit: I found some stats on GitHub on different models: https://github.com/ggerganov/llama.cpp/issues/10444

Based on that I also conclude that you’re gonna wait forever if you work with a large codebase.
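To put a rough number on "wait forever": a minimal back-of-the-envelope sketch, assuming the wait before the first output token is roughly the prompt length divided by the prompt-processing speed. The speeds in the loop are placeholder values I made up for illustration, not figures from the linked issue; plug in the numbers for your hardware and model.

```python
def prompt_wait_seconds(prompt_tokens: int, prompt_tokens_per_sec: float) -> float:
    """Rough time spent ingesting the prompt before generation starts."""
    if prompt_tokens_per_sec <= 0:
        raise ValueError("prompt_tokens_per_sec must be positive")
    return prompt_tokens / prompt_tokens_per_sec


if __name__ == "__main__":
    context = 128_000  # the 128k-token context mentioned above
    # Hypothetical prompt-processing speeds in tokens per second.
    for speed in (100.0, 500.0, 2000.0):
        wait = prompt_wait_seconds(context, speed)
        print(f"{speed:>6.0f} tok/s -> {wait / 60:.1f} min before the first token")
```

This ignores generation time and any prompt caching, so treat it as an upper-level sanity check rather than a benchmark.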

[–] Boomkop3@reddthat.com 1 points 1 year ago

Anyway, the important thing is "TOPS", i.e. trillions of operations per second. Having enough RAM is important, but if you don't have a fast processor, then that RAM is wasted, since you could just as well stream the model from a fast SSD.

One such case is when your system can't handle more than 50 TOPS, like the Apple M-series systems. Try an old GPU and enjoy thousands of TOPS.
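A rough sketch of why the TOPS figure matters for long prompts, assuming the common rule of thumb of about 2 operations per model parameter per token for a forward pass. The 8B parameter count is a hypothetical example, the estimate ignores the extra attention cost that grows with context length, and real hardware rarely reaches its peak TOPS, so actual times will be longer.

```python
def compute_bound_seconds(params: float, prompt_tokens: int, tops: float) -> float:
    """Lower bound on prompt-processing time if compute is the only limit.

    Assumes ~2 operations per parameter per token (rule of thumb) and
    100% utilization of the stated peak TOPS.
    """
    if tops <= 0:
        raise ValueError("tops must be positive")
    total_ops = 2.0 * params * prompt_tokens
    return total_ops / (tops * 1e12)


if __name__ == "__main__":
    params = 8e9       # hypothetical 8B-parameter model
    prompt = 128_000   # 128k-token prompt
    for tops in (50.0, 1000.0):
        secs = compute_bound_seconds(params, prompt, tops)
        print(f"{tops:>6.0f} TOPS -> at least {secs:.2f} s for the prompt")
```

Token generation, as opposed to prompt processing, tends to be limited by memory bandwidth rather than raw compute, so this only covers the prompt-ingestion side.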