this post was submitted on 04 Oct 2024
34 points (87.0% liked)

Selfhosted

60093 readers
875 users here now

A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don't control.

Rules:

  1. Be civil: we're here to support and learn from one another. Insults won't be tolerated. Flame wars are frowned upon.

  2. No spam.

  3. Posts here are to be centered around self-hosting. Please ensure it is clear in your post how it relates to self-hosting.

  4. Don't duplicate the full text of your blog or git here. Just post the link for folks to click.

  5. Submission headline should match the article title.

  6. No trolling.

  7. Promotion posts require your active participation in selfhosting or related communities, or the post will be removed. No more than 10% of your posts or comments may be self-promotional, or your post will be removed. F/LOSS Exception: If your post is about a project that is completely open source & can be self-hosted in full without payment, and your account is at least 7 days old, your post is exempt from this rule as long as you continue to engage in comments.

Resources:

Any issues on the community? Report it using the report flag.

Questions? DM the mods!

founded 3 years ago
MODERATORS
 

A while ago, I had requested help with using LLMs to manage all my teaching notes. I have since installed Ollama and been playing with it to get a feel for the setup.

I was also suggested the use of RAG (Retrieval Augmented Generation ) and CA (cognitive architecture). However, I am unclear on good self hosted options for these two tasks. Could you please suggest a few?

For example, I tried ragflow.io and installed it on my system, but it seems I need to setup an account with a username and password to use it. It remains unclear if I can use the system offline like the base ollama model, and that information won't be sent from my computer system.

you are viewing a single comment's thread
view the rest of the comments
[–] kwa@lemmy.zip 4 points 2 years ago* (last edited 2 years ago) (1 children)

I’m new to this and I was wondering why you don’t recommend ollama? This is the first one I managed to run and it seemed decent but if there are better alternatives I’m interested

Edit: it seems the two others don’t have an API. What would you recommend if you need an API?

[–] brucethemoose@lemmy.world 3 points 2 years ago* (last edited 2 years ago)

Pretty much everything has an API :P

ollama is OK because its easy and automated, but you can get higher performance, better vram efficiency, and better samplers from either kobold.cpp or tabbyAPI, with the catch being that more manual configuration is required. But this is good, as it "forces" you to pick and test an optimal config for your system.

I'd recommend kobold.cpp for very short context (like 6K or less) or if you need to partially offload the model to CPU because your GPU is relatively low VRAM. Use a good IQ quantization (like IQ4_M, for instance).

Otherwise use TabbyAPI with an exl2 quantization, as it's generally faster (but GPU only) and much better at long context through its great k/v cache quantization.

They all have OpenAI APIs, though kobold.cpp also has its own web ui.