SuspiciousCarrot78

joined 2 weeks ago
[–] SuspiciousCarrot78@aussie.zone 1 points 50 minutes ago

They don't need to. Github has a much stronger SEO. It's literally a global top 100 website.

Github as billboard / pointer / "trust signal" is just smart discoverability....but not a good "home" these days.

[–] SuspiciousCarrot78@aussie.zone 1 points 17 hours ago* (last edited 17 hours ago)

Go for it! The m73 is cheap enough (and powerful enough) to run all that and ddr3 is still not insane (say, 2x8gb 1600mhz sodimm if want / need). $100 or so, all up, if you shop around / your local market pending.

Raspberry pi is more elegant / more constrained / more "fuck you, figure it out" but unless you need the challenge, Lenovo is simpler and all around easier first step :). You can't stick a gpu in it (I think the m920 is the oldest one that has pcie - dunno what they go for. The usual combo is something like a 920 and a Quadro P1000 4GB GPU. Maybe ~$300 all up if we're guessing. At which point, there are better, non shoe box options)

[–] SuspiciousCarrot78@aussie.zone 2 points 23 hours ago* (last edited 23 hours ago) (2 children)

I have a RPI 4b and 3 lenovos (m93p, m710q, p330).

You can't beat the RPI for power draw (~2w idle and ~7w under max load) but I suspect if you wanted to look at $ to utility measure you'd probably prefer the Lenovo M93P. $50 USD. Mine has i7-4785t, 16GB ddr3 (2x8iirc?) with ethernet, USB etc. Bought 2023/4. I expect base model is still that price now (mines upgraded). The only caveat is that it doesn't have HDMI, it has display port out, but that's just a $5 dongle or SSH issue. M73 would be a touch cheaper.

Iirc the TDP is 35w max and can be lowered / undervolted a touch (don't update the BIOS - it blocks throtlestop).

I turned mine into a retro PC slash game server for the kids (luanti etc). But the siren call of doing truly impossible things with the RPI is too beguiling :)

Eg: running diet pi (headless) with all of my services (media stack, privacy, docs, search, images etc) takes about 300 megabytes (or 650mb if I have to boot into xfce).

300mb, 2-3w.

That shouldn't be possible. I love it.

My next goal is to create an expert system / pseudo llm that sources answers based on user provided markdown or PDF, ZIM files and 4get search or Tavily.

The advantage here is that 1) speed will be stupid fast as no neural network crap (outside of optional extra Markov chain garnish) 2) not stochastic (but allow for llm as optional "plug in module" - pi might actually run a 135M at non glacial speeds) 3) still serves openAI compat endpoint.

[–] SuspiciousCarrot78@aussie.zone 1 points 1 day ago* (last edited 23 hours ago) (1 children)

Is that the right site or am I not seeing it? Your link points to this -


https://idlewatt.foundagent.net/ Lookup Categories Compare Vendors AI Data Watch Methodology Will this vendor sign a HIPAA BAA? A cited, date-stamped answer for 105 major SaaS tools — can you sign a Business Associate Agreement and store PHI? Built for digital-health teams during vendor procurement.


Respectfully, that's not really how local LLMs work.

A GGUF model sitting on my hard drive has no ability to "send content back home" any more than a PDF or a JPEG does. If you're running something like llama.cpp or Ollama entirely locally, the model weights are just data files.

The real privacy concerns are cloud APIs, telemetry in front-ends, browser extensions, analytics, update services, or accidentally exposing a service to the public internet.

"Self-hosted AI" isn't one thing. There's a huge difference between:

  • Running ChatGPT through an API
  • Running a commercial AI appliance
  • Running a local Qwen/Mistral/Llama model on your own hardware

Firewalling internet-facing services is good advice. Assuming every local model is secretly uploading prompts is not.

[–] SuspiciousCarrot78@aussie.zone 2 points 3 days ago* (last edited 3 days ago) (1 children)

Hmm....it runs on a 1060...it's a MoE not a dense. 24B is even lighter. Worth a shot.

https://www.youtube.com/watch?v=8F_5pdcD3HY

Else, if youre looking for a coding model (??) something like Sara or fara might suit

https://huggingface.co/microsoft/Fara-7B

[–] SuspiciousCarrot78@aussie.zone 2 points 3 days ago* (last edited 3 days ago)

Nuance about AI? On Lemmy? On FuckAI? What? GTFO.

PS: I work with AI in health care; it's really useful to me. In my personal life, I'm also a fan of self hosting local models. I'd like to think there's room for nuanced discussion but there usually isn't, because wagons have been circled and frankly, people have pegged their identities to x or y. There are better things to do than argue with the righteous on the interwebs.

Anyway, I liked your post, fellow meat popsicle. I recommend using a Thinkpad for hitting with - more satisfying thump.

[–] SuspiciousCarrot78@aussie.zone 2 points 3 days ago* (last edited 3 days ago) (3 children)

I mean...that entirely depends on your use case - and I hate saying that. For me and what I do, Qwen SLM (esp Qwen3-4B 2507 instruct and Qwen3.5-2B) are exceptional. But I'm not trying to do Claude at home.

Best bet? Spend $10 on OpenRouter and try different models. In a head to head with ChatGPT 5.4 mini (excellent for coding BTW), I've found Qwen 3.5 27B more than able to hold its own for coding tasks...IF you narrowly gate it/confine it. The last batch of Qwen's really are something. Dunno about the 3.7 series.

Having said ALL that, I'm really tempted to go back in time and code myself a deterministic expert system, with user updatable knowledge cascade, tool calling and a minimal amount of Markov chain word garnish for flavour. I think we use to just call that "a program" lol.

Really tempted actually, because if 50% of llm use case is basically Super Google but not shit...well, I can make that myself. I just need to point my autism at it.

PS: this might help

https://www.youtube.com/watch?v=0AqpaFm11oI

[–] SuspiciousCarrot78@aussie.zone 2 points 4 days ago* (last edited 4 days ago)

Numbers about 3-4x. The P100 is near 800 GB/s. The 1080 is what... 192GB/s? Hell, even if it were double that, HBM2 simply has larger bandwidth. The 1080 was a gaming card; the P100 is a server / number cruncher.

[–] SuspiciousCarrot78@aussie.zone 3 points 4 days ago* (last edited 4 days ago)

Just for sake of completion

https://piwigo.org/

Pros

Mature project (around since the early 2000s)

Lightweight compared to Immich

Designed as a photo library first, not an AI platform

Albums, tags, metadata, permissions

Huge plugin ecosystem

Runs happily on modest hardware

Can manage very large collections

Doesn't demand phone-app-centric workflows (though of course it has a phone to computer app / sync)

Cons

Feels more like a traditional photo archive than Google Photos

Mobile experience is functional rather than slick

No fancy AI search or face recognition by default (though can add easy enough)

UI is a bit "classic web"

[–] SuspiciousCarrot78@aussie.zone 2 points 5 days ago* (last edited 5 days ago) (3 children)

Huh - cheaper than the P40s (though less VRAM) but larger bandwidth due to HBM2. Good looking out

 

Recent post re: AI as utility

https://www.tomsguide.com/ai/people-will-buy-intelligence-from-us-on-a-meter-chatgpts-ceo-sam-altman-has-critics-worried-with-his-ai-vision

Myself, I'm a fan of local LLM / self hosted ML.... but if you ever needed a clarion call that a hard pivot is coming (soon) for online/ cloud based AI...Altman et al are making some concerning mouth noises (to say nothing of broader concerns with OAI, Anthropic etc).

Right now, I'm sketching out a plan where my Raspberry Pi (always on, 2-3w) uses a magic packet to wake up my modest AI server (Lenovo P330 with Tesla P4) if/when needed (Qwen 3.6-35B-A3B); no point in chugging down 80-100w, 24/7 for no good reason.

If the trend continues the direction it appears to be (increasing costs, environmental impacts etc) then I'd feel a lot better hosting my own as port of first call and replacing simpler tasks with more traditional programs. YMMV.

14
submitted 2 weeks ago* (last edited 2 weeks ago) by SuspiciousCarrot78@aussie.zone to c/localllama@sh.itjust.works
 

More often than not, AI and LLM gets conflated in the public consciousness...and then gets mixed with "Agentic", "SaaS" and other well...slop. So, here is a farmer in Japan, using a raspberry pi, to sort cucumbers.

https://www.newsweek.com/artificial-intelligence-cucumber-farm-raspberry-pi-495289

PS: 2016 article. I expect by now the tractor is self driving and named Betty.

If you have any other "dude does cool AI shit with a box of scraps in a cave", I'm all EARS.md

 

I was browsing Reddit (yetch) while waiting for some stuff to finish when I came across this post

https://old.reddit.com/r/LocalLLM/comments/1tek00h/why_is_llm_is_so_expensive/

The author make a (very) interesting claim: if table stakes are $6K (they're not...but go with it for now), then most folks are cooked from the get go.

Personally, I have been figuring out how to get more from less. For example, people have found ways to run Qwen3.6 35B on a 6GB VRAM GTX 1060 at ~20tok/s (--ctx 64K IIRC, but go check the vids yourself)

https://youtu.be/8F_5pdcD3HY

I think there's a lot of juice to squeeze by turning LLMs from "all seeing sages" into basically mouth pieces for shit that actually runs fast on regular silicon - but that's just me and my crazy brain. YMMV.

5
Token Speed visualiser (mikeveerman.github.io)
submitted 2 weeks ago* (last edited 2 weeks ago) by SuspiciousCarrot78@aussie.zone to c/localllama@sh.itjust.works
 

https://mikeveerman.github.io/tokenspeed/?rate=20&mode=agent&think=15

Exactly what it says on the tin :)

Pretty good simulator this. May it cause you to reconsider your expensive GPU upgrade :)

view more: next ›