this post was submitted on 20 Jan 2026
451 points (98.5% liked)
Fuck AI
"We did it, Patrick! We made a technological breakthrough!"
A place for all those who loathe AI to discuss things, post articles, and ridicule the AI hype. Proud supporter of working people. And proud booer of SXSW 2024.
AI, in this case, refers to LLMs, GPT technology, and anything listed as "AI" meant to increase market valuations.
Yeah, accessibility is the big problem.
What I use depends on the task.
For “chat” and creativity, I use my own quantization of GLM 4.6 (a 350B model), squeezed to just barely fit in 128GB RAM/24GB VRAM, with a fork of llama.cpp called ik_llama.cpp:
https://huggingface.co/Downtown-Case/GLM-4.6-128GB-RAM-IK-GGUF
It’s complicated, but in a nutshell: the quality degradation vs. the full model is reasonable even though it’s quantized to roughly 3 bits instead of 16, and it runs at 6-7 tokens/sec even with most of the weights sitting in system RAM.
For the UI, it varies, but I tend to use mikupad so I can manipulate the chat syntax. LMStudio works pretty well though.
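If you'd rather skip a GUI entirely, llama.cpp's llama-server (and, as far as I know, ik_llama.cpp's build of it) exposes an OpenAI-style endpoint you can script against. Rough sketch below; the port, model name, and prompt are placeholders rather than my exact setup:

```python
# Minimal sketch: query a locally running llama-server through its
# OpenAI-compatible chat endpoint. Port, model name, and prompt are placeholders.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "glm-4.6",  # many builds ignore this and serve whatever was loaded
        "messages": [{"role": "user", "content": "Give me a one-paragraph story hook."}],
        "max_tokens": 512,
        "temperature": 0.6,
    },
    timeout=600,
)
print(resp.json()["choices"][0]["message"]["content"])
```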
Now, for STEM stuff or papers? I tend to use Nemotron 49B quantized with exllamav3, or sometimes Seed-OSS 36B, as both are good at that kind of work and handle long context well.
For coding and automation? It… depends. Sometimes I use Qwen VL 32B or 30B, in various runtimes, but it seems that GLM 4.7 Flash and GLM 4.6V will be better once I set them up.
Minimax is pretty good at making quick scripts, while being faster than GLM on my desktop.
For a front end, I’ve been switching around.
I also use custom sampling. I basically always use n-gram sampling in ik_llama.cpp where I can, with DRY at a modest temperature (around 0.6), or low or even zero temperature for more “objective” tasks. This is massively important, as default sampling settings are where so many LLM errors come from.
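To make that concrete, here's roughly what those sampler settings look like as a payload for llama-server's native /completion endpoint. Treat it as a sketch: the parameter names are mainline llama.cpp's, the values are just illustrative, and ik_llama.cpp layers its own samplers on top.

```python
# Sketch of a sampling config for llama-server's native /completion endpoint.
# DRY penalizes verbatim repetition; low temperature keeps "objective" tasks tame.
# Parameter names follow mainline llama.cpp; values here are only illustrative.
import requests

payload = {
    "prompt": "List three failure modes of greedy decoding.",
    "n_predict": 256,
    "temperature": 0.6,     # drop toward 0.0 for more deterministic/"objective" output
    "min_p": 0.05,          # cut the long tail of unlikely tokens
    "dry_multiplier": 0.8,  # enable DRY repetition penalty
    "dry_base": 1.75,
    "dry_allowed_length": 2,
}
resp = requests.post("http://localhost:8080/completion", json=payload, timeout=600)
print(resp.json()["content"])
```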
And TBH, I also use GLM 4.7 over API a lot, in situations where privacy does not matter. It’s so cheap it’s basically free.
So… Yeah. That’s the problem. If you just load up LMStudio with its default Llama 8B Q4KM, it’s really dumb and awful and slow. You almost have to be an enthusiast following the space to get usable results.
Thank you, very insightful.
Really, the big distinguishing feature is VRAM. We consumers just don’t have enough. If I could have a 192GB VRAM system, I could probably run a local model comparable to what OpenAI and others offer, but here I am with a lowly 12GB.
You mean an Nvidia 3060? You can run GLM 4.6, a 350B model, on 12GB VRAM if you have 128GB of CPU RAM. It's not ideal though.
More practically, you can run GLM Air or Flash quite comfortably. And that'll be considerably better than "cheap" or old models like Nano, on top of being private, uncensored, and hackable/customizable.
The big distinguishing feature is "it's not for the faint of heart," heh. It takes time and tinkering to set up, as all the "easy" preconfigurations are suboptimal.
That aside, even if you have a toaster, you can invest a little in API credits and run open-weight models with relative privacy on a self-hosted front end. Pick the jurisdiction of your choosing.
For example: https://openrouter.ai/z-ai/glm-4.6v
It's like a dollar or two per million words. You can even give a middle finger to Nvidia by using Cerebras or Groq, which don't use GPUs at all.
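If you go that route, it's just the standard OpenAI client pointed at a different base URL. Sketch below; the model slug is the one from the link above, and the key is obviously a placeholder:

```python
# Sketch: calling an open-weights model through OpenRouter with the standard OpenAI client.
# The model slug is the one from the link above; swap in any provider/model you prefer.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",  # your OpenRouter key
)

resp = client.chat.completions.create(
    model="z-ai/glm-4.6v",
    messages=[{"role": "user", "content": "Write a bash one-liner to find large files."}],
    temperature=0.3,
)
print(resp.choices[0].message.content)
```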