overview for TheMightyCat

Building a Linux Phone in c/linux@programming.dev

[–] TheMightyCat@ani.social 10 points 5 days ago (1 children)

I think everyone has at some point thought of taking an SBC and a touchscreen and making a Linux phone. Cool to actually see one!

developer of game 'Rust' talks about anticheat on linux in c/linux_gaming@lemmy.world

[–] TheMightyCat@ani.social 240 points 1 week ago (39 children)

It's almost like client-side anti-cheat doesn't work, and if proper server-side anti-cheat were made, it wouldn't matter what platform the client is on.

Feminism.... Not even once folks in c/lemmyshitpost@lemmy.world

[–] TheMightyCat@ani.social 12 points 2 weeks ago

Birth rates are overrated anyway

My 8gb vram system as i try to load GLM-4.6-Q0.00001_XXXS.gguf: in c/localllama@sh.itjust.works

[–] TheMightyCat@ani.social 1 points 2 weeks ago* (last edited 2 weeks ago) (1 children)

I found the reason: somehow setting --max_num_seqs 1 makes vLLM way more efficient.

I'm not sure exactly what it does, but I think it's because vLLM batches requests and the API I was using with exllamav3 doesn't.

Now I'm doing 100k with vLLM too:

(Worker_TP0_EP0 pid=99695) INFO 11-03 17:34:00 [gpu_worker.py:298] Available KV cache memory: 4.73 GiB
(Worker_TP1_EP1 pid=99696) INFO 11-03 17:34:00 [gpu_worker.py:298] Available KV cache memory: 4.73 GiB
(EngineCore_DP0 pid=99577) INFO 11-03 17:34:00 [kv_cache_utils.py:1087] GPU KV cache size: 103,264 tokens
(EngineCore_DP0 pid=99577) INFO 11-03 17:34:00 [kv_cache_utils.py:1091] Maximum concurrency for 100,000 tokens per request: 1.03x
(EngineCore_DP0 pid=99577) INFO 11-03 17:34:00 [kv_cache_utils.py:1087] GPU KV cache size: 103,328 tokens
(EngineCore_DP0 pid=99577) INFO 11-03 17:34:00 [kv_cache_utils.py:1091] Maximum concurrency for 100,000 tokens per request: 1.03x
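The "Maximum concurrency" figure in the log is just the KV cache size divided by the per-request token limit. A minimal sketch of that arithmetic, using only numbers copied from the log lines above (the variable names are mine, and the per-token footprint assumes the 4.73 GiB reported per worker backs the 103,264-token cache):

```python
# Values copied from the vLLM log lines above.
kv_cache_tokens = 103_264        # "GPU KV cache size: 103,264 tokens"
max_model_len = 100_000          # "... for 100,000 tokens per request"
kv_cache_gib_per_worker = 4.73   # "Available KV cache memory: 4.73 GiB"

# How many full-length requests fit in the KV cache at the same time.
max_concurrency = kv_cache_tokens / max_model_len

# Rough KV cache footprint per token on one worker (assumption: the
# reported per-worker memory is what backs the reported token count).
kib_per_token = kv_cache_gib_per_worker * 1024 * 1024 / kv_cache_tokens

print(f"max concurrency: {max_concurrency:.2f}x")
print(f"KV cache per token per worker: {kib_per_token:.1f} KiB")
```

With a concurrency of roughly 1, the cache holds about one full 100k-token request, which fits with running a single sequence at a time.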

I would say exllamav3 is still slightly more efficient, but this explains the huge discrepancy. exllamav3 also allows setting GB per GPU, which gets me a few more GB than vLLM, which spreads it evenly, because a bunch of memory on GPU 0 is used for other stuff.

As for the T/s, it's about the same, in the 80-100 range. This is what I'm getting with vLLM:

(APIServer pid=99454) INFO 11-03 17:36:31 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO 11-03 17:36:32 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO:     127.0.0.1:32968 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=99454) INFO:     127.0.0.1:32968 - "POST /tokenize HTTP/1.1" 200 OK
(APIServer pid=99454) INFO 11-03 17:36:32 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO 11-03 17:36:34 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO:     127.0.0.1:32968 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=99454) INFO:     127.0.0.1:32968 - "POST /tokenize HTTP/1.1" 200 OK
(APIServer pid=99454) INFO 11-03 17:36:34 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO 11-03 17:36:34 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO:     127.0.0.1:32968 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=99454) INFO:     127.0.0.1:32968 - "POST /tokenize HTTP/1.1" 200 OK
(APIServer pid=99454) INFO 11-03 17:36:34 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO 11-03 17:36:34 [loggers.py:127] Engine 000: Avg prompt throughput: 461.4 tokens/s, Avg generation throughput: 17.6 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 1.6%, Prefix cache hit rate: 66.9%
(APIServer pid=99454) INFO 11-03 17:36:35 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO:     127.0.0.1:32968 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=99454) INFO:     127.0.0.1:32968 - "POST /tokenize HTTP/1.1" 200 OK
(APIServer pid=99454) INFO 11-03 17:36:35 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO 11-03 17:36:35 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO:     127.0.0.1:32968 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=99454) INFO:     127.0.0.1:32968 - "POST /tokenize HTTP/1.1" 200 OK
(APIServer pid=99454) INFO 11-03 17:36:35 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO 11-03 17:36:36 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO:     127.0.0.1:32968 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=99454) INFO:     127.0.0.1:32968 - "POST /tokenize HTTP/1.1" 200 OK
(APIServer pid=99454) INFO 11-03 17:36:36 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO 11-03 17:36:36 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO:     127.0.0.1:32968 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=99454) INFO:     127.0.0.1:32968 - "POST /tokenize HTTP/1.1" 200 OK
(APIServer pid=99454) INFO 11-03 17:36:36 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO 11-03 17:36:43 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO:     127.0.0.1:32968 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=99454) INFO:     127.0.0.1:32968 - "POST /tokenize HTTP/1.1" 200 OK
(APIServer pid=99454) INFO 11-03 17:36:44 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO 11-03 17:36:44 [loggers.py:127] Engine 000: Avg prompt throughput: 1684.4 tokens/s, Avg generation throughput: 96.7 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 4.4%, Prefix cache hit rate: 83.4%

Now that I have found this out, I've switched back to vLLM, because the API I'm using with exllamav3 doesn't support Qwen 3 tools yet :(

Linux users have no reason to worry about recent AMD GPU driver changes [driver for GPU series RX 5000 and RX 6000 are going into "maintenance mode" is just for Windows] in c/pcgaming@lemmy.ca

[–] TheMightyCat@ani.social 9 points 2 weeks ago (2 children)

I really don't want to shill for NVIDIA, but isn't the kernel driver open source now?

My 8gb vram system as i try to load GLM-4.6-Q0.00001_XXXS.gguf: in c/localllama@sh.itjust.works

[–] TheMightyCat@ani.social 4 points 3 weeks ago (3 children)

I would suggest trying exllamav3 once. I have no idea what kind of black magic they use, but it's very memory efficient.

I can't load Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8 with 16K context using vLLM,

but using exllamav3 I can SOMEHOW load ArtusDev/Qwen_Qwen3-Coder-30B-A3B-Instruct-EXL3:8.0bpw_H8 at its full context of 262,144 tokens with 2 GiB still to spare.

I really feel like this is too good to be true and I'm doing something wrong, but it just works, so I don't know.

The only way one should code C btw. in c/programmer_humor@programming.dev

[–] TheMightyCat@ani.social 12 points 1 month ago* (last edited 1 month ago) (2 children)

Forget that, the real evil is that the first character of Int (or Ganz in this case) is capitalized.

As Big Tech Oligarchs Wage War on Workers, Sanders Warns AI Could Kill Nearly 100 Million US Jobs in c/news@lemmy.world

[–] TheMightyCat@ani.social 1 points 1 month ago

Why does it matter what is replaced?

I would say that if it frees up a human from having to do it, that's good news.

As Big Tech Oligarchs Wage War on Workers, Sanders Warns AI Could Kill Nearly 100 Million US Jobs in c/news@lemmy.world

[–] TheMightyCat@ani.social 1 points 1 month ago (2 children)

"must benefit everyone, not just a handful of billionaires"

It's hard to disagree with that statement but all this upheaval about "killing jobs" seems really pointless to me.

A manual farm has far more employees than one that uses modern machinery. Is the economy in decline if the farmer replaces human labor with machines?

EU’s New Cybersecurity Law Could Change Your Everyday Online Life – Here’s How in c/buyeuropean@feddit.uk

[–] TheMightyCat@ani.social 1 points 1 month ago* (last edited 1 month ago)

The hoodied master hacker hacking the mainframe using HTML...

I really doubt the authenticity of the reporting if they use such "tech looking" images.

soulless and one ruleful in c/196@lemmy.blahaj.zone

[–] TheMightyCat@ani.social 9 points 1 month ago* (last edited 1 month ago) (2 children)

Yeah I tried taking a screenshot first but as you can see the screenshot doesn't include the panel.

This is a PinePhone Pro. The experience is surprisingly usable except for one major thing: the battery life is terrible.

I don't know whether that is specific to the PPP or postmarketOS, or whether my phone is defective, but after about an hour of use the battery is already drained, and even in standby it won't last more than 6 hours or so.

There are some minor things here and there, but that is the main reason I can't use it as a daily driver.

soulless and one ruleful in c/196@lemmy.blahaj.zone

[–] TheMightyCat@ani.social 68 points 1 month ago* (last edited 1 month ago) (6 children)

Fun thing about Linux phones is that they can all look different without vendor lock-in

(Yes, I apologize for the scuffed picture, but pressing the screenshot button closes the menu)

7

Very large amounts of gaming gpus vs AI gpus (ani.social)

submitted 4 months ago by TheMightyCat@ani.social to c/localllama@sh.itjust.works

8 comments fedilink

cross-posted from: https://ani.social/post/16779655

GPU	VRAM	Price (€)	Bandwidth (TB/s)	FP16 TFLOPS	€/GB	€/(TB/s)	€/TFLOPS
NVIDIA H200 NVL	141GB	36284	4.89	1671	257	7423	21
NVIDIA RTX PRO 6000 Blackwell	96GB	8450	1.79	126.0	88	4720	67
NVIDIA RTX 5090	32GB	2299	1.79	104.8	71	1284	22
AMD RADEON 9070XT	16GB	665	0.6446	97.32	41	1031	7
AMD RADEON 9070	16GB	619	0.6446	72.25	38	960	8.5
AMD RADEON 9060XT	16GB	382	0.3223	51.28	23	1186	7.45

This post is part "hear me out" and part asking for advice.

Looking at the table above, AI GPUs are a pure scam, and it would make much more sense (at least looking at this) to use gaming GPUs instead, either through a Frankenstein of PCIe switches or a high-bandwidth network.

So my question is whether somebody has built a similar setup and what their experience has been, what the expected performance overhead is, and whether it can be made up for by having way more raw performance for the same price.

37

Very large amounts of gaming gpus vs AI gpus (ani.social)

submitted 4 months ago by TheMightyCat@ani.social to c/selfhosted@lemmy.world

43 comments fedilink

GPU	VRAM	Price (€)	Bandwidth (TB/s)	FP16 TFLOPS	€/GB	€/(TB/s)	€/TFLOPS
NVIDIA H200 NVL	141GB	36284	4.89	1671	257	7423	21
NVIDIA RTX PRO 6000 Blackwell	96GB	8450	1.79	126.0	88	4720	67
NVIDIA RTX 5090	32GB	2299	1.79	104.8	71	1284	22
AMD RADEON 9070XT	16GB	665	0.6446	97.32	41	1031	7
AMD RADEON 9070	16GB	619	0.6446	72.25	38	960	8.5
AMD RADEON 9060XT	16GB	382	0.3223	51.28	23	1186	7.45
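The three right-hand columns are derived from the raw ones by dividing the price by each spec. A small sketch that recomputes them from the table's own numbers (the variable names and dictionary layout are mine; the last digit can differ slightly from the table depending on how values were rounded):

```python
# (name, vram_gb, price_eur, bandwidth_tb_s, fp16_tflops), copied from the table.
gpus = [
    ("NVIDIA H200 NVL", 141, 36284, 4.89, 1671),
    ("NVIDIA RTX PRO 6000 Blackwell", 96, 8450, 1.79, 126.0),
    ("NVIDIA RTX 5090", 32, 2299, 1.79, 104.8),
    ("AMD RADEON 9070XT", 16, 665, 0.6446, 97.32),
    ("AMD RADEON 9070", 16, 619, 0.6446, 72.25),
    ("AMD RADEON 9060XT", 16, 382, 0.3223, 51.28),
]

# Price divided by each spec gives the cost per unit of that spec.
ratios = {
    name: {
        "eur_per_gb": price / vram,
        "eur_per_tb_s": price / bandwidth,
        "eur_per_tflop": price / tflops,
    }
    for name, vram, price, bandwidth, tflops in gpus
}

for name, r in ratios.items():
    print(f"{name:<30} {r['eur_per_gb']:7.1f} €/GB "
          f"{r['eur_per_tb_s']:8.0f} €/(TB/s) {r['eur_per_tflop']:6.2f} €/TFLOPS")
```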

This post is part "hear me out" and part asking for advice.

Looking at the table above, AI GPUs are a pure scam, and it would make much more sense (at least looking at this) to use gaming GPUs instead, either through a Frankenstein of PCIe switches or a high-bandwidth network.

So my question is whether somebody has built a similar setup and what their experience has been, what the expected performance overhead is, and whether it can be made up for by having way more raw performance for the same price.
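To put "the same price" into numbers, here is a rough sketch of how many RTX 5090s fit into the price of one H200 NVL and what their specs add up to, using only the table's figures. The aggregate values are naive sums and therefore upper bounds: they ignore the interconnect overhead the question is about, as well as the cost of hosts, PCIe switches and power.

```python
# Figures copied from the table above.
h200 = {"price": 36284, "vram_gb": 141, "bw_tb_s": 4.89, "tflops": 1671}
rtx5090 = {"price": 2299, "vram_gb": 32, "bw_tb_s": 1.79, "tflops": 104.8}

# Whole cards that fit into the price of one H200 NVL.
count = h200["price"] // rtx5090["price"]

# Naive aggregates: simple sums, assuming perfect scaling (an upper bound).
total_vram_gb = count * rtx5090["vram_gb"]
total_bw_tb_s = count * rtx5090["bw_tb_s"]
total_tflops = count * rtx5090["tflops"]

print(f"{count} x RTX 5090 for the price of one H200 NVL")
print(f"VRAM:      {total_vram_gb} GB vs {h200['vram_gb']} GB")
print(f"Bandwidth: {total_bw_tb_s:.2f} TB/s vs {h200['bw_tb_s']} TB/s")
print(f"FP16:      {total_tflops:.0f} TFLOPS vs {h200['tflops']} TFLOPS")
```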