It's almost like client-side anti-cheat doesn't work, and if proper server-side anti-cheat were made it wouldn't matter what platform the client is on.
TheMightyCat
Birth rates are overrated anyway
I found the reason: somehow setting --max_num_seqs 1 makes vllm way more efficient.
Not sure exactly what it does, but I think it's because vllm batches requests and the API I was using with exllamav3 doesn't.
Now I'm doing 100k with vllm too.
(Worker_TP0_EP0 pid=99695) INFO 11-03 17:34:00 [gpu_worker.py:298] Available KV cache memory: 4.73 GiB
(Worker_TP1_EP1 pid=99696) INFO 11-03 17:34:00 [gpu_worker.py:298] Available KV cache memory: 4.73 GiB
(EngineCore_DP0 pid=99577) INFO 11-03 17:34:00 [kv_cache_utils.py:1087] GPU KV cache size: 103,264 tokens
(EngineCore_DP0 pid=99577) INFO 11-03 17:34:00 [kv_cache_utils.py:1091] Maximum concurrency for 100,000 tokens per request: 1.03x
(EngineCore_DP0 pid=99577) INFO 11-03 17:34:00 [kv_cache_utils.py:1087] GPU KV cache size: 103,328 tokens
(EngineCore_DP0 pid=99577) INFO 11-03 17:34:00 [kv_cache_utils.py:1091] Maximum concurrency for 100,000 tokens per request: 1.03x
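For anyone wondering where the 1.03x comes from, here is a rough sketch of the arithmetic. It assumes the "Maximum concurrency" figure is simply the KV cache size in tokens divided by the per-request token limit, which matches the numbers in the log above but is my reading of it, not something taken from the vllm source. The helper names are made up for illustration.

```python
GIB = 1024 ** 3  # bytes in one GiB


def max_concurrency(kv_cache_tokens: int, max_model_len: int) -> float:
    """How many full-length requests fit in the KV cache at once (assumed formula)."""
    return kv_cache_tokens / max_model_len


def bytes_per_token(kv_cache_gib: float, kv_cache_tokens: int) -> float:
    """KV cache bytes one token costs on a single worker, implied by the log lines."""
    return kv_cache_gib * GIB / kv_cache_tokens


# Numbers copied from the log lines above.
print(f"{max_concurrency(103_264, 100_000):.2f}x")
print(f"{bytes_per_token(4.73, 103_264) / 1024:.1f} KiB per token per worker")
```

So with roughly 4.73 GiB of KV cache per GPU there is room for just over one 100k-token request, which would explain why limiting vllm to a single sequence is what makes the long context fit.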
I would say exllamav3 is still slightly more efficient, but this explains the huge discrepancy. exllamav3 also allows setting GB per GPU, which lets me get a few more GB than vllm, which spreads it evenly, because a bunch of memory on GPU 0 is used for other stuff.
As for the T/s, it's about the same, in the 80-100 range. This is what I'm getting with vllm:
(APIServer pid=99454) INFO 11-03 17:36:31 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO 11-03 17:36:32 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO: 127.0.0.1:32968 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=99454) INFO: 127.0.0.1:32968 - "POST /tokenize HTTP/1.1" 200 OK
(APIServer pid=99454) INFO 11-03 17:36:32 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO 11-03 17:36:34 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO: 127.0.0.1:32968 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=99454) INFO: 127.0.0.1:32968 - "POST /tokenize HTTP/1.1" 200 OK
(APIServer pid=99454) INFO 11-03 17:36:34 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO 11-03 17:36:34 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO: 127.0.0.1:32968 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=99454) INFO: 127.0.0.1:32968 - "POST /tokenize HTTP/1.1" 200 OK
(APIServer pid=99454) INFO 11-03 17:36:34 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO 11-03 17:36:34 [loggers.py:127] Engine 000: Avg prompt throughput: 461.4 tokens/s, Avg generation throughput: 17.6 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 1.6%, Prefix cache hit rate: 66.9%
(APIServer pid=99454) INFO 11-03 17:36:35 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO: 127.0.0.1:32968 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=99454) INFO: 127.0.0.1:32968 - "POST /tokenize HTTP/1.1" 200 OK
(APIServer pid=99454) INFO 11-03 17:36:35 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO 11-03 17:36:35 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO: 127.0.0.1:32968 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=99454) INFO: 127.0.0.1:32968 - "POST /tokenize HTTP/1.1" 200 OK
(APIServer pid=99454) INFO 11-03 17:36:35 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO 11-03 17:36:36 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO: 127.0.0.1:32968 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=99454) INFO: 127.0.0.1:32968 - "POST /tokenize HTTP/1.1" 200 OK
(APIServer pid=99454) INFO 11-03 17:36:36 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO 11-03 17:36:36 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO: 127.0.0.1:32968 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=99454) INFO: 127.0.0.1:32968 - "POST /tokenize HTTP/1.1" 200 OK
(APIServer pid=99454) INFO 11-03 17:36:36 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO 11-03 17:36:43 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO: 127.0.0.1:32968 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=99454) INFO: 127.0.0.1:32968 - "POST /tokenize HTTP/1.1" 200 OK
(APIServer pid=99454) INFO 11-03 17:36:44 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO 11-03 17:36:44 [loggers.py:127] Engine 000: Avg prompt throughput: 1684.4 tokens/s, Avg generation throughput: 96.7 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 4.4%, Prefix cache hit rate: 83.4%
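If you want to pull the T/s numbers out of a log like this instead of eyeballing them, something along these lines should work. It is a minimal sketch that only assumes the `loggers.py` line format shown above; the function name and the regex are mine, not part of vllm.

```python
import re

# Matches the two throughput fields in the periodic "Engine 000" stats line.
THROUGHPUT_RE = re.compile(
    r"Avg prompt throughput: (?P<prompt>[\d.]+) tokens/s, "
    r"Avg generation throughput: (?P<gen>[\d.]+) tokens/s"
)


def parse_throughput(line: str):
    """Return (prompt_tps, generation_tps) for a stats line, or None for other lines."""
    match = THROUGHPUT_RE.search(line)
    if match is None:
        return None
    return float(match["prompt"]), float(match["gen"])


sample = (
    "(APIServer pid=99454) INFO 11-03 17:36:44 [loggers.py:127] Engine 000: "
    "Avg prompt throughput: 1684.4 tokens/s, Avg generation throughput: 96.7 tokens/s, "
    "Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 4.4%, Prefix cache hit rate: 83.4%"
)
print(parse_throughput(sample))
```

Keep in mind these are averages over the logging interval, so a line like the earlier 17.6 tokens/s one probably just covers a window where the engine was mostly idle or busy with prompt processing.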
Now that I have found this out I've switched back to vllm, because the API I'm using with exllamav3 doesn't support Qwen 3 tools yet :(
I would suggest trying exllamav3 once. I have no idea what kind of black magic they use, but it's very memory efficient.
I can't load Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8 with 16K using vllm,
but using exllamav3 I can SOMEHOW load ArtusDev/Qwen_Qwen3-Coder-30B-A3B-Instruct-EXL3:8.0bpw_H8 at its full context of 262,144 with still 2GiB to spare.
I really feel like this is too good to be true and I'm doing something wrong, but it just works, so I don't know.
Forget that, the real evil is that the first character of Int (or Ganz in this case) is capitalized
Why does it matter what is replaced?
I would say if it frees up a human from having to do it that it's good news.
"must benefit everyone, not just a handful of billionaires"
It's hard to disagree with that statement but all this upheaval about "killing jobs" seems really pointless to me.
A manual farm has far more employees than one that uses modern machinery. Is the economy in decline if the farmer replaces human labor with machines?

The hoodied master hacker hacking the mainframe using HTML...
I really doubt the authenticity of reporting if they use such "tech looking" images
Yeah, I tried taking a screenshot first, but as you can see the screenshot doesn't include the panel.
This is a PinePhone Pro. The experience is surprisingly usable except for one major thing: the battery life is terrible.
I don't know whether that is specific to the PPP or postmarketOS, or whether my phone is defective, but after like an hour of use the battery is already drained, and even in standby it won't last more than 6 hours or so.
There are some minor things here and there, but that is the main reason I can't use it as a daily driver.
Fun thing about Linux phones is that they can all look different, without vendor lock-in

(Yes, I apologize for the scuffed picture, but pressing the screenshot button closes the menu)
I think everyone has at some point thought of taking an SBC and a touchscreen and making a Linux phone, cool to actually see one!