corvus
It gives me exactly the same message, but I'm not using a VPN. When I use the external viewer option with mpv through yt-dlp, I only get video without audio. I can download the video fine with yt-dlp and then watch it with mpv, but if I try to stream it to mpv while downloading, to watch it in real time, I get an ffmpeg error: can't recognize format... weird.
Then it's not the new C, maybe the new C++
Then I'll wait for Rust++
Oh great, thanks
Yeah, I tested with lower numbers and it works; I just wanted to offload the whole model, thinking it would work, since 2GB is a lot. With other models it prints about 250MB when it fails, and if you add that to the model size it's still well below the iGPU's free memory, so I don't get it... Anyway, I was thinking about upgrading the memory to 32GB or maybe 64GB, but I hesitate, because with models around 7GB on CPU only I get around 5 t/s, and with 14GB models 2-3 t/s, so with one around 30GB I guess I'd get around 1 t/s? My supposition is that increasing RAM doesn't increase performance per se, it just lets you load bigger models into memory, so throughput is roughly inversely proportional to model size... what do you think?
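To make that hypothesis concrete, here's the rough model I have in mind (just a sketch, assuming token generation is memory-bandwidth bound, i.e. every generated token reads the full weights from RAM once; the 50 GB/s bandwidth figure is a hypothetical placeholder, not a measured value):

# Back-of-the-envelope CPU inference throughput: if each generated token
# requires one full pass over the model weights, tokens/s is bounded by
# RAM bandwidth divided by model size.
MEM_BANDWIDTH_GBS = 50.0  # hypothetical sustained RAM bandwidth, GB/s

def estimated_tps(model_size_gb: float) -> float:
    return MEM_BANDWIDTH_GBS / model_size_gb

for size_gb in (7, 14, 30):
    print(f"{size_gb:>2} GB model: ~{estimated_tps(size_gb):.1f} t/s")

# 7 GB -> ~7.1, 14 GB -> ~3.6, 30 GB -> ~1.7 t/s: the same inverse
# scaling as the observed 5 t/s and 2-3 t/s, so ~1 t/s at 30 GB looks plausible.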
I get an error when offloading the whole model to the GPU:
./build/bin/llama-cli -m ~/software/ai/models/deepseek-math-7b-instruct.Q8_0.gguf -n 200 -t 10 -ngl 31 -if
The relevant output is:
....
llama_model_load_from_file_impl: using device Vulkan0 (Intel(R) Iris(R) Xe Graphics (RPL-U)) - 7759 MiB free
...
print_info: file size = 6.84 GiB (8.50 BPW)
....
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: offloading 30 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 31/31 layers to GPU
load_tensors: Vulkan0 model buffer size = 6577.83 MiB
load_tensors: CPU_Mapped model buffer size = 425.00 MiB
.....
ggml_vulkan: Device memory allocation of size 2013265920 failed
ggml_vulkan: vk::Device::allocateMemory: ErrorOutOfDeviceMemory
llama_kv_cache_init: failed to allocate buffer for kv cache
llama_init_from_model: llama_kv_cache_init() failed for self-attention cache
common_init_from_params: failed to create context with model '~/software/ai/models/deepseek-math-7b-instruct.Q8_0.gguf'
main: error: unable to load model
It seems to me that there is enough room for the model, but I don't know what "Device memory allocation of size 2013265920" means.
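Checking the numbers from the log (a rough sketch; I'm assuming the failed 2013265920-byte allocation is the KV-cache buffer, since the failure is reported by llama_kv_cache_init right after it):

# Rough Vulkan memory budget from the log lines above.
free_mib = 7759                    # "Vulkan0 ... 7759 MiB free"
model_mib = 6577.83                # "Vulkan0 model buffer size"
kv_alloc_mib = 2013265920 / 2**20  # failed allocation = exactly 1920.0 MiB

print(f"left after weights: {free_mib - model_mib:.2f} MiB")  # ~1181 MiB
print(f"KV cache request:   {kv_alloc_mib:.1f} MiB")          # 1920.0 MiB

# The weights alone fit, but weights + KV cache would exceed the free device memory.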
Is BLAS faster with CPU only than Vulkan with CPU+iGPU? After failing to get the SYCL backend in llama.cpp working, apparently because of a Debian driver issue, I ended up using the Vulkan backend, but after many tests, offloading to the iGPU doesn't seem to make much difference.
I don't know, but I really enjoyed reading his books.