Just beware that, like AMD GPUs, Intel GPUs take a performance hit when running LLMs because of the CUDA-specific optimizations in frameworks like llama.cpp.
tau
Do you have any tips (or examples) using quadlets? I tried using them but I couldn't wrap my head around them.