this post was submitted on 13 Dec 2025
2 points (66.7% liked)

Artificial Intelligence


Hello there

I am starting to discover and understand local (as in, running on my own computer) LLMs.

LM Studio makes it easy for beginners.

There is something I can't make work right: my laptop has an Nvidia T600 2 GB graphics card. The official Nvidia drivers work well with the OS and graphics applications (Linux Mint), but LM Studio can't use the GPU.

LLM models run fine, but only on the CPU.

I read somewhere that CUDA is not fully implemented on Linux; is that the limiting factor? Has anyone managed to offload some work to an Nvidia T600 on a laptop?

Thanks

top 4 comments
[–] MasterBlaster@lemmy.world 1 points 5 days ago

I installed ComfyUI recently and it included some CUDA libraries. When I run a prompt, my GPU spikes to 100%. I'm using a laptop RTX 4080 with 16 GB of VRAM. It's possible your GPU isn't capable; I don't recognize the model, and 2 GB is tiny.

I'm pretty sure my LM Studio installation uses the GPU, too.

[–] hendrik@palaver.p3x.de 2 points 1 week ago* (last edited 1 week ago)

CUDA definitely is fully implemented on Linux. I mean, all the AI products these days (ChatGPT etc.) run on Linux clusters in datacenters. You'd need to install all the CUDA compute packages in addition to the graphics drivers... But(!) 2 GB of VRAM is pretty limiting. Most decent large language models use way more than that, so they won't fit on your graphics card. Maybe try one of the tiniest models you can find; the model download file shouldn't be bigger than 1.5 GB or so, otherwise it'll be loaded on the CPU anyway. Maybe there's a guide for LM Studio; I'm not sure whether it bundles all the requirements like CUDA.
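
Once the driver and the CUDA compute packages are installed, a quick way to confirm the CUDA stack is visible from user space is a short Python check. This is a minimal sketch assuming PyTorch is installed (`pip install torch`); it is independent of LM Studio and only verifies that the driver and runtime are working:

```python
# Minimal sanity check that the Nvidia driver and CUDA runtime are usable.
# Assumes PyTorch is installed; this is unrelated to LM Studio itself.
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"CUDA OK: {props.name}, {props.total_memory / 1024**2:.0f} MiB VRAM")
else:
    print("CUDA not visible - check the Nvidia driver and CUDA packages")
```

On a T600 this should report roughly 2048 MiB, which is the hard limit the comment above is pointing at.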

[–] j4k3@piefed.world 1 points 1 week ago (1 children)

You must run the proprietary Nvidia module to use CUDA. The proprietary stuff is not integrated into Linux.

You may find it helpful to know there are two separate sets of software involved in this space: the kernel stuff and the user space stuff. In gaming, the user space stuff is what most people are talking about. In AI, we are talking about the kernel module, which is the actual firmware and driver code running inside the kernel. In Linux, drivers are loaded as "modules".
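
A quick way to see whether the proprietary module is actually loaded is to look at /proc/modules (the same information lsmod prints). A minimal sketch in Python, just for illustration:

```python
# Check whether the proprietary nvidia kernel module is currently loaded.
# Reads /proc/modules, which is the same data that `lsmod` shows.
with open("/proc/modules") as f:
    nvidia = [line.split()[0] for line in f if line.startswith("nvidia")]

print("Loaded nvidia modules:", ", ".join(nvidia) if nvidia else "none")
```

If nothing shows up here, CUDA will not work no matter what is installed in user space.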

A lot of the ambiguity in this space is related to machines with a UEFI bootloader and secure boot enabled. You may not have this issue yourself, but I am trying to help you contextualize information you may find.

Linux is a monolithic kernel. This means it includes all the drivers/modules in the source code. If you were to compile the kernel yourself, it is actually just a long menu system in a terminal that lets you select whatever hardware modules you need. Then it only compiles the kernel for the options you select. Most of the oddball options are for very rare hardware that was basically built wrong in nonstandard ways; most modules support a standard way of doing things, so a single module typically covers a large number of devices from many manufacturers. When you select some oddball option, the kernel just includes that source code. Nothing stops you from manually adding extra junk as modules at this point. That is how Android works: those become orphaned proprietary kernels when that extra source code is never merged into the mainline Linux kernel.

Some of the Nvidia firmware source code is open source, and there is a module that can be built into the kernel from it. The actual underlying issue is that this open source code can only be compiled with the proprietary nvcc compiler.

Android is a scheme to leverage proprietary orphan kernels to take device ownership away from consumers. Similarly, nvcc is kept proprietary in order to maintain the option of controlling ownership and device deprecation.

In other words, you may need to use an older kernel to make it work.

In any case, there are two primary underlying software chains people use: Ollama and llama.cpp. Ollama can only run a model in one hardware space, but is generally easier to use; this means the model needs to fit entirely into VRAM, which is a bad option for 2 GB. Llama.cpp splits the model between GPU and CPU and runs on both at once. Oobabooga Textgen WebUI is a good option for this, or just run a llama.cpp server directly. These have the option to expose a local network API that is compatible with the OpenAI API, so everything just works the same. It is not hard to do. Getting the sampling settings dialed in, and the trial and error of finding a good model split, is a bit more challenging. It is probably better to look for people describing what they run on similar hardware and try those settings as a baseline entry point. In general, a very large quantized model is FAR better than a small model at full precision.
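
To make the GPU/CPU split concrete, here is a minimal sketch using llama-cpp-python (one of several front ends to llama.cpp; the same idea applies to the llama.cpp server binary or to LM Studio's GPU offload setting). The model file name and the number of offloaded layers are placeholders; on a 2 GB card you would tune n_gpu_layers until the load fits.

```python
# Partial GPU offload with llama-cpp-python, a Python binding for llama.cpp.
# Requires a build with CUDA enabled, e.g.:
#   CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama(
    model_path="tiny-model.Q4_K_M.gguf",  # placeholder: a small quantized GGUF file
    n_gpu_layers=10,   # offload only as many layers as fit in ~2 GB of VRAM
    n_ctx=2048,        # modest context window to keep memory use down
)

out = llm("Q: What does a kernel module do?\nA:", max_tokens=64)
print(out["choices"][0]["text"])
```

Watching nvidia-smi while this loads shows how much VRAM the offloaded layers take, which makes the trial and error described above much quicker.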

[–] m33@lemmy.zip 2 points 1 week ago

Thanks for this complete explanation