[GH-ISSUE #1898] CUDA and ROCM libraries not loaded correctly (solved) #63128

Closed
opened 2026-05-03 12:15:36 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @Zenopheus on GitHub (Jan 10, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/1898

Originally assigned to: @dhiltgen on GitHub.

I was unable to get Ollama to recognize my RTX 5000 under WSL, even though other programs had no problem with it. I would get the following error:

```
Jan 08 19:28:33 XDFAF ollama[178990]: 2024/01/08 19:28:33 gpu.go:39: CUDA not detected: nvml vram init failure: 9
```

After digging into the code, I figured out it was loading 'libnvidia-ml.so' from the wrong location (/lib/x86_64-linux-gnu) and the symbol lookups failed. Unfortunately, **if the symbols don't load, it will not try any other locations for that library** (that's the bug). If it kept looking, it would have found '/usr/lib/wsl/lib/libnvidia-ml.so.1' and all would be good. This seems to be affecting many CUDA and ROCM users on WSL. See #1704, for example _(incorrectly labeled as an enhancement)_.
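If you want to confirm which copy of the library the dynamic loader actually resolves, here's a small standalone C diagnostic (just a sketch of my own, not part of Ollama) that dlopens the library and asks glibc's dlinfo() where it came from:

```c
// nvml_origin.c: dlopen() libnvidia-ml the way a naive loader would,
// then ask glibc which directory the library was actually loaded from.
#define _GNU_SOURCE
#include <dlfcn.h>
#include <link.h>
#include <limits.h>
#include <stdio.h>

int main(void) {
    void *h = dlopen("libnvidia-ml.so.1", RTLD_LAZY);
    if (!h) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }
    char origin[PATH_MAX];
    if (dlinfo(h, RTLD_DI_ORIGIN, origin) == 0)
        // On an affected WSL box this should print /lib/x86_64-linux-gnu
        // instead of /usr/lib/wsl/lib.
        printf("loaded from: %s\n", origin);
    dlclose(h);
    return 0;
}
```

Compile with something like `gcc -o nvml_origin nvml_origin.c -ldl` and run it inside WSL.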

You can type the following to see if you're suffering from this problem:

```
ldconfig -p | grep libnvidia-ml
```
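On a correctly set-up WSL machine, the first match should look something like this (output illustrative):

```
libnvidia-ml.so.1 (libc6,x86-64) => /usr/lib/wsl/lib/libnvidia-ml.so.1
```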
If you're using WSL, the first line should include "/usr/lib/wsl/lib/"; otherwise, you might have this issue. You could create a symbolic link in that directory like so:

```
sudo ln -s /usr/lib/wsl/lib/libnvidia-ml.so.1 /usr/lib/wsl/lib/libnvidia-ml.so
sudo ldconfig
```

This ONLY works as long as you launch Ollama directly (`ollama serve`); it does not work via systemctl, because the link is removed when WSL starts up. This is a known issue with WSL that you can read more about [here](https://forums.developer.nvidia.com/t/wsl2-libcuda-so-and-libcuda-so-1-should-be-symlink/236301).

The only way to fix this is to modify cuda_init() so that it loads the library from different locations until one can be initialized. I rewrote the code so that it does this. It also loads specific library versions first (libnvidia-ml.so.1.1, libnvidia-ml.so.1, libnvidia-ml.so). I thought Ollama was slow but now it's amazing!
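Roughly, the idea looks like this (a simplified C sketch of the fallback loop, not the exact code from my gist; the function name and candidate list here are illustrative, while nvmlInit_v2 and NVML_SUCCESS == 0 are real NVML):

```c
// Simplified fallback loop for cuda_init(): try candidate NVML libraries
// until one both loads and initializes, instead of bailing out after the
// first dlopen() hit with missing symbols.
#include <stddef.h>
#include <stdio.h>
#include <dlfcn.h>

typedef int (*nvmlInit_t)(void);   // nvmlInit_v2 returns nvmlReturn_t; 0 == NVML_SUCCESS

static void *try_load_nvml(void) {
    // Specific versions first, and the WSL directory explicitly, since the
    // default search order can resolve to a broken copy elsewhere.
    const char *candidates[] = {
        "/usr/lib/wsl/lib/libnvidia-ml.so.1",
        "libnvidia-ml.so.1.1",
        "libnvidia-ml.so.1",
        "libnvidia-ml.so",
    };
    for (size_t i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
        void *h = dlopen(candidates[i], RTLD_LAZY | RTLD_GLOBAL);
        if (!h)
            continue;                     // not found; try the next candidate
        nvmlInit_t init = (nvmlInit_t)dlsym(h, "nvmlInit_v2");
        if (init != NULL && init() == 0)
            return h;                     // loaded, symbols resolved, initialized
        dlclose(h);                       // missing symbols or init failure; keep looking
    }
    return NULL;                          // nothing usable found
}

int main(void) {
    printf(try_load_nvml() ? "NVML initialized\n" : "CUDA not detected\n");
    return 0;
}
```

Link with `-ldl` on older glibc. The actual changes in my gist do more than this, but that's the shape of the loop.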

The working code changes are in my [gist](https://gist.github.com/Zenopheus/ba4632ec6dcbd6737b6f9b180d897d1d). I'll try to submit a PR, but I'm swamped at work. If someone wants to submit this as a PR, please try to make the same changes to the ROCm code so those users can be happy too.

GiteaMirror added the bug label 2026-05-03 12:15:36 -05:00