[GH-ISSUE #5988] GPU with 12GB VRAM couldn't load 8B model under WSL2 #29505

Closed
opened 2026-04-22 08:27:13 -05:00 by GiteaMirror · 6 comments

Originally created by @hoangminh1109 on GitHub (Jul 26, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5988

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

I'm unable to run any of the small models (an 8B model) on my RTX 3060 12GB.
Ollama is installed in WSL2 under Windows 10.

![image](https://github.com/user-attachments/assets/56675022-afb1-4361-b7e8-30add303f8c1)

Server log uploaded: [ollama_log_error.txt](https://github.com/user-attachments/files/16393770/ollama_log_error.txt)
Some more information:

- nvidia-smi works well.
- CUDA is installed; the CUDA sample deviceQuery works well (see the verification sketch after this list).
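
For reference, verifying the stack from inside WSL2 looks roughly like the sketch below; the deviceQuery path is an assumption based on the cuda-samples repository layout, not something from the original report:

```sh
# Confirm the Windows GPU driver is visible inside WSL2
nvidia-smi

# Confirm the CUDA toolkit is installed and on PATH
nvcc --version

# Build and run the deviceQuery sample (path assumed; recent toolkits ship
# samples via https://github.com/NVIDIA/cuda-samples)
cd cuda-samples/Samples/1_Utilities/deviceQuery
make && ./deviceQuery
```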

OS

WSL2

GPU

Nvidia

CPU

Intel

Ollama version

0.3.0

GiteaMirror added the wsl, nvidia, bug, windows labels 2026-04-22 08:27:14 -05:00

@dhiltgen commented on GitHub (Jul 26, 2024):

Can you check nvidia-smi and Task Manager on the host system and see how much available VRAM is detected?

This log message is a bit unexpected:

```
unable to load cuda driver library" library=/usr/lib/x86_64-linux-gnu/libcuda.so.418.226.00 error="symbol lookup for cuCtxCreate_v3 failed: /usr/lib/x86_64-linux-gnu/libcuda.so.418.226.00: undefined symbol: cuCtxCreate_v3
```

Driver 418 is pretty old. Is that the version you have on the host, or is there a version mismatch perhaps?

This next line implies you're in the 500 series of drivers:

```
inference compute" id=GPU-bca32961-2a44-3c18-c167-8306f3d51df9 library=cuda compute=8.6 driver=12.6 name="NVIDIA GeForce RTX 3060" total="12.0 GiB" available="11.0 GiB
```

My suspicion is that the WSL CUDA components are out of sync, and somehow that's leading to incorrect free VRAM reporting, so we're trying to load too many layers.
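
One way to check for that kind of mismatch from inside WSL2 (a minimal sketch; the exact .so version suffixes will differ per system):

```sh
# List every libcuda the dynamic linker can see; on WSL2 the GPU driver
# stub normally lives under /usr/lib/wsl/lib and should be the only copy
ldconfig -p | grep libcuda

# A stale Linux driver package (e.g. libcuda.so.418.*) sitting next to the
# WSL stub is exactly the kind of out-of-sync state suspected above
ls -l /usr/lib/wsl/lib/libcuda* /usr/lib/x86_64-linux-gnu/libcuda* 2>/dev/null
```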


@themanyone commented on GitHub (Jul 27, 2024):

ollama/phi3 models quit working for me. I had to rebuild the GGUF myself to work with the new sliding-window key. Issue #5956


@hoangminh1109 commented on GitHub (Jul 28, 2024):

> Can you check nvidia-smi and Task Manager on the host system and see how much available VRAM is detected?
>
> This log message is a bit unexpected:
>
> ```
> unable to load cuda driver library" library=/usr/lib/x86_64-linux-gnu/libcuda.so.418.226.00 error="symbol lookup for cuCtxCreate_v3 failed: /usr/lib/x86_64-linux-gnu/libcuda.so.418.226.00: undefined symbol: cuCtxCreate_v3
> ```
>
> Driver 418 is pretty old. Is that the version you have on the host, or is there a version mismatch perhaps?
>
> This next line implies you're in the 500 series of drivers:
>
> ```
> inference compute" id=GPU-bca32961-2a44-3c18-c167-8306f3d51df9 library=cuda compute=8.6 driver=12.6 name="NVIDIA GeForce RTX 3060" total="12.0 GiB" available="11.0 GiB
> ```
>
> My suspicion is that the WSL CUDA components are out of sync, and somehow that's leading to incorrect free VRAM reporting, so we're trying to load too many layers.

Hi @dhiltgen,

Both nvidia-smi and Windows Task Manager detect 12GB VRAM:

![image](https://github.com/user-attachments/assets/a4a33f8f-ea6a-4c24-ad3f-5bb87f53628d)
![image](https://github.com/user-attachments/assets/931176fa-676b-44b7-9d22-ed27d9c51d39)

- I've tried a few Nvidia Windows drivers on the host, such as 531.18, 551.23, and 560.70, but the problem is the same.
- CUDA was installed following this guideline: https://canonical-ubuntu-wsl.readthedocs-hosted.com/en/latest/tutorials/gpu-cuda/
- Even installing CUDA 12.5 following this link doesn't work: https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=WSL-Ubuntu&target_version=2.0&target_type=deb_local
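
To compare what the driver reports on each side, the same query can be run on the Windows host and inside WSL2 (a minimal sketch using standard nvidia-smi query flags):

```sh
# Run once on the Windows host and once inside WSL2, then compare
# the total and free VRAM each side reports
nvidia-smi --query-gpu=name,driver_version,memory.total,memory.free --format=csv
```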


@hoangminh1109 commented on GitHub (Jul 29, 2024):

**I did a fresh re-installation**

- Reinstalled the Nvidia Windows driver 560.70
- Reinstalled WSL Ubuntu-20.04 (unregistered the old one and downloaded a new one; see the command sketch below)
- Reinstalled CUDA 12.5 following https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=WSL-Ubuntu&target_version=2.0&target_type=deb_local
- Reinstalled Ollama using `curl -fsSL https://ollama.com/install.sh | sh`
- Models were backed up from the previous installation (they're fine, as I can run Ollama successfully on another real Ubuntu machine)
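In command form, the WSL and Ollama steps above look roughly like this (a sketch; the distro name is taken from the list above, and the driver and CUDA installs follow the linked guides rather than these commands):

```sh
# On the Windows host: wipe and recreate the WSL distro
wsl --unregister Ubuntu-20.04
wsl --install -d Ubuntu-20.04

# Inside the fresh WSL2 Ubuntu: reinstall Ollama (command from the list above)
curl -fsSL https://ollama.com/install.sh | sh
```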

**Screenshots**

NVIDIA-SMI

![image](https://github.com/user-attachments/assets/4e55efbb-ca63-4624-9805-b57e89f5c12c)

CUDA

![image](https://github.com/user-attachments/assets/79dce2dd-7102-4ec3-99d7-c87fa4296a9f)

deviceQuery

![image](https://github.com/user-attachments/assets/916672e2-0e34-49f0-a663-5b888d636a82)

**Still not working**

The driver-library error from the previous log is gone:

```
unable to load cuda driver library" library=/usr/lib/x86_64-linux-gnu/libcuda.so.418.226.00 error="symbol lookup for cuCtxCreate_v3 failed: /usr/lib/x86_64-linux-gnu/libcuda.so.418.226.00: undefined symbol: cuCtxCreate_v3
```

but the problem is still the same.

![image](https://github.com/user-attachments/assets/8c289d57-e515-438c-ab73-9a616ecdd791)

**Server log**

[ollama_logs_2.txt](https://github.com/user-attachments/files/16407477/ollama_logs_2.txt)


@dhiltgen commented on GitHub (Jul 29, 2024):

I don't have an identical setup, but on a 12G CUDA card, loading llama3 only uses up ~6G of VRAM, so I'm confused why it's hitting OOM when on your system it appears to have 11G free.

```
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A    142720      C   ...unners/cuda_v11/ollama_llama_server       6188MiB |
+-----------------------------------------------------------------------------------------+
```

```
% ollama ps
NAME         	ID          	SIZE  	PROCESSOR	UNTIL
llama3:latest	71a106a91016	6.7 GB	100% GPU 	3 minutes from now
```

An experiment that may help shed some light would be to force a smaller number of layers and figure out what it can actually allocate on the GPU, then check nvidia-smi on both the host and WSL, and Task Manager.

```
% curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "hello",
  "stream": false, "options": {"num_gpu": 32 }
}'
```
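
If 32 layers load cleanly, raising the count until it fails pins down how much the runtime can really allocate. A hedged sketch of that bisection (the model name and layer counts are just examples):

```sh
# Try progressively more layers; watch nvidia-smi in another terminal
for n in 16 24 32 40; do
  echo "--- num_gpu=$n ---"
  curl -s http://localhost:11434/api/generate -d "{
    \"model\": \"llama3\",
    \"prompt\": \"hello\",
    \"stream\": false, \"options\": {\"num_gpu\": $n}
  }" | head -c 200
  echo
done
```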

@hoangminh1109 commented on GitHub (Aug 3, 2024):

Not really sure what the problem was.
I did a re-installation of Windows 10, and it just worked.


Reference: github-starred/ollama#29505