[GH-ISSUE #11866] Starting qwen3-coder:30b-a3b-q4_K_M failed #7877

Closed
opened 2026-04-12 20:02:04 -05:00 by GiteaMirror · 5 comments
Owner

Originally created by @EveningQi on GitHub (Aug 12, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11866

What is the issue?

Starting qwen3-coder:30b-a3b-q4_K_M fails with an insufficient-VRAM error. The graphics card is an NVIDIA 4090D with 24 GB of VRAM, and its memory usage is below 5%, so why does Ollama report insufficient VRAM?

(Three screenshots attached.)

Relevant log output

```shell
alloc_tensor_range: failed to initialize tensor output.weight
llama_model_load: error loading model: unable to allocate CUDA0 buffer
llama_model_load_from_file_impl: failed to load model
Aug 12 17:59:48 ps ollama[1343661]: time=2025-08-12T17:59:48.932+08:00 level=WARN source=sched.go:685 msg="gpu VRAM usage didn't recover within timeout" seconds=5.538058371 runner.size="18.9 GiB" runner.vram="18.9 GiB" runner.parallel=1 runner.pid=1349054 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-e9183b5c18a0cf736578c1e3d1cbd4b7e98e3ad3be6176b68c20f156d54a07ac
```

OS

Linux

GPU

Nvidia

CPU

AMD

Ollama version

0.11.4

GiteaMirror added the bug label 2026-04-12 20:02:04 -05:00
Author
Owner

@rick-github commented on GitHub (Aug 12, 2025):

Full text server log will help in debugging.
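
For reference, on a systemd-based Linux install the full server log can typically be collected as follows. This is a minimal sketch; it assumes the default package install, where the service unit is named `ollama`:

```shell
# Follow the Ollama server log live while reproducing the failure
journalctl -u ollama -f

# Or dump the most recent entries to a file that can be attached to the issue
journalctl -u ollama --no-pager -n 500 > ollama-server.log
```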

Author
Owner

@EveningQi commented on GitHub (Aug 13, 2025):

> Full text server log will help in debugging.

This is the latest log output from running qwen3:30b-a3b-q4_K_M. Could someone take a look at what the cause is?

(Screenshot attached.)

[log.txt](https://github.com/user-attachments/files/21746559/log.txt)

Author
Owner

@rick-github commented on GitHub (Aug 13, 2025):

```shell
Aug 13 11:14:18 ps ollama[1437584]: load_backend: loaded CUDA backend from /usr/lib/ollama/libggml-cuda.so
Aug 13 11:14:18 ps ollama[1437584]: load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v12/libggml-cuda.so
```

The CUDA library is being loaded from two different locations. #11211. Delete `/usr/lib/ollama/cuda_v*`.
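
For anyone hitting the same problem, a minimal sketch of the cleanup, assuming the default Linux package layout where the bundled runtimes live under `/usr/lib/ollama` and the service runs under systemd:

```shell
# Remove the duplicate bundled CUDA runtime directories (e.g. cuda_v12)
sudo rm -rf /usr/lib/ollama/cuda_v*

# Restart the server so it reloads the single remaining backend
sudo systemctl restart ollama

# Confirm only one CUDA backend is loaded now
journalctl -u ollama --no-pager | grep load_backend | tail -n 5
```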

Author
Owner

@EveningQi commented on GitHub (Aug 13, 2025):

> ```shell
> Aug 13 11:14:18 ps ollama[1437584]: load_backend: loaded CUDA backend from /usr/lib/ollama/libggml-cuda.so
> Aug 13 11:14:18 ps ollama[1437584]: load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v12/libggml-cuda.so
> ```
>
> The CUDA library is being loaded from two different locations. #11211. Delete `/usr/lib/ollama/cuda_v*`.

Thank you, thank you. I'll delete it tomorrow and try to run it again.

Author
Owner

@EveningQi commented on GitHub (Aug 14, 2025):

> ```shell
> Aug 13 11:14:18 ps ollama[1437584]: load_backend: loaded CUDA backend from /usr/lib/ollama/libggml-cuda.so
> Aug 13 11:14:18 ps ollama[1437584]: load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v12/libggml-cuda.so
> ```
>
> The CUDA library is being loaded from two different locations. #11211. Delete `/usr/lib/ollama/cuda_v*`.

Thank you, thank you. The model can now be run successfully.


Reference: github-starred/ollama#7877