[GH-ISSUE #11865] Failed to start qwen3-coder:30b-a3b-q4_K_M #54389

Closed
opened 2026-04-29 05:52:40 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @EveningQi on GitHub (Aug 12, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11865

What is the issue?

Starting qwen3-coder:30b-a3b-q4_K_M fails with an insufficient-VRAM error. The graphics card is an NVIDIA RTX 4090D with 24 GB of VRAM, and its memory occupancy is below 5%, so why does it report insufficient VRAM?

Screenshot: https://github.com/user-attachments/assets/9a6e2053-d841-4fb8-906a-5bd2deac6506
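
The sub-5% occupancy reading can be double-checked from the shell; a minimal sketch, assuming the standard NVIDIA driver tools and the ollama CLI are on the PATH:

```shell
# Report total, used, and free VRAM as seen by the driver.
nvidia-smi --query-gpu=memory.total,memory.used,memory.free --format=csv

# List any processes currently holding CUDA memory on the card.
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv

# Show what Ollama itself has loaded and how it is split
# between GPU and CPU memory.
ollama ps
```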

Relevant log output

```shell
alloc_tensor_range: failed to initialize tensor output.weight
llama_model_load: error loading model: unable to allocate CUDA0 buffer
llama_model_load_from_file_impl: failed to load model
Aug 12 17:45:52 ps ollama[1343661]: time=2025-08-12T17:45:52.435+08:00 level=WARN source=sched.go:685 msg="gpu VRAM usage didn't recover within timeout" seconds=5.544757944 runner.size="18.9 GiB" runner.vram="18.9 GiB" runner.parallel=1 runner.pid=1347812 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-e9183b5c18a0cf736578c1e3d1cbd4b7e98e3ad3be6176b68c20f156d54a07ac
```
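
Per the log, runner.size="18.9 GiB" is the allocation Ollama attempted, so the q4_K_M weights plus the KV cache already come close to the card's 24 GiB once CUDA overhead is added; anything else holding VRAM can push the final output.weight allocation over the edge. A hedged workaround sketch, not from the original report: shrinking the context window reduces the KV-cache allocation and may let the model load (num_ctx is Ollama's standard context-length option; 8192 is an arbitrary example value):

```shell
# Interactive: lower the context length for this session, then retry.
ollama run qwen3-coder:30b-a3b-q4_K_M
# >>> /set parameter num_ctx 8192

# Or via the HTTP API, passing num_ctx in the options object.
curl http://localhost:11434/api/generate -d '{
  "model": "qwen3-coder:30b-a3b-q4_K_M",
  "prompt": "hello",
  "options": { "num_ctx": 8192 }
}'
```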

OS

No response

GPU

No response

CPU

No response

Ollama version

No response

GiteaMirror added the bug label 2026-04-29 05:52:40 -05:00
