[GH-ISSUE #15156] Ollama fails to use a coding model that doesn't fit into Vulkan memory #56215

Closed
opened 2026-04-29 10:26:29 -05:00 by GiteaMirror · 4 comments
Owner

Originally created by @yurivict on GitHub (Mar 31, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15156

What is the issue?

The system has the Vulkan device NVIDIA GeForce RTX 2060 with 6 GB VRAM.

claude is configured to use the model gpt-oss:20b. 20B parameters require 40+ GB of memory.

Ollama serve [log](https://freebsd.org/~yuri/ollama-serve-failure-when-model-is-larger-than-vulkan-memory.log).

Ollama serve was loading the model very slowly for some reason.

Ollama should put some number of layers onto the Vulkan device and use CPU for the rest.
But this doesn't happen for some reason.

The log shows that after 15+ minutes computation doesn't start because of slow model loading and timeouts.
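For reference, Ollama's scheduler normally chooses the CPU/GPU split automatically, but the number of layers offloaded to the GPU can also be capped explicitly with the standard `num_gpu` request option. A minimal sketch (the value `8` is only an illustration for a 6 GB card, not a recommendation from this thread):

```shell
# Ask the local Ollama server to load gpt-oss:20b with at most 8 layers
# on the GPU, leaving the remaining layers on the CPU. num_gpu is a
# standard Ollama option; 8 is an arbitrary example value.
curl http://localhost:11434/api/generate -d '{
  "model": "gpt-oss:20b",
  "prompt": "hello",
  "options": { "num_gpu": 8 }
}'
```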

Relevant log output

See above.

OS

Linux

GPU

Nvidia

CPU

AMD

Ollama version

0.19.0

GiteaMirror added the bug and needs more info labels 2026-04-29 10:26:29 -05:00
Author
Owner

@rick-github commented on GitHub (Mar 31, 2026):

> claude is configured to use the model gpt-oss:20b. 20b parameters requires 40+GB of memory.

```
NAME                    ID              SIZE      PROCESSOR    CONTEXT    UNTIL
gpt-oss:20b             17052f91a42e    17 GB     100% GPU     131072     Forever
```

> Ollama should put some number of layers onto the Vulkan device and use CPU for the rest.

```
time=2026-03-30T17:17:56.396-07:00 level=INFO source=ggml.go:494 msg="offloaded 9/25 layers to GPU"
```

> The log shows that after 15+ minutes computation doesn't start due to whatever issues with slow model loading and timeouts.

```
time=2026-03-30T17:22:55.802-07:00 level=WARN source=server.go:1359 msg="client connection closed before server finished loading, aborting load"
```

The client closed the connection before the load was finished.

After the failed load, subsequent runners fail with a SIGSEGV in `ggml_backend_dev_init`, so it looks like the device is wedged somehow. A reboot would probably fix it, but that's suboptimal. Why are you using Vulkan with an RTX 2060 rather than CUDA?
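Since the failure above is the client disconnecting mid-load rather than the load itself failing, one possible workaround is to give the server longer to finish loading via the `OLLAMA_LOAD_TIMEOUT` environment variable (an existing Ollama server setting, default 5m; the 30m value here is only an example). The client's own HTTP timeout must also be long enough:

```shell
# Give the Ollama server more time to finish loading a model before it
# aborts (the default is 5m). Note the client's request timeout must
# also exceed the actual load time, or it will still disconnect mid-load.
export OLLAMA_LOAD_TIMEOUT=30m
ollama serve
```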

Author
Owner

@yurivict commented on GitHub (Mar 31, 2026):

> Why are you using Vulkan with an RTX 2060 rather than CUDA?

CUDA isn't supported on FreeBSD.

Reboot didn't change anything.

Author
Owner

@rick-github commented on GitHub (Mar 31, 2026):

Logs from after the reboot?

Author
Owner

@yurivict commented on GitHub (Mar 31, 2026):

This is the log from after the reboot.


Reference: github-starred/ollama#56215