[GH-ISSUE #15167] [Windows 10] Older NVIDIA GPUs (Maxwell) force fallback to CPU mode, returns 500 error #56219

Closed
opened 2026-04-29 10:26:41 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @Haru95572 on GitHub (Mar 31, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15167

What is the issue?

Problem Description
Ollama 0.19.0 on Windows 10 fails to detect an older NVIDIA (Maxwell) GPU and falls back to CPU mode despite a correct CUDA 11.8 setup.

  • Log shows id=cpu and total_vram="0 B".
  • GPU usage is 0% (Task Manager).
  • Chat requests return 500 Internal Server Error.
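
A diagnostic sketch (not part of the original report) for confirming what the driver and Ollama each see. Assumptions: nvidia-smi ships with the NVIDIA driver, and Ollama's Windows server log defaults to %LOCALAPPDATA%\Ollama\server.log.

```shell
:: Confirm the GPU is visible to the driver and note the reported
:: driver/CUDA version in the nvidia-smi header.
nvidia-smi

:: Look for Ollama's GPU-discovery lines from startup; entries such as
:: id=cpu and total_vram="0 B" indicate detection failed.
findstr /i "vram cuda" "%LOCALAPPDATA%\Ollama\server.log"
```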

Expected behavior
Ollama should detect the GPU, use CUDA acceleration, and handle requests normally.

Relevant log output

time=2026-03-31T16:56:18.686+08:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="3.0 GiB"
time=2026-03-31T16:56:18.686+08:00 level=INFO source=ggml.go:482 msg="offloading 0 repeating layers to GPU"
time=2026-03-31T16:56:18.687+08:00 level=INFO source=ggml.go:486 msg="offloading output layer to CPU"
time=2026-03-31T16:56:18.687+08:00 level=INFO source=ggml.go:494 msg="offloaded 0/25 layers to GPU"
time=2026-03-31T16:56:18.690+08:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="3.5 GiB"
time=2026-03-31T16:56:18.693+08:00 level=INFO source=device.go:267 msg="compute graph" device=CPU size="1.2 GiB"
time=2026-03-31T16:56:18.693+08:00 level=INFO source=device.go:272 msg="total memory" size="7.7 GiB"
time=2026-03-31T16:56:18.695+08:00 level=INFO source=sched.go:561 msg="loaded runners" count=1
time=2026-03-31T16:56:18.697+08:00 level=INFO source=server.go:1352 msg="waiting for llama runner to start responding"
time=2026-03-31T16:56:18.699+08:00 level=INFO source=server.go:1386 msg="waiting for server to become available" status="llm server loading model"
time=2026-03-31T16:56:21.454+08:00 level=INFO source=server.go:1390 msg="llama runner started in 7.87 seconds"
[GIN] 2026/03/31 - 16:57:42 | 500 |         1m28s |       127.0.0.1 | POST     "/api/chat"

OS

No response

GPU

No response

CPU

No response

Ollama version

No response

GiteaMirror added the bug and needs more info labels 2026-04-29 10:26:41 -05:00
Author
Owner

@rick-github commented on GitHub (Mar 31, 2026):

Ollama currently supports cuda_v12 and cuda_v13. The last version to support cuda_v11 was 0.9.2.

The 500 error is likely a separate issue; the model should run fine on CPU. Set OLLAMA_DEBUG=2 to add more detail and post the log.


Reference: github-starred/ollama#56219