[GH-ISSUE #11270] Report when GPU with an incompatible CUDA architecture is used #53943

Open
opened 2026-04-29 04:59:11 -05:00 by GiteaMirror · 0 comments

Originally created by @FireFragment on GitHub (Jul 2, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11270

What is the issue?

When building Ollama, support for specific CUDA compute capabilities can be toggled with the `-DCMAKE_CUDA_ARCHITECTURES` flag.
When a GPU whose architecture was disabled this way is used, the server prints the following cryptic error message, followed by long stack traces:

[GIN] 2025/07/02 - 14:20:13 | 200 |  6.234077207s |       127.0.0.1 | POST     "/api/generate"
ggml_cuda_compute_forward: RMS_NORM failed
CUDA error: no kernel image is available for execution on the device
  current device: 0, in function ggml_cuda_compute_forward at /build/source/ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:2366
  err
/build/source/ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:76: CUDA error
SIGSEGV: segmentation violation
PC=0x7f7b16424c57 m=4 sigcode=1 addr=0x206203fe0
signal arrived during cgo execution
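
For reference, a build that excludes Pascal-class GPUs could be produced with a configure step like the one below. The architecture list is purely illustrative, not an Ollama default (CMake writes compute capability 6.1 as `61`):

```shell
# Illustrative only: compile CUDA kernels for compute capabilities
# 7.5 and 8.6 exclusively. A GTX 1060 (compute 6.1) would then hit
# the "no kernel image is available" error shown above.
cmake -B build -DCMAKE_CUDA_ARCHITECTURES="75;86"
cmake --build build
```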

`ollama serve` already knows which compute capability the GPU in use has, because it prints this information to stdout (the `compute=6.1` part):

time=2025-07-02T13:38:19.391+02:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-8020c948-dcac-7cc5-4991-07408ef9edad library=cuda variant=v12 compute=6.1 driver=12.4 name="NVIDIA GeForce GTX 1060 with Max-Q Design" total="5.9 GiB" available="5.9 GiB"

I suggest logging a warning when the server detects that the GPU's CUDA architecture is unsupported, so that the user knows the failure is caused by an incompatibility between the GPU and the Ollama build.
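
A minimal sketch of what such a check might look like on the server side, in Go. It assumes the architecture list passed to `-DCMAKE_CUDA_ARCHITECTURES` is embedded into the binary at build time; the package name, variable, and helper below are hypothetical, not Ollama's actual code:

```go
package discover

import (
	"log/slog"
	"slices"
	"strings"
)

// supportedCudaArchs would be stamped in at build time (e.g. via
// -ldflags "-X ...=75;86"), mirroring -DCMAKE_CUDA_ARCHITECTURES.
// Hypothetical mechanism, shown here for illustration.
var supportedCudaArchs = "75;86"

// warnIfUnsupported logs a warning when the detected GPU's compute
// capability (e.g. "6.1") is absent from the list of architectures
// this binary was compiled for. Hypothetical helper.
func warnIfUnsupported(gpuName, compute string) {
	// CMake writes capabilities without the dot: 6.1 -> "61".
	arch := strings.ReplaceAll(compute, ".", "")
	if !slices.Contains(strings.Split(supportedCudaArchs, ";"), arch) {
		slog.Warn("GPU compute capability is not supported by this build; no CUDA kernels were compiled for it",
			"gpu", gpuName, "compute", compute,
			"built_for", supportedCudaArchs)
	}
}
```

Called right after GPU discovery, a check like this would turn the opaque SIGSEGV above into an explicit startup warning for a GTX 1060 (compute 6.1) running on a build compiled only for 7.5/8.6.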

Relevant log output


OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.9.3

GiteaMirror added the build and feature request labels 2026-04-29 04:59:11 -05:00