[GH-ISSUE #12584] Ollama CUDA Error: the function requires an architectural feature absent from the device #54863

Closed
opened 2026-04-29 07:42:18 -05:00 by GiteaMirror · 18 comments

Originally created by @pauljoohyunkim on GitHub (Oct 12, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12584

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

I am running Ollama with an NVIDIA GPU (a GTX 1060 in my Dell Inspiron 15 7577 laptop) on Arch Linux, and since around ollama-cuda version 0.12.4-1 I get a CUDA-related error when running the mistral:latest model.

Currently I am running Ollama with CUDA by downgrading to version 0.12.0-1, where it still seems to work.
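(For context, a downgrade on Arch is done by reinstalling the older package, either from the local pacman cache or from the Arch Linux Archive. A minimal sketch follows; the exact filenames are assumptions and depend on what is cached locally.)

```shell
# From the local package cache (check /var/cache/pacman/pkg/ for the real names):
sudo pacman -U /var/cache/pacman/pkg/ollama-0.12.0-1-x86_64.pkg.tar.zst \
               /var/cache/pacman/pkg/ollama-cuda-0.12.0-1-x86_64.pkg.tar.zst

# Or pull the same versions from the Arch Linux Archive:
sudo pacman -U \
  https://archive.archlinux.org/packages/o/ollama/ollama-0.12.0-1-x86_64.pkg.tar.zst \
  https://archive.archlinux.org/packages/o/ollama-cuda/ollama-cuda-0.12.0-1-x86_64.pkg.tar.zst
```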

Relevant log output

```shell
pbjk@PAUL-DELL-ARCH ~ $ sudo journalctl -u ollama --no-pager --follow --pager-end
Oct 12 11:35:02 PAUL-DELL-ARCH ollama[1504]: rip    0x7fed9409894c
Oct 12 11:35:02 PAUL-DELL-ARCH ollama[1504]: rflags 0x246
Oct 12 11:35:02 PAUL-DELL-ARCH ollama[1504]: cs     0x33
Oct 12 11:35:02 PAUL-DELL-ARCH ollama[1504]: fs     0x0
Oct 12 11:35:02 PAUL-DELL-ARCH ollama[1504]: gs     0x0
Oct 12 11:35:02 PAUL-DELL-ARCH ollama[1504]: time=2025-10-12T11:35:02.163+09:00 level=INFO source=server.go:1305 msg="waiting for server to become available" status="llm server error"
Oct 12 11:35:02 PAUL-DELL-ARCH ollama[1504]: time=2025-10-12T11:35:02.414+09:00 level=INFO source=server.go:1305 msg="waiting for server to become available" status="llm server not responding"
Oct 12 11:35:02 PAUL-DELL-ARCH ollama[1504]: time=2025-10-12T11:35:02.827+09:00 level=ERROR source=server.go:426 msg="llama runner terminated" error="exit status 2"
Oct 12 11:35:02 PAUL-DELL-ARCH ollama[1504]: time=2025-10-12T11:35:02.915+09:00 level=INFO source=sched.go:449 msg="Load failed" model=/var/lib/ollama/blobs/sha256-f5074b1221da0f5a2910d33b642efa5b9eb58cfdddca1c79e16d7ad28aa2b31f error="llama runner process has terminated: CUDA error: the function requires an architectural feature absent from the device\n  current device: 0, in function cublas_handle at /build/ollama/src/ollama/ml/backend/ggml/ggml/src/ggml-cuda/common.cuh:1041\n  cublasCreate_v2(&cublas_handles[device])\n/build/ollama/src/ollama/ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:88: CUDA error"
Oct 12 11:35:02 PAUL-DELL-ARCH ollama[1504]: [GIN] 2025/10/12 - 11:35:02 | 500 | 26.348420617s |       127.0.0.1 | POST     "/api/generate"

^Cpbjk@PAUL-DELL-ARCH ~ $ ollama run mistral
Error: 500 Internal Server Error: llama runner process has terminated: CUDA error
pbjk@PAUL-DELL-ARCH ~ $ ollama run mistral
Error: 500 Internal Server Error: llama runner process has terminated: CUDA error: the function requires an architectural feature absent from the device
  current device: 0, in function cublas_handle at /build/ollama/src/ollama/ml/backend/ggml/ggml/src/ggml-cuda/common.cuh:1041
  cublasCreate_v2(&cublas_handles[device])
/build/ollama/src/ollama/ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:88: CUDA error
```

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.12.5

GiteaMirror added the bug and nvidia labels 2026-04-29 07:42:30 -05:00

@Kaoticz commented on GitHub (Oct 12, 2025):

I'm having the same issue on v0.12.5 with an Nvidia GTX 1070 GPU, also on Arch Linux.
I wonder if the `More reliable and accurate VRAM detection` change introduced in v0.12.4 is what caused this issue, as my logs suggest Ollama is not able to detect my GPU.

```
$ ollama serve
time=2025-10-12T13:01:43.845-03:00 level=INFO source=routes.go:1481 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/kotz/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-10-12T13:01:43.848-03:00 level=INFO source=images.go:522 msg="total blobs: 22"
time=2025-10-12T13:01:43.849-03:00 level=INFO source=images.go:529 msg="total unused blobs removed: 0"
time=2025-10-12T13:01:43.849-03:00 level=INFO source=routes.go:1534 msg="Listening on 127.0.0.1:11434 (version 0.12.5)"
time=2025-10-12T13:01:43.850-03:00 level=INFO source=runner.go:80 msg="discovering available GPUs..."
time=2025-10-12T13:01:44.301-03:00 level=INFO source=types.go:129 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="31.3 GiB" available="22.4 GiB"
time=2025-10-12T13:01:44.301-03:00 level=INFO source=routes.go:1575 msg="entering low vram mode" "total vram"="0 B" threshold="20.0 GiB"
```
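When discovery falls back to CPU like this, one quick check (a diagnostic sketch; the /usr/lib/ollama/ path matches the Arch package layout) is whether the CUDA backend library is present at all and whether its runtime dependencies resolve:

```shell
# Is the CUDA backend from ollama-cuda actually installed?
ls -l /usr/lib/ollama/libggml-cuda.so

# Any "not found" lines mean a missing CUDA runtime dependency,
# which can make GPU discovery silently fall back to CPU.
ldd /usr/lib/ollama/libggml-cuda.so | grep "not found"
```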

@dhiltgen commented on GitHub (Oct 12, 2025):

@pauljoohyunkim it looks like you're building from source. I wasn't able to reproduce the crash on our official builds on a GeForce GT 1030 which is also a Compute Capability 6.1 GPU like yours. Can you try with the official builds and confirm they work properly? What CUDA version are you using to compile? Can you provide more details about how you're building?

@Kaoticz your issue may be unrelated to Paul's. If you run the following, that may help shed more light on what's going wrong:

```
sudo systemctl stop ollama
OLLAMA_DEBUG=2 ollama serve 2>&1 | tee serve.log
```

Then just capture the startup logs up to "inference compute" so we can see why the GPU discovery isn't working properly.
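To capture exactly that window, something like this works on the log produced by the tee above (a minimal sketch; serve.log is the filename from the command above):

```shell
# Print serve.log from the top up to and including the first
# "inference compute" line, then quit.
sed '/inference compute/q' serve.log
```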


@pauljoohyunkim commented on GitHub (Oct 12, 2025):

> @pauljoohyunkim it looks like you're building from source. I wasn't able to reproduce the crash on our official builds on a GeForce GT 1030 which is also a Compute Capability 6.1 GPU like yours. Can you try with the official builds and confirm they work properly? What CUDA version are you using to compile? Can you provide more details about how you're building?
>
> @Kaoticz your issue may be unrelated to Paul's. If you run the following, that may help shed more light on what's going wrong:
>
> ```
> sudo systemctl stop ollama
> OLLAMA_DEBUG=2 ollama serve 2>&1 | tee serve.log
> ```
>
> Then just capture the startup logs up to "inference compute" so we can see why the GPU discovery isn't working properly.

I installed it via `pacman -S ollama-cuda`, which grabs the most recent packages from [ollama](https://archive.archlinux.org/packages/o/ollama/) and [ollama-cuda](https://archive.archlinux.org/packages/o/ollama-cuda/), the former being pulled in as a dependency of ollama-cuda.

Downloading and decompressing the package files shows that the installed files are mostly shared objects.

```
./ollama/usr/bin/ollama
./ollama/usr/lib/ollama/libggml-base.so
./ollama/usr/lib/ollama/libggml-cpu-alderlake.so
./ollama/usr/lib/ollama/libggml-cpu-haswell.so
./ollama/usr/lib/ollama/libggml-cpu-icelake.so
./ollama/usr/lib/ollama/libggml-cpu-sandybridge.so
./ollama/usr/lib/ollama/libggml-cpu-skylakex.so
./ollama/usr/lib/ollama/libggml-cpu-sse42.so
./ollama/usr/lib/ollama/libggml-cpu-x64.so
./ollama/usr/lib/systemd/system/ollama.service
./ollama/usr/lib/sysusers.d/ollama.conf
./ollama/usr/lib/tmpfiles.d/ollama.conf
./ollama/usr/share/licenses/ollama/LICENSE
./ollama-cuda/usr/lib/ollama/libggml-cuda.so
```
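(For anyone who wants to reproduce that listing without unpacking anything by hand, pacman can print the same file lists directly; a sketch using standard pacman query flags:)

```shell
# List every file owned by the installed ollama and ollama-cuda packages.
pacman -Qlq ollama ollama-cuda

# Or inspect a downloaded package archive without installing it
# (the filename here is an example):
tar -tf ollama-cuda-0.12.5-1-x86_64.pkg.tar.zst
```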

`nvidia-smi` command output shows:

```
Mon Oct 13 19:39:13 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05              Driver Version: 580.95.05      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1060 ...    Off |   00000000:01:00.0 Off |                  N/A |
| N/A   39C    P8              6W /   60W |       7MiB /   6144MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            1430      G   /usr/lib/Xorg                             4MiB |
+-----------------------------------------------------------------------------------------+
```

I'll see later whether building from source changes anything, but it is still weird that the error comes from the CUDA call `cublasCreate_v2(&cublas_handles[device])`.

Another thing I noticed: it was having trouble when I ran `ollama run mistral` or `ollama run llama3.1:8b`, but `ollama run gemma3:4b` seemed to work fine.
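(For what it's worth, the "architectural feature absent from the device" error usually means the binary contains no code the card can run, and that can be checked directly. A diagnostic sketch, assuming the CUDA toolkit's cuobjdump is installed and the Arch library path: a GTX 1060 needs an sm_61 cubin, or PTX for compute capability 6.1 or lower that the driver can JIT.)

```shell
# List the cubin architectures embedded in the backend (look for sm_61):
cuobjdump --list-elf /usr/lib/ollama/libggml-cuda.so

# List embedded PTX; PTX targeting compute_61 or lower can be
# JIT-compiled by the driver for a Pascal card:
cuobjdump --list-ptx /usr/lib/ollama/libggml-cuda.so
```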


@dhiltgen commented on GitHub (Oct 13, 2025):

@pauljoohyunkim it's possible your OS-specific packages may not be built properly for the latest Ollama changes. Can you uninstall those packages, use our official binary release, and see whether it works properly or has the same failure?

https://github.com/ollama/ollama/blob/main/docs/linux.md


@pauljoohyunkim commented on GitHub (Oct 14, 2025):

@dhiltgen You might be right. Either the package I had was broken or it was corrupted, but the binary release seems to be working fine.

Actually, I did a full cleanup of the installation directory and reinstalled the package, and now it seems to work, so maybe there was some sort of messy upgrade going on on the Arch repo side...


@pauljoohyunkim commented on GitHub (Oct 14, 2025):

Wait, I am getting the error again.

It happens when I run it through systemd via `systemctl start ollama`, which gives me that CUDA error again.

```
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_ORIGINS=moz-extension://*"
```

These are the only additional lines I've added.
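(For reference, the standard way to apply those lines is a systemd drop-in rather than editing the unit file directly; a sketch of the usual procedure, where the Environment values are the ones quoted above:)

```shell
# Create/edit an override for the service; put the two lines
# under a [Service] section in the file that opens:
#
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0"
#   Environment="OLLAMA_ORIGINS=moz-extension://*"
sudo systemctl edit ollama

# Reload units and restart so the override takes effect.
sudo systemctl daemon-reload
sudo systemctl restart ollama
```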


@dhiltgen commented on GitHub (Oct 14, 2025):

@pauljoohyunkim I'm not sure where to submit issues for the Arch Linux packages, but it sounds like that's probably where the problem lies. Until those packages are fixed, you can use our binary artifacts.


@Rushmore75 commented on GitHub (Oct 14, 2025):

+1 for it being broken on Arch (EndeavourOS).


@nplanel commented on GitHub (Oct 17, 2025):

@Rushmore75 In short, CUDA 13 dropped support for some architectures (50; 52; 53; 60; 61; 62; 70; 72), so ollama-cuda doesn't support these cards anymore.
An alternative is to use the AUR packages aur/ollama-cuda12-bin and aur/cuda12.0 ...
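A minimal sketch of that route, assuming an AUR helper such as yay is available (the package names are the ones given above):

```shell
# Remove the repo package built against CUDA 13.
sudo pacman -Rns ollama-cuda

# Build and install the CUDA 12 toolkit and the CUDA 12 Ollama build from the AUR.
yay -S cuda12.0 ollama-cuda12-bin
```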


@nplanel commented on GitHub (Oct 17, 2025):

Another alternative is going back in time at the distro level: force cuda==12 in pacman.conf and upgrade the rest of the distro. But you will get more and more packages that will not upgrade due to cuda, cudnn, and so on; at that stage a Docker image would be a better fit.

`Server=https://archive.archlinux.org/repos/2025/09/22/$repo/os/$arch`

https://wiki.archlinux.org/title/Arch_Linux_Archive#3.2
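Concretely, per that wiki section, you point every repository at a dated snapshot in /etc/pacman.conf and then force a full sync that allows downgrades (a sketch; the date is the one quoted above):

```shell
# In /etc/pacman.conf, replace the Include/Server lines of each
# repository ([core], [extra], ...) with:
#
#   Server=https://archive.archlinux.org/repos/2025/09/22/$repo/os/$arch

# Then force-refresh the databases and downgrade everything to the snapshot:
sudo pacman -Syyuu
```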


@pauljoohyunkim commented on GitHub (Oct 19, 2025):

@nplanel I will try rolling back CUDA to a version that officially supports the card and report back on whether it works.


@Kaoticz commented on GitHub (Oct 19, 2025):

Installing `cuda-12.9` and `ollama-cuda12-bin` fixed the issue for me. It seems support for Pascal cards really did get dropped in CUDA 13.


@pauljoohyunkim commented on GitHub (Oct 20, 2025):

I've replaced `cuda` with `cuda-12.9` and it seems to work for now, even with the latest `ollama` and `ollama-cuda`.
(My `nvcc --version` shows 12.9, while `nvidia-smi` shows CUDA version 13.0. That could just be the difference between the runtime and driver APIs, I guess. I'll see if this causes problems later on.)
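That reading is correct: `nvidia-smi` reports the maximum CUDA version the installed driver supports, while `nvcc --version` reports the installed toolkit, and the two can legitimately differ. A quick way to see both side by side (a sketch; the compute_cap query field requires a reasonably recent driver):

```shell
# Toolkit (compiler/runtime) version, e.g. 12.9:
nvcc --version | grep release

# Driver version and the GPU's compute capability (6.1 for a GTX 1060):
nvidia-smi --query-gpu=name,driver_version,compute_cap --format=csv
```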


@ghost commented on GitHub (Nov 24, 2025):

I've hit this with my Quadro/GTX cards, but ollama-cuda12-bin is now "13." While downgrading does fix the "CUDA error: the function requires an architectural feature absent from the device" error, it also doesn't seem to use my GPU at all.

I enabled debugging and dug through the docs, but I could not get the AUR ollama-cuda12-bin to use my GPUs at all. Pulling the normal ollama install with the downgraded cuda seems to have worked.


@Bottlecap202 commented on GitHub (Nov 24, 2025):

Fix the issue, do not just say it's solved.


@insanemal commented on GitHub (Jan 8, 2026):

@Bottlecap202

I had to build ollama-cuda from the AUR, and I had to override the compiler back to gcc-14. This required

```
export CC=/usr/bin/gcc-14
export CXX=/usr/bin/g++-14
export CUDAHOSTCXX=/usr/bin/g++-14
```

to be set before running makepkg, as well as editing the PKGBUILD to change the cmake_options:

```
  local cmake_options=(
    -B build
    -G Ninja
    -W no-dev
    -D CMAKE_BUILD_TYPE=Release
    -D CMAKE_INSTALL_PREFIX=/usr
    # Disable Vulkan/HIP
    -D CMAKE_DISABLE_FIND_PACKAGE_Vulkan=TRUE
    -D CMAKE_HIP_COMPILER=""
    # For CUDA build only
    # Sync GPU targets from CMakePresets.json
    # For CUDA 12
    -D CMAKE_CUDA_ARCHITECTURES="50;52;53;60;61;62;70;72;75;80;86;87;89;90;90a"
    # for CUDA 13
    #-D CMAKE_CUDA_ARCHITECTURES="75;80;86;87;88;89;90;100;103;110;120;121;121-virtual"
    -D CMAKE_CUDA_HOST_COMPILER=/usr/bin/g++-14
    -D CMAKE_C_COMPILER=gcc-14
    -D CMAKE_CXX_COMPILER=g++-14
  )
```

Then it builds and works correctly.
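Putting those pieces together, the build sequence looks roughly like this (a sketch; it assumes the gcc-14 compiler package is installed and the PKGBUILD has already been edited as above):

```shell
# Point the host toolchain at gcc-14 before building, per the overrides above.
export CC=/usr/bin/gcc-14
export CXX=/usr/bin/g++-14
export CUDAHOSTCXX=/usr/bin/g++-14

# Build the edited AUR package and install it.
makepkg -si
```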


@pauljoohyunkim commented on GitHub (Jan 22, 2026):

As of right now, ollama-cuda12-bin seems to work for me on Arch (with the correct NVIDIA driver, of course).

However, it seems like it only got fixed today: it was not using the GPU for all my models yesterday, and the comments on the AUR page suggest there was an issue that got fixed a few hours ago.


@Zuzupy commented on GitHub (Mar 12, 2026):

This was very useful. I made a help forum thread in the Ollama Discord in case people try looking there.

Reference: github-starred/ollama#54863