[GH-ISSUE #12584] Ollama CUDA Error: the function requires an architectural feature absent from the device #54863

Closed
opened 2026-04-29 07:42:18 -05:00 by GiteaMirror · 18 comments

Originally created by @pauljoohyunkim on GitHub (Oct 12, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12584

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

I am running Ollama with an NVIDIA GPU (a GTX 1060 in my Dell Inspiron 15 7577 laptop) on Arch Linux, and since around ollama-cuda version 0.12.4-1 I get a CUDA-related error when running the mistral:latest model.

Currently I am running Ollama with CUDA by downgrading to version 0.12.0-1, where it still seems to work.
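(For context, a downgrade on Arch is done by reinstalling the older package, either from the local pacman cache or from the Arch Linux Archive. A minimal sketch follows; the exact filenames are assumptions and depend on what is cached locally.)

```shell
# From the local package cache (check /var/cache/pacman/pkg/ for the real names):
sudo pacman -U /var/cache/pacman/pkg/ollama-0.12.0-1-x86_64.pkg.tar.zst \
               /var/cache/pacman/pkg/ollama-cuda-0.12.0-1-x86_64.pkg.tar.zst

# Or pull the same versions from the Arch Linux Archive:
sudo pacman -U \
  https://archive.archlinux.org/packages/o/ollama/ollama-0.12.0-1-x86_64.pkg.tar.zst \
  https://archive.archlinux.org/packages/o/ollama-cuda/ollama-cuda-0.12.0-1-x86_64.pkg.tar.zst
```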

Relevant log output

```shell
pbjk@PAUL-DELL-ARCH ~ $ sudo journalctl -u ollama --no-pager --follow --pager-end
Oct 12 11:35:02 PAUL-DELL-ARCH ollama[1504]: rip    0x7fed9409894c
Oct 12 11:35:02 PAUL-DELL-ARCH ollama[1504]: rflags 0x246
Oct 12 11:35:02 PAUL-DELL-ARCH ollama[1504]: cs     0x33
Oct 12 11:35:02 PAUL-DELL-ARCH ollama[1504]: fs     0x0
Oct 12 11:35:02 PAUL-DELL-ARCH ollama[1504]: gs     0x0
Oct 12 11:35:02 PAUL-DELL-ARCH ollama[1504]: time=2025-10-12T11:35:02.163+09:00 level=INFO source=server.go:1305 msg="waiting for server to become available" status="llm server error"
Oct 12 11:35:02 PAUL-DELL-ARCH ollama[1504]: time=2025-10-12T11:35:02.414+09:00 level=INFO source=server.go:1305 msg="waiting for server to become available" status="llm server not responding"
Oct 12 11:35:02 PAUL-DELL-ARCH ollama[1504]: time=2025-10-12T11:35:02.827+09:00 level=ERROR source=server.go:426 msg="llama runner terminated" error="exit status 2"
Oct 12 11:35:02 PAUL-DELL-ARCH ollama[1504]: time=2025-10-12T11:35:02.915+09:00 level=INFO source=sched.go:449 msg="Load failed" model=/var/lib/ollama/blobs/sha256-f5074b1221da0f5a2910d33b642efa5b9eb58cfdddca1c79e16d7ad28aa2b31f error="llama runner process has terminated: CUDA error: the function requires an architectural feature absent from the device\n  current device: 0, in function cublas_handle at /build/ollama/src/ollama/ml/backend/ggml/ggml/src/ggml-cuda/common.cuh:1041\n  cublasCreate_v2(&cublas_handles[device])\n/build/ollama/src/ollama/ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:88: CUDA error"
Oct 12 11:35:02 PAUL-DELL-ARCH ollama[1504]: [GIN] 2025/10/12 - 11:35:02 | 500 | 26.348420617s |       127.0.0.1 | POST     "/api/generate"

^Cpbjk@PAUL-DELL-ARCH ~ $ ollama run mistral
Error: 500 Internal Server Error: llama runner process has terminated: CUDA error
pbjk@PAUL-DELL-ARCH ~ $ ollama run mistral
Error: 500 Internal Server Error: llama runner process has terminated: CUDA error: the function requires an architectural feature absent from the device
  current device: 0, in function cublas_handle at /build/ollama/src/ollama/ml/backend/ggml/ggml/src/ggml-cuda/common.cuh:1041
  cublasCreate_v2(&cublas_handles[device])
/build/ollama/src/ollama/ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:88: CUDA error
```

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.12.5

GiteaMirror added the bug and nvidia labels 2026-04-29 07:42:30 -05:00

@Kaoticz commented on GitHub (Oct 12, 2025):

I'm having the same issue on v0.12.5 with an Nvidia GTX 1070 GPU, also on Arch Linux.
I wonder if the `More reliable and accurate VRAM detection` change introduced in v0.12.4 is what caused this issue, as my logs suggest Ollama is not able to detect my GPU.

```
$ ollama serve
time=2025-10-12T13:01:43.845-03:00 level=INFO source=routes.go:1481 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/kotz/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-10-12T13:01:43.848-03:00 level=INFO source=images.go:522 msg="total blobs: 22"
time=2025-10-12T13:01:43.849-03:00 level=INFO source=images.go:529 msg="total unused blobs removed: 0"
time=2025-10-12T13:01:43.849-03:00 level=INFO source=routes.go:1534 msg="Listening on 127.0.0.1:11434 (version 0.12.5)"
time=2025-10-12T13:01:43.850-03:00 level=INFO source=runner.go:80 msg="discovering available GPUs..."
time=2025-10-12T13:01:44.301-03:00 level=INFO source=types.go:129 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="31.3 GiB" available="22.4 GiB"
time=2025-10-12T13:01:44.301-03:00 level=INFO source=routes.go:1575 msg="entering low vram mode" "total vram"="0 B" threshold="20.0 GiB"
```
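When discovery falls back to CPU like this, one quick check (a diagnostic sketch; the /usr/lib/ollama/ path matches the Arch package layout) is whether the CUDA backend library is present at all and whether its runtime dependencies resolve:

```shell
# Is the CUDA backend from ollama-cuda actually installed?
ls -l /usr/lib/ollama/libggml-cuda.so

# Any "not found" lines mean a missing CUDA runtime dependency,
# which can make GPU discovery silently fall back to CPU.
ldd /usr/lib/ollama/libggml-cuda.so | grep "not found"
```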

@dhiltgen commented on GitHub (Oct 12, 2025):

@pauljoohyunkim it looks like you're building from source. I wasn't able to reproduce the crash on our official builds on a GeForce GT 1030 which is also a Compute Capability 6.1 GPU like yours. Can you try with the official builds and confirm they work properly? What CUDA version are you using to compile? Can you provide more details about how you're building?

@Kaoticz your issue may be unrelated to Paul's. If you run the following, that may help shed more light on what's going wrong:

```
sudo systemctl stop ollama
OLLAMA_DEBUG=2 ollama serve 2>&1 | tee serve.log
```

Then just capture the startup logs up to "inference compute" so we can see why the GPU discovery isn't working properly.
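To capture exactly that window, something like this works on the log produced by the tee above (a minimal sketch; serve.log is the filename from the command above):

```shell
# Print serve.log from the top up to and including the first
# "inference compute" line, then quit.
sed '/inference compute/q' serve.log
```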


@pauljoohyunkim commented on GitHub (Oct 12, 2025):

> @pauljoohyunkim it looks like you're building from source. I wasn't able to reproduce the crash on our official builds on a GeForce GT 1030 which is also a Compute Capability 6.1 GPU like yours. Can you try with the official builds and confirm they work properly? What CUDA version are you using to compile? Can you provide more details about how you're building?
>
> @Kaoticz your issue may be unrelated to Paul's. If you run the following, that may help shed more light on what's going wrong:
>
> ```
> sudo systemctl stop ollama
> OLLAMA_DEBUG=2 ollama serve 2>&1 | tee serve.log
> ```
>
> Then just capture the startup logs up to "inference compute" so we can see why the GPU discovery isn't working properly.

I installed it via `pacman -S ollama-cuda`, which grabs the most recent packages from [ollama](https://archive.archlinux.org/packages/o/ollama/) and [ollama-cuda](https://archive.archlinux.org/packages/o/ollama-cuda/), the former being pulled in as a dependency of ollama-cuda.

Downloading and decompressing the package files shows that the installed files are mostly shared objects.

```
./ollama/usr/bin/ollama
./ollama/usr/lib/ollama/libggml-base.so
./ollama/usr/lib/ollama/libggml-cpu-alderlake.so
./ollama/usr/lib/ollama/libggml-cpu-haswell.so
./ollama/usr/lib/ollama/libggml-cpu-icelake.so
./ollama/usr/lib/ollama/libggml-cpu-sandybridge.so
./ollama/usr/lib/ollama/libggml-cpu-skylakex.so
./ollama/usr/lib/ollama/libggml-cpu-sse42.so
./ollama/usr/lib/ollama/libggml-cpu-x64.so
./ollama/usr/lib/systemd/system/ollama.service
./ollama/usr/lib/sysusers.d/ollama.conf
./ollama/usr/lib/tmpfiles.d/ollama.conf
./ollama/usr/share/licenses/ollama/LICENSE
./ollama-cuda/usr/lib/ollama/libggml-cuda.so
```
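(For anyone who wants to reproduce that listing without unpacking anything by hand, pacman can print the same file lists directly; a sketch using standard pacman query flags:)

```shell
# List every file owned by the installed ollama and ollama-cuda packages.
pacman -Qlq ollama ollama-cuda

# Or inspect a downloaded package archive without installing it
# (the filename here is an example):
tar -tf ollama-cuda-0.12.5-1-x86_64.pkg.tar.zst
```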

`nvidia-smi` command output shows:

```
Mon Oct 13 19:39:13 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05              Driver Version: 580.95.05      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1060 ...    Off |   00000000:01:00.0 Off |                  N/A |
| N/A   39C    P8              6W /   60W |       7MiB /   6144MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            1430      G   /usr/lib/Xorg                             4MiB |
+-----------------------------------------------------------------------------------------+
```

I'll see later whether building from source changes anything, but it is still weird that the error comes from the CUDA call `cublasCreate_v2(&cublas_handles[device])`.

Another thing I noticed: it was having trouble when I ran `ollama run mistral` or `ollama run llama3.1:8b`, but `ollama run gemma3:4b` seemed to work fine.
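(For what it's worth, the "architectural feature absent from the device" error usually means the binary contains no code the card can run, and that can be checked directly. A diagnostic sketch, assuming the CUDA toolkit's cuobjdump is installed and the Arch library path: a GTX 1060 needs an sm_61 cubin, or PTX for compute capability 6.1 or lower that the driver can JIT.)

```shell
# List the cubin architectures embedded in the backend (look for sm_61):
cuobjdump --list-elf /usr/lib/ollama/libggml-cuda.so

# List embedded PTX; PTX targeting compute_61 or lower can be
# JIT-compiled by the driver for a Pascal card:
cuobjdump --list-ptx /usr/lib/ollama/libggml-cuda.so
```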


@dhiltgen commented on GitHub (Oct 13, 2025):

@pauljoohyunkim it's possible your OS-specific packages may not be built properly for the latest Ollama changes. Can you uninstall those packages, use our official binary release, and see whether it works properly or has the same failure?

https://github.com/ollama/ollama/blob/main/docs/linux.md


@pauljoohyunkim commented on GitHub (Oct 14, 2025):

@dhiltgen You might be right. Either the package I had was broken or it was corrupted, but the binary release seems to be working fine.

Actually, I did a full cleanup of the installation directory and reinstalled the package, and now it seems to work, so maybe there was some sort of messy upgrade going on on the Arch repo side...


@pauljoohyunkim commented on GitHub (Oct 14, 2025):

Wait, I am getting the error again.

It happens when I run it through systemd via `systemctl start ollama`, which gives me that CUDA error again.

```
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_ORIGINS=moz-extension://*"
```

These are the only additional lines I've added.
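(For reference, the standard way to apply those lines is a systemd drop-in rather than editing the unit file directly; a sketch of the usual procedure, where the Environment values are the ones quoted above:)

```shell
# Create/edit an override for the service; put the two lines
# under a [Service] section in the file that opens:
#
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0"
#   Environment="OLLAMA_ORIGINS=moz-extension://*"
sudo systemctl edit ollama

# Reload units and restart so the override takes effect.
sudo systemctl daemon-reload
sudo systemctl restart ollama
```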


@dhiltgen commented on GitHub (Oct 14, 2025):

@pauljoohyunkim I'm not sure where to submit issues for the Arch Linux packages, but it sounds like that's probably where the problem lies. Until those packages are fixed, you can use our binary artifacts.


@Rushmore75 commented on GitHub (Oct 14, 2025):

+1 for it being broken on Arch (EndeavourOS).


@nplanel commented on GitHub (Oct 17, 2025):

@Rushmore75 In short, CUDA 13 dropped support for some architectures (50; 52; 53; 60; 61; 62; 70; 72), so ollama-cuda doesn't support these cards anymore.
An alternative is to use the AUR packages aur/ollama-cuda12-bin and aur/cuda12.0 ...
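A minimal sketch of that route, assuming an AUR helper such as yay is available (the package names are the ones given above):

```shell
# Remove the repo package built against CUDA 13.
sudo pacman -Rns ollama-cuda

# Build and install the CUDA 12 toolkit and the CUDA 12 Ollama build from the AUR.
yay -S cuda12.0 ollama-cuda12-bin
```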


@nplanel commented on GitHub (Oct 17, 2025):

Another alternative is going back in time at the distro level: force cuda==12 in pacman.conf and upgrade the rest of the distro. But you will get more and more packages that will not upgrade due to cuda, cudnn, and so on; at that stage a Docker image would be a better fit.

`Server=https://archive.archlinux.org/repos/2025/09/22/$repo/os/$arch`

https://wiki.archlinux.org/title/Arch_Linux_Archive#3.2
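Concretely, per that wiki section, you point every repository at a dated snapshot in /etc/pacman.conf and then force a full sync that allows downgrades (a sketch; the date is the one quoted above):

```shell
# In /etc/pacman.conf, replace the Include/Server lines of each
# repository ([core], [extra], ...) with:
#
#   Server=https://archive.archlinux.org/repos/2025/09/22/$repo/os/$arch

# Then force-refresh the databases and downgrade everything to the snapshot:
sudo pacman -Syyuu
```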


@pauljoohyunkim commented on GitHub (Oct 19, 2025):

@nplanel I will try rolling back CUDA to a version that officially supports the card and report back on whether it works.


@Kaoticz commented on GitHub (Oct 19, 2025):

Installing `cuda-12.9` and `ollama-cuda12-bin` fixed the issue for me. It seems support for Pascal cards really did get dropped in CUDA 13.


@pauljoohyunkim commented on GitHub (Oct 20, 2025):

I've replaced `cuda` with `cuda-12.9` and it seems to work for now, even with the latest `ollama` and `ollama-cuda`.
(My `nvcc --version` shows 12.9, while `nvidia-smi` shows CUDA version 13.0. That could just be the difference between the runtime and driver APIs, I guess. I'll see if this causes problems later on.)
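That reading is correct: `nvidia-smi` reports the maximum CUDA version the installed driver supports, while `nvcc --version` reports the installed toolkit, and the two can legitimately differ. A quick way to see both side by side (a sketch; the compute_cap query field requires a reasonably recent driver):

```shell
# Toolkit (compiler/runtime) version, e.g. 12.9:
nvcc --version | grep release

# Driver version and the GPU's compute capability (6.1 for a GTX 1060):
nvidia-smi --query-gpu=name,driver_version,compute_cap --format=csv
```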


@ghost commented on GitHub (Nov 24, 2025):

I've hit this with my Quadro/GTX cards, but ollama-cuda12-bin is now "13." While downgrading does fix the "CUDA error: the function requires an architectural feature absent from the device" error, it also doesn't seem to use my GPU at all.

I enabled debugging and dug through the docs, but I could not get the AUR ollama-cuda12-bin to use my GPUs at all. Pulling the normal ollama install with the downgraded cuda seems to have worked.


@Bottlecap202 commented on GitHub (Nov 24, 2025):

Fix the issue, do not just say it's solved.


@insanemal commented on GitHub (Jan 8, 2026):

@Bottlecap202

I had to build ollama-cuda from the AUR, and I had to override the compiler back to gcc-14. This required

```
export CC=/usr/bin/gcc-14
export CXX=/usr/bin/g++-14
export CUDAHOSTCXX=/usr/bin/g++-14
```

to be set before running makepkg, as well as editing the PKGBUILD to change the cmake_options:

```
  local cmake_options=(
    -B build
    -G Ninja
    -W no-dev
    -D CMAKE_BUILD_TYPE=Release
    -D CMAKE_INSTALL_PREFIX=/usr
    # Disable Vulkan/HIP
    -D CMAKE_DISABLE_FIND_PACKAGE_Vulkan=TRUE
    -D CMAKE_HIP_COMPILER=""
    # For CUDA build only
    # Sync GPU targets from CMakePresets.json
    # For CUDA 12
    -D CMAKE_CUDA_ARCHITECTURES="50;52;53;60;61;62;70;72;75;80;86;87;89;90;90a"
    # for CUDA 13
    #-D CMAKE_CUDA_ARCHITECTURES="75;80;86;87;88;89;90;100;103;110;120;121;121-virtual"
    -D CMAKE_CUDA_HOST_COMPILER=/usr/bin/g++-14
    -D CMAKE_C_COMPILER=gcc-14
    -D CMAKE_CXX_COMPILER=g++-14
  )
```

Then it builds and works correctly.
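Putting those pieces together, the build sequence looks roughly like this (a sketch; it assumes the gcc-14 compiler package is installed and the PKGBUILD has already been edited as above):

```shell
# Point the host toolchain at gcc-14 before building, per the overrides above.
export CC=/usr/bin/gcc-14
export CXX=/usr/bin/g++-14
export CUDAHOSTCXX=/usr/bin/g++-14

# Build the edited AUR package and install it.
makepkg -si
```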


@pauljoohyunkim commented on GitHub (Jan 22, 2026):

As of right now, ollama-cuda12-bin seems to work for me on Arch (with the correct NVIDIA driver, of course).

However, it seems like it only got fixed today: it was not using the GPU for all my models yesterday, and the comments on the AUR page suggest there was an issue that got fixed a few hours ago.


@Zuzupy commented on GitHub (Mar 12, 2026):

This was very useful. I made a help forum thread in the Ollama Discord in case people try looking there.

Reference: github-starred/ollama#54863