[GH-ISSUE #12136] CUDA Detection issues #8065


GiteaMirror · 2026-04-12T20:19:38-05:00

GiteaMirror commented

2026-04-12 20:19:38 -05:00

Originally created by @seedrick on GitHub (Sep 1, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12136

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

Running Debian 12, a relatively fresh system I put together about a week ago from some parts I had lying around. I can't get ollama v0.11.7 to detect my GPU. It shows up in nvidia-smi with CUDA version 13.0 and driver 580.65.06. I have set the CUDA_VISIBLE_DEVICES environment variable to the GPU's UUID from nvidia-smi -L.

All I get is "no nvidia devices detected by library /usr/lib/x86_64-linux-gnu/libcuda.so.580.65.06". I don't see what I'm missing, and the logs don't say much else. Running sudo nvidia-modprobe -u from the troubleshooting guide returns nothing.

https://cdn.discordapp.com/attachments/1410683002333302886/1410771399701954600/image.png?ex=68b62f40&is=68b4ddc0&hm=224b7cce72d7e77b339082b54605d93bce9118c3573d418b15e21d548bb2125d&

https://cdn.discordapp.com/attachments/1410683002333302886/1411503133200482427/image.png?ex=68b635bb&is=68b4e43b&hm=bb33a5097a2858ee385a02aacbb7dda866aa47d7c090c1d12015bda90ee09c12&

Relevant log output

time=2025-08-30T21:33:24.584-04:00 level=INFO source=routes.go:1331 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/yunohost.app/ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NEW_ESTIMATES:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[https://sub.domain.tld http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-08-30T21:33:27.044-04:00 level=INFO source=images.go:477 msg="total blobs: 8"         
time=2025-08-30T21:33:27.045-04:00 level=INFO source=images.go:484 msg="total unused blobs removed: 0"
time=2025-08-30T21:33:27.088-04:00 level=INFO source=routes.go:1384 msg="Listening on [::]:11434 (version 0.11.8)"
time=2025-08-30T21:33:27.149-04:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-08-30T21:33:43.893-04:00 level=INFO source=gpu.go:604 msg="no nvidia devices detected by library /usr/lib/x86_64-linux-gnu/libcuda.so.580.65.06"
time=2025-08-30T21:33:47.084-04:00 level=INFO source=gpu.go:379 msg="no compatible GPUs were discovered"
time=2025-08-30T21:33:47.084-04:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=cpu variant="" compute="" driver=0.0 name="" total="15.4 GiB" available="14.3 GiB"
time=2025-08-30T21:33:47.084-04:00 level=INFO source=routes.go:1425 msg="entering low vram mode" "total vram"="15.4 GiB" threshold="20.0 GiB"

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.11.8
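The "server config" line in the log above records CUDA_VISIBLE_DEVICES as empty. That would be consistent with the variable being set in a login shell but missing from the environment of the Ollama service. Below is a minimal sketch for checking both ends. It assumes bash and nvidia-smi on PATH, and both helper names are hypothetical. For a systemd install, Ollama's Linux FAQ describes setting server variables through sudo systemctl edit ollama.service, with an Environment= line under [Service].

```shell
# Hypothetical helpers for checking CUDA_VISIBLE_DEVICES end to end.

# Print every GPU UUID as one comma-separated list, usable as a value for
# CUDA_VISIBLE_DEVICES (uses nvidia-smi's --query-gpu CSV output).
gpu_uuid_list() {
  nvidia-smi --query-gpu=uuid --format=csv,noheader | paste -sd, -
}

# Read Ollama server logs on stdin and print the CUDA_VISIBLE_DEVICES value
# from the first "server config" line. An empty result means the server
# process started without the variable set.
server_cuda_visible_devices() {
  grep -m1 'msg="server config"' |
    sed -n 's/.*CUDA_VISIBLE_DEVICES:\([^ ]*\).*/\1/p'
}

# Usage (journalctl -r lists the newest startup first):
#   gpu_uuid_list
#   journalctl -u ollama -r --no-pager | server_cuda_visible_devices
```

Once the override has taken effect, the two commands should print the same list.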


GiteaMirror added the linux nvidia bug labels 2026-04-12 20:19:38 -05:00

GiteaMirror closed this issue

2026-04-12 20:19:40 -05:00

GiteaMirror commented

2026-04-12 20:19:42 -05:00

@dngettler commented on GitHub (Sep 1, 2025):

Your issues are caused by a subtle compatibility mismatch between Ollama and the installed CUDA/driver stack on Debian 12, despite the NVIDIA GPU being properly detected by nvidia-smi and the CUDA-visible environment variable being set.¹²³⁴⁵⁶

Root Cause Breakdown

CUDA & NVIDIA Driver Incompatibility

The user has CUDA 13.0 and driver 580.65.06, both very new releases.⁷
Multiple GitHub reports confirm that Ollama is not fully compatible with the latest CUDA (13.x) and driver (580.x)—users note successful GPU detection only with CUDA 12.x/driver 550.x or earlier.²³⁵
When Ollama tries to link against /usr/lib/x86_64-linux-gnu/libcuda.so.580.65.06, the API may have changed, or the loader expects different symbols, leading to the “no nvidia devices detected” error—even though nvidia-smi lists the device.⁴⁶¹²

Environment Variables & Hardware

Setting CUDA_VISIBLE_DEVICES to the GPU's UUID is correct and accepted practice, but doesn’t resolve low-level library mismatches.⁶¹
nvidia-modprobe -u returning nothing suggests kernel modules are loaded correctly for nvidia-smi, but user-space software (Ollama) can’t reach the GPU through the intended library interface.³²

Specific Troubleshooting Steps (from issues/forums)

Downgrade CUDA Toolkit and NVIDIA Driver
- Remove CUDA 13.0 and install CUDA 12.3 (or 12.2) and driver 550.x, as most Ollama success reports on Debian/Ubuntu stem from this combo.⁵²⁴⁶
- Confirm or fix symlinks: /usr/lib/x86_64-linux-gnu/libcuda.so -> libcuda.so.550.x for driver 550.x.
Reinstall Ollama
- After downgrading, reinstall Ollama to ensure correct library linkage at install time—the binary may link dynamically.³⁵
Check for Multiple CUDA Installs
- Ensure only one version is present and that PATH/LD_LIBRARY_PATH does not point to a mixture of 13.x and older CUDA folders.⁸⁴
Double-Check Kernel Modules
- lsmod | grep nvidia must output the nvidia module. Run sudo depmod -a if the modules look off.²⁶
Validate Environment
- Try running simple CUDA sample apps (deviceQuery) outside Ollama to isolate software/library/config issues.⁸
- Remove mesa/OpenCL conflicting libraries if present.⁸
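Part of the "multiple CUDA installs" check can be done from the shell. This is a minimal sketch that assumes bash. cuda_dirs_in is a hypothetical helper, and it only inspects a search-path string, not every toolkit installed on disk.

```shell
# Hypothetical helper: list the distinct CUDA-looking entries in a
# colon-separated search path such as "$PATH" or "$LD_LIBRARY_PATH".
# More than one toolkit directory in the output hints at a mixed install.
cuda_dirs_in() {
  printf '%s\n' "$1" | tr ':' '\n' | grep -i cuda | sort -u
}

# Usage:
#   cuda_dirs_in "$PATH"
#   cuda_dirs_in "$LD_LIBRARY_PATH"
```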

The Key Issue

Ollama currently does not recognize GPUs using the latest CUDA/driver stack (13.0/580.x) on Debian 12 because the API or binary interface has changed, and Ollama’s GPU detection is not yet updated to match. Downgrading driver and CUDA versions to match known-working setups is the fastest solution.¹²⁴⁵⁶⁷³⁸

Table: Actions to Address the Issue

Step	Command/Action	Reason	Source
Remove CUDA 13.0/driver 580.x	`apt purge cuda*`, install CUDA 12.x, 550.x	Verified working config for Ollama	²⁵⁶
Check GPU symlinks	`ls -l /usr/lib/x86_64-linux-gnu/libcuda.so*`	Match symlink/version to driver	⁸
Reinstall Ollama	`sudo apt remove ollama` then fresh install	Dynamic library linkage	³⁵
Test with CUDA samples	Run `/usr/local/cuda/samples/bin/x86_64/linux/release/deviceQuery`	Diagnoses CUDA install issues	⁸

This reflects the current best knowledge, as reported by peers who have solved the exact scenario on recent Debian builds.²⁴⁵⁶
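The "Check GPU symlinks" row above ("match symlink/version to driver") can be turned into a direct comparison. The libcuda.so that Ollama names in its log should carry the same version as the kernel driver reported by nvidia-smi, because NVIDIA's user-space driver libraries must match the loaded kernel module. Below is a minimal sketch. It assumes bash, nvidia-smi on PATH, and the "no nvidia devices detected by library ..." message format quoted in this thread. Both helper names are hypothetical.

```shell
# Hypothetical helpers: compare the libcuda version Ollama reports with the
# kernel driver version reported by nvidia-smi.

# Read an Ollama log on stdin; print the version suffix of the first
# libcuda.so.<version> path it mentions (e.g. 545.23.06).
libcuda_version_from_log() {
  grep -m1 -o 'libcuda\.so\.[0-9][0-9.]*' | sed 's/^libcuda\.so\.//'
}

# Read an Ollama log on stdin; return 0 if the versions agree, 1 if not.
check_libcuda_matches_driver() {
  local lib_ver drv_ver
  lib_ver=$(libcuda_version_from_log)
  # With several GPUs nvidia-smi prints one line per GPU; keep the first.
  drv_ver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
  if [ -n "$lib_ver" ] && [ "$lib_ver" = "$drv_ver" ]; then
    echo "match: $lib_ver"
  else
    echo "mismatch: libcuda '$lib_ver' vs driver '$drv_ver'"
    return 1
  fi
}

# Usage:
#   journalctl -u ollama -r --no-pager | check_libcuda_matches_driver
```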

⁂

¹ https://github.com/ollama/ollama/issues/12136
² https://github.com/ollama/ollama/issues/6840
³ https://github.com/ollama/ollama/issues/10883
⁴ https://www.reddit.com/r/ollama/comments/1hs4l72/ollama_not_use_nvidia_gpu_on_ubuntu_24/
⁵ https://github.com/ollama/ollama/issues/11676
⁶ https://www.reddit.com/r/ollama/comments/1jkfsmr/gpu_not_recognized_in_ollama_running_in_lxc_host/
⁷ https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html
⁸ https://download.nvidia.com/XFree86/Linux-x86_64/580.65.06/README/installedcomponents.html

GiteaMirror commented

2026-04-12 20:19:43 -05:00

@Mikhail42 commented on GitHub (Sep 5, 2025):

Have you tried rebooting your PC? The issue can occur after "hibernation" due to an Nvidia problem on Linux.
https://github.com/ollama/ollama/issues/8426


GiteaMirror commented

2026-04-12 20:19:44 -05:00

@seedrick commented on GitHub (Sep 5, 2025):

Seems my CUDA config is OK:

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA GeForce GTX 1070"
  CUDA Driver Version / Runtime Version          12.3 / 12.3
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 8111 MBytes (8504934400 bytes)
  (015) Multiprocessors, (128) CUDA Cores/MP:    1920 CUDA Cores
  GPU Max Clock rate:                            1721 MHz (1.72 GHz)
  Memory Clock rate:                             4004 Mhz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                 2097152 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        98304 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.3, CUDA Runtime Version = 12.3, NumDevs = 1

Result:
msg="no nvidia devices detected by library /usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.545.23.06"

I've tried two combos: CUDA 12.3 with driver 545, and CUDA 12.9 with driver 550. How would I go about getting the combo you suggested, CUDA 12.3 with driver 550?


GiteaMirror commented

2026-04-12 20:19:44 -05:00

@seedrick commented on GitHub (Sep 5, 2025):

> Have you tried rebooting your PC? The issue can occur after "hibernation" due to an Nvidia problem on Linux. #8426

Yes, multiple reboots. Just did one now as a sanity check, same story.


GiteaMirror commented

2026-04-12 20:19:46 -05:00

@cheatofrom commented on GitHub (Sep 15, 2025):

If you're successful, please let me know. My device is an RTX A5000 and I have the same problem, with CUDA 12.4.


GiteaMirror commented

2026-04-12 20:19:47 -05:00

@seedrick commented on GitHub (Sep 15, 2025):

> If you're successful, please let me know. My device is an RTX A5000 and I have the same problem, with CUDA 12.4.

Nope, I never got it working. @dngettler's steps didn't work for me, though I have yet to figure out how to try the exact combo mentioned.


GiteaMirror commented

2026-04-12 20:19:49 -05:00

@justinclift commented on GitHub (Oct 24, 2025):

As a data point, I'm using a variant of Debian 13 (Proxmox 9.x) and have CUDA 13.0.2 (i.e. CUDA 13 update 2) working with a self-compiled Ollama 0.12.6 and Nvidia driver version 580.

The easiest way to get CUDA detection to work was to ensure nvcc was in my PATH (i.e. export PATH=/usr/local/cuda/bin:$PATH) before running the cmake commands to build Ollama:

$ export PATH=/usr/local/cuda/bin:$PATH
$ cmake -B build -G Ninja
$ cmake --build build
$ go build .
$ ./ollama serve
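CUDA detection at build time depended on nvcc being found, so a guard along these lines could run before the cmake steps. This is a minimal sketch that assumes bash and the /usr/local/cuda prefix used above. require_nvcc is a hypothetical name.

```shell
# Hypothetical pre-build guard: fail early if nvcc does not resolve on PATH.
require_nvcc() {
  if ! command -v nvcc >/dev/null 2>&1; then
    echo "nvcc not found; try: export PATH=/usr/local/cuda/bin:\$PATH" >&2
    return 1
  fi
  # Show which nvcc will be used, then its version banner.
  command -v nvcc
  nvcc --version
}

# Usage:
#   export PATH=/usr/local/cuda/bin:$PATH
#   require_nvcc && cmake -B build -G Ninja && cmake --build build
```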

The output from that ollama serve command shows the just compiled libraries being used:

time=2025-10-24T22:28:52.419+10:00 level=INFO source=types.go:40 msg="inference compute" id=GPU-8b5b1c9d-aa73-b1ff-4e7d-d3adf0e48012 library=CUDA compute=8.6 name=CUDA0 description="NVIDIA GeForce RTX 3070" libdirs=ollama driver=13.0 pci_id=07:00.0 type=discrete total="8.0 GiB" available="7.5 GiB"
...
time=2025-10-24T22:37:37.597+10:00 level=INFO source=memory.go:37 msg="new model will fit in available VRAM across minimum required GPUs, loading" model=/home/jc/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463 library=CUDA parallel=1 required="5.2 GiB" gpus=1
...
load_backend: loaded CUDA backend from /home/jc/git_repos/ollama/build/lib/ollama/libggml-cuda.so
...
llama_model_loader: loaded meta data with 34 key-value pairs and 339 tensors from /home/jc/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463 (version GGUF V3 (latest))

Then in a separate terminal window to call the model:

$ ./ollama run qwen2.5-coder:7b
>>> /set verbose
Set 'verbose' mode.
>>> Hi, how are you doing?
I'm just a computer program, so I don't have feelings, but I'm here and ready to help you! How can I assist you today?

total duration:       565.326515ms
load duration:        98.815018ms
prompt eval count:    36 token(s)
prompt eval duration: 38.345324ms
prompt eval rate:     938.84 tokens/s
eval count:           32 token(s)
eval duration:        399.239468ms
eval rate:            80.15 tokens/s
>>>

Those token rates are definitely using the RTX 3070 in this system, as doing the same thing using just the CPU (a Ryzen 5950X) goes at a small fraction of the speed:

$ ./ollama run qwen2.5-coder:7b
>>> /set verbose
Set 'verbose' mode.
>>> Hi, how are you doing?
I'm just a computer program, so I don't have feelings, but I'm here and ready to help you! How can I assist you today?

total duration:       18.7745496s
load duration:        91.212082ms
prompt eval count:    36 token(s)
prompt eval duration: 9.269417687s
prompt eval rate:     3.88 tokens/s
eval count:           32 token(s)
eval duration:        9.384860511s
eval rate:            3.41 tokens/s
>>>
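The verbose timings from the GPU and CPU runs above put "a small fraction of the speed" into numbers:

$$
\frac{80.15}{3.41} \approx 23.5 \;\text{(eval rate, tokens/s)}, \qquad
\frac{938.84}{3.88} \approx 242 \;\text{(prompt eval rate, tokens/s)}, \qquad
\frac{18.77\,\text{s}}{0.565\,\text{s}} \approx 33 \;\text{(total duration)}
$$

By these figures, the GPU run generated tokens about 23.5 times faster than the CPU run and finished the whole request about 33 times faster.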

Anyway, hopefully that's helpful for someone. 😄


GiteaMirror commented

2026-04-12 20:19:51 -05:00

@dhiltgen commented on GitHub (Nov 6, 2025):

Please give version 0.12.10 a try and see if it correctly discovers your GPU. If it still falls back to CPU, please run the server with OLLAMA_DEBUG=2 set and share the startup logs up to the point where it reports "inference compute".
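Below is a minimal sketch for capturing exactly the requested startup portion. It assumes bash and a foreground ollama serve, and startup_log is a hypothetical helper. With several GPUs there may be more than one "inference compute" line, and this keeps output only up to the first. For the systemd service, the same variable can instead be added as Environment="OLLAMA_DEBUG=2" under [Service] via sudo systemctl edit ollama.service, which is the override mechanism Ollama's Linux FAQ describes for server variables.

```shell
# Keep only the startup part of an Ollama server log: every line up to and
# including the first "inference compute" line (sed's q prints, then quits).
startup_log() {
  sed '/msg="inference compute"/q'
}

# Usage for a foreground run (stop the service first so the port is free,
# and stop the server with Ctrl-C once GPU discovery has finished):
#   sudo systemctl stop ollama
#   OLLAMA_DEBUG=2 ollama serve 2>&1 | tee ollama-debug.log
#   startup_log < ollama-debug.log > ollama-startup.log
```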


GiteaMirror commented

2026-04-12 20:19:51 -05:00

@crazynds commented on GitHub (Nov 10, 2025):

Is CUDA 13 not yet supported?
I'm running a clean installation using Docker, and Ollama is not using the GPU.


GiteaMirror commented

2026-04-12 20:19:52 -05:00

@justinclift commented on GitHub (Nov 10, 2025):

@crazynds Just as a data point, CUDA 13 works for me with Ollama when I compile it myself. That was directly on the box with the hardware though, not through docker.

It's a wild guess here, but maybe your docker configuration isn't making the GPU available to your containers?

If that's the case, then maybe this would help?

https://docs.docker.com/engine/containers/resource_constraints/#access-an-nvidia-gpu

> Expose GPUs for use
>
> Include the --gpus flag when you start a container to access GPU resources. Specify how many GPUs to use.
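For Ollama specifically, Ollama's Docker documentation pairs that flag with a command along these lines. This sketch assumes the NVIDIA Container Toolkit is already installed and configured on the host, since without it Docker cannot satisfy --gpus. start_ollama_gpu is a hypothetical wrapper name.

```shell
# Start the ollama/ollama image with all NVIDIA GPUs exposed.
start_ollama_gpu() {
  docker run -d --gpus=all \
    -v ollama:/root/.ollama \
    -p 11434:11434 \
    --name ollama \
    ollama/ollama
}

# Afterwards, the container log shows what GPU discovery found:
#   docker logs ollama 2>&1 | grep 'inference compute'
```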


GiteaMirror commented

2026-04-12 20:19:53 -05:00

@incogno commented on GitHub (Nov 11, 2025):

Same problem here. I started with a fresh install of Ollama 0.12.10 with driver 580 and CUDA 13, then downgraded the driver and CUDA through multiple versions and combinations, finally stopping at driver 550 and CUDA 12.5. The GPUs are still not detected.

I also tried several variations of CUDA_VISIBLE_DEVICES: leaving it unset, setting it to just 1 GPU, 2 GPUs, etc. I set OLLAMA_LOAD_TIMEOUT to 30 minutes, but it times out almost immediately.

Quick Edit: This is a bare-metal install from the .tgz archive on Ubuntu 24.04. Not docker or a snap.

Detailed journalctl -f -u ollama.service output:

Nov 11 01:03:03 gputestsystem systemd[1]: Started ollama.service - Ollama Service.
Nov 11 01:03:03 gputestsystem prime-run[120521]: time=2025-11-11T01:03:03.068Z level=INFO source=routes.go:1525 msg="server config" env="map[CUDA_VISIBLE_DEVICES:0,1,2,3,4,5,6,7 GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:DEBUG-4 OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:30m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/usr/share/ollama/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
Nov 11 01:03:03 gputestsystem prime-run[120521]: time=2025-11-11T01:03:03.069Z level=INFO source=images.go:522 msg="total blobs: 4"
Nov 11 01:03:03 gputestsystem prime-run[120521]: time=2025-11-11T01:03:03.069Z level=INFO source=images.go:529 msg="total unused blobs removed: 0"
Nov 11 01:03:03 gputestsystem prime-run[120521]: time=2025-11-11T01:03:03.069Z level=INFO source=routes.go:1578 msg="Listening on 127.0.0.1:11434 (version 0.12.10)"
Nov 11 01:03:03 gputestsystem prime-run[120521]: time=2025-11-11T01:03:03.069Z level=DEBUG source=sched.go:120 msg="starting llm scheduler"
Nov 11 01:03:03 gputestsystem prime-run[120521]: time=2025-11-11T01:03:03.069Z level=INFO source=runner.go:67 msg="discovering available GPUs..."
Nov 11 01:03:03 gputestsystem prime-run[120521]: time=2025-11-11T01:03:03.069Z level=TRACE source=runner.go:418 msg="starting runner for device discovery" libDirs="[/usr/local/lib/ollama /usr/local/lib/ollama/cuda_v12]" extraEnvs=map[]
Nov 11 01:03:03 gputestsystem prime-run[120521]: time=2025-11-11T01:03:03.071Z level=INFO source=server.go:400 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 45517"
Nov 11 01:03:03 gputestsystem prime-run[120521]: time=2025-11-11T01:03:03.071Z level=DEBUG source=server.go:401 msg=subprocess CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 OLLAMA_LOAD_TIMEOUT=30m OLLAMA_DEBUG=2 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin LD_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/cuda_v12 OLLAMA_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/cuda_v12
Nov 11 01:03:03 gputestsystem prime-run[120521]: time=2025-11-11T01:03:03.088Z level=INFO source=runner.go:1349 msg="starting ollama engine"
Nov 11 01:03:03 gputestsystem prime-run[120521]: time=2025-11-11T01:03:03.088Z level=INFO source=runner.go:1384 msg="Server listening on 127.0.0.1:45517"
Nov 11 01:03:03 gputestsystem prime-run[120521]: time=2025-11-11T01:03:03.093Z level=DEBUG source=gguf.go:590 msg=general.architecture type=string
Nov 11 01:03:03 gputestsystem prime-run[120521]: time=2025-11-11T01:03:03.093Z level=DEBUG source=gguf.go:590 msg=tokenizer.ggml.model type=string
Nov 11 01:03:03 gputestsystem prime-run[120521]: time=2025-11-11T01:03:03.093Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
Nov 11 01:03:03 gputestsystem prime-run[120521]: time=2025-11-11T01:03:03.093Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
Nov 11 01:03:03 gputestsystem prime-run[120521]: time=2025-11-11T01:03:03.093Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.file_type default=0
Nov 11 01:03:03 gputestsystem prime-run[120521]: time=2025-11-11T01:03:03.093Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.name default=""
Nov 11 01:03:03 gputestsystem prime-run[120521]: time=2025-11-11T01:03:03.093Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.description default=""
Nov 11 01:03:03 gputestsystem prime-run[120521]: time=2025-11-11T01:03:03.093Z level=INFO source=ggml.go:136 msg="" architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3
Nov 11 01:03:03 gputestsystem prime-run[120521]: time=2025-11-11T01:03:03.093Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/local/lib/ollama
Nov 11 01:03:03 gputestsystem prime-run[120521]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-icelake.so
Nov 11 01:03:03 gputestsystem prime-run[120521]: time=2025-11-11T01:03:03.099Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/local/lib/ollama/cuda_v12
Nov 11 01:03:33 gputestsystem prime-run[120521]: time=2025-11-11T01:03:33.099Z level=INFO source=runner.go:442 msg="failure during GPU discovery" OLLAMA_LIBRARY_PATH="[/usr/local/lib/ollama /usr/local/lib/ollama/cuda_v12]" extra_envs=map[] error="failed to finish discovery before timeout"
Nov 11 01:03:33 gputestsystem prime-run[120521]: time=2025-11-11T01:03:33.099Z level=TRACE source=runner.go:445 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH="[/usr/local/lib/ollama /usr/local/lib/ollama/cuda_v12]" devices=[]
Nov 11 01:03:33 gputestsystem prime-run[120521]: time=2025-11-11T01:03:33.099Z level=DEBUG source=runner.go:415 msg="bootstrap discovery took" duration=30.03031462s OLLAMA_LIBRARY_PATH="[/usr/local/lib/ollama /usr/local/lib/ollama/cuda_v12]" extra_envs=map[]
Nov 11 01:03:33 gputestsystem prime-run[120521]: time=2025-11-11T01:03:33.099Z level=DEBUG source=runner.go:113 msg="evluating which if any devices to filter out" initial_count=0
Nov 11 01:03:33 gputestsystem prime-run[120521]: time=2025-11-11T01:03:33.099Z level=TRACE source=runner.go:153 msg="supported GPU library combinations before filtering" supported=map[]
Nov 11 01:03:33 gputestsystem prime-run[120521]: time=2025-11-11T01:03:33.099Z level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=30.030543676s
Nov 11 01:03:33 gputestsystem prime-run[120521]: time=2025-11-11T01:03:33.100Z level=INFO source=types.go:60 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="4031.4 GiB" available="4007.7 GiB"
Nov 11 01:03:33 gputestsystem prime-run[120521]: time=2025-11-11T01:03:33.100Z level=INFO source=routes.go:1619 msg="entering low vram mode" "total vram"="0 B" threshold="20.0 GiB"
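
Reading the timestamps above, discovery starts at 01:03:03.069 and is reported as failed at 01:03:33.099, about 30 s later, even though OLLAMA_LOAD_TIMEOUT is set to 30m. That suggests the bootstrap discovery runs under its own, shorter deadline. The helper below is a minimal sketch for measuring that gap in pasted journalctl output; the regex is an assumption based on the `time=… level=… source=… msg=…` lines shown here, not a documented Ollama log format.

```python
import re
from datetime import datetime
from typing import Iterable, Optional

# Assumed shape of an Ollama log line, based on the excerpt above:
#   ... time=2025-11-11T01:03:03.069Z level=INFO source=runner.go:67 msg="..."
LINE_RE = re.compile(r'time=(\S+) level=(\w+) source=(\S+) msg="?([^"]*)"?')


def parse_time(ts: str) -> datetime:
    # Replace the trailing "Z" with an explicit UTC offset so that
    # datetime.fromisoformat() accepts it on Python versions before 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def discovery_seconds(lines: Iterable[str]) -> Optional[float]:
    """Seconds between "discovering available GPUs..." and the first
    "failure during GPU discovery" message, or None if either is missing."""
    start = end = None
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        ts, msg = m.group(1), m.group(4)
        if start is None and msg.startswith("discovering available GPUs"):
            start = parse_time(ts)
        elif start is not None and end is None and msg == "failure during GPU discovery":
            end = parse_time(ts)
    if start is None or end is None:
        return None
    return (end - start).total_seconds()


if __name__ == "__main__":
    sample = [
        "Nov 11 01:03:03 gputestsystem prime-run[120521]: time=2025-11-11T01:03:03.069Z "
        'level=INFO source=runner.go:67 msg="discovering available GPUs..."',
        "Nov 11 01:03:33 gputestsystem prime-run[120521]: time=2025-11-11T01:03:33.099Z "
        'level=INFO source=runner.go:442 msg="failure during GPU discovery" '
        'error="failed to finish discovery before timeout"',
    ]
    print(discovery_seconds(sample))  # 30.03
```

The same calculation on the source-built run later in this thread (16:05:46.936 to 16:06:16.937) also lands at roughly 30 s.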

I have a few GPUs installed.

Tue Nov 11 00:53:02 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.163.01             Driver Version: 550.163.01     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H100 80GB HBM3          Off |   00000000:19:00.0 Off |                    0 |
| N/A   24C    P0             69W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA H100 80GB HBM3          Off |   00000000:3B:00.0 Off |                    0 |
| N/A   22C    P0             68W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA H100 80GB HBM3          Off |   00000000:4C:00.0 Off |                    0 |
| N/A   19C    P0             69W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA H100 80GB HBM3          Off |   00000000:5D:00.0 Off |                    0 |
| N/A   22C    P0             69W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA H100 80GB HBM3          Off |   00000000:9B:00.0 Off |                    0 |
| N/A   23C    P0             70W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA H100 80GB HBM3          Off |   00000000:BB:00.0 Off |                    0 |
| N/A   21C    P0             69W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA H100 80GB HBM3          Off |   00000000:CB:00.0 Off |                    0 |
| N/A   22C    P0             68W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA H100 80GB HBM3          Off |   00000000:DB:00.0 Off |                    0 |
| N/A   20C    P0             68W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Jan__6_16:45:21_PST_2023
Cuda compilation tools, release 12.0, V12.0.140
Build cuda_12.0.r12.0/compiler.32267302_0
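
Two different CUDA versions appear in the outputs above: nvidia-smi reports the newest CUDA version the installed driver supports (12.4), while nvcc reports the locally installed toolkit (12.0). A conservative rule of thumb, ignoring CUDA's minor-version compatibility, is that the toolkit should not be newer than what the driver reports. This sketch only parses the two pasted outputs; it says nothing about the CUDA runtime libraries Ollama ships under /usr/local/lib/ollama/cuda_v12, which is where the discovery runner's LD_LIBRARY_PATH points.

```python
import re
from typing import Dict, Tuple

Version = Tuple[int, int]


def driver_cuda_version(nvidia_smi_text: str) -> Version:
    """Parse the 'CUDA Version: X.Y' field from the nvidia-smi header."""
    m = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_text)
    if m is None:
        raise ValueError("no 'CUDA Version' field in nvidia-smi output")
    return int(m.group(1)), int(m.group(2))


def toolkit_version(nvcc_text: str) -> Version:
    """Parse 'release X.Y' from `nvcc --version` output."""
    m = re.search(r"release (\d+)\.(\d+)", nvcc_text)
    if m is None:
        raise ValueError("no 'release X.Y' line in nvcc output")
    return int(m.group(1)), int(m.group(2))


def compare(nvidia_smi_text: str, nvcc_text: str) -> Dict[str, object]:
    driver = driver_cuda_version(nvidia_smi_text)
    toolkit = toolkit_version(nvcc_text)
    # Tuple comparison orders by major version first, then minor.
    return {"driver_cuda": driver, "toolkit": toolkit,
            "toolkit_not_newer": toolkit <= driver}


if __name__ == "__main__":
    smi = "| NVIDIA-SMI 550.163.01   Driver Version: 550.163.01   CUDA Version: 12.4 |"
    nvcc = "Cuda compilation tools, release 12.0, V12.0.140"
    print(compare(smi, nvcc))
    # {'driver_cuda': (12, 4), 'toolkit': (12, 0), 'toolkit_not_newer': True}
```

For the pair pasted here the check passes, so the locally installed toolkit is at least not ahead of the driver.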

Last edit: adding CPU RAM.

               total        used        free      shared  buff/cache   available
Mem:           3.9Ti        23Gi       3.9Ti       8.5Mi        22Gi       3.9Ti
Swap:             0B          0B          0B

Let me know if there's anything else I can provide.


GiteaMirror commented

2026-04-12 20:19:54 -05:00

@justinclift commented on GitHub (Nov 11, 2025):

> This is a bare-metal install from the .tgz archive on Ubuntu 24.04. Not docker or a snap.

That's good info.

Personally, I'd clone the Ollama git repo and try compiling it. You seem comfortable with a shell, and compiling Ollama is fairly straightforward (and, for me, it worked). 😄

You'll need Go and the general build toolchain dependencies installed first:

$ git clone https://github.com/ollama/ollama
$ cd ollama
$ export PATH=/usr/local/cuda/bin:$PATH
$ cmake -B build
$ cmake --build build
$ go build .
$ ./ollama serve

More detailed steps here if that helps: https://github.com/ollama/ollama/blob/main/docs/development.md#linux


GiteaMirror commented

2026-04-12 20:19:55 -05:00

@incogno commented on GitHub (Nov 11, 2025):

Thanks @justinclift, I appreciate the suggestion. I did compile it, but I'm getting the same results with the final build. Here's what I've seen. Before the build I brought the drivers up to 575 with CUDA 12.9; this was the last driver/toolkit combo just prior to the 580/13.0 release. Hopefully this info doesn't muddy the waters or add irrelevant noise to the conversation; it just looked potentially relevant to me. (If I need to start a new thread somewhere, just let me know.)

Besides Go and the compiler toolchain, I also needed glslc, glslang-tools, and glslang-dev. Here's what I did to compile it:

git clone https://github.com/ollama/ollama
cd ollama
export PATH=/usr/local/cuda/bin:$PATH
cmake -B build
cmake --build build
go build .
./ollama serve
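
Because `export PATH=/usr/local/cuda/bin:$PATH` only changes anything if that directory exists and actually contains the tools, it may be worth checking which executables the build will pick up before running cmake. For example, the `nvcc --version` output further down still reports release 12.0 even though the driver was moved to 575/CUDA 12.9. A minimal sketch; the tool list (go, cmake, nvcc, glslc) is taken from this thread, and glslc is presumably needed for the Vulkan shaders rather than for CUDA.

```python
import os
import shutil
from typing import Dict, Iterable, Optional

# Tools mentioned in this thread for a source build of Ollama.
TOOLS = ("go", "cmake", "nvcc", "glslc")


def search_path(prepend: Optional[str], base_path: str) -> str:
    """Mimic `export PATH=<prepend>:$PATH`: the prepended dir is searched first."""
    return prepend + os.pathsep + base_path if prepend else base_path


def resolve_tools(tools: Iterable[str], prepend: Optional[str] = None,
                  base_path: Optional[str] = None) -> Dict[str, Optional[str]]:
    """Map each tool to the executable a PATH lookup would find, or None."""
    if base_path is None:
        base_path = os.environ.get("PATH", "")
    path = search_path(prepend, base_path)
    return {tool: shutil.which(tool, path=path) for tool in tools}


if __name__ == "__main__":
    found = resolve_tools(TOOLS, prepend="/usr/local/cuda/bin")
    for tool, location in found.items():
        print(f"{tool:6} -> {location or 'NOT FOUND'}")
```

If `nvcc` resolves to something other than `/usr/local/cuda/bin/nvcc`, the `export PATH=...` line is not selecting the toolkit you expect.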

And here are the various warnings I saw during the build.

This warning is printed while building each CUDA object in ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir, i.e. for every object file in ggml-cuda.dir. I'm not sure whether it's a sign of a problem.

nvcc warning : Cannot find valid GPU for '-arch=native', default arch is used

According to my nvcc, the valid arch types are the following. From a Google search, it looks like -arch=compute_90 and -code=sm_90 would be correct for the H100s.

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Jan__6_16:45:21_PST_2023
Cuda compilation tools, release 12.0, V12.0.140
Build cuda_12.0.r12.0/compiler.32267302_0

nvcc --list-gpu-arch
compute_50
compute_52
compute_53
compute_60
compute_61
compute_62
compute_70
compute_72
compute_75
compute_80
compute_86
compute_87
compute_89
compute_90
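
nvcc's `-arch=native` asks the compiler to target whatever GPUs it can see at build time, so the warning above hints that nvcc could not see the H100s either, which looks like the same symptom Ollama's discovery shows. The H100 is compute capability 9.0 and this nvcc does list compute_90, so the architecture can be pinned explicitly instead. The sketch below only builds the flags; passing `-DCMAKE_CUDA_ARCHITECTURES=90` to `cmake -B build` is the standard CMake mechanism, but whether Ollama's build honours that override unchanged is an assumption to verify.

```python
from typing import List

# Output of `nvcc --list-gpu-arch` as pasted above (CUDA 12.0 toolkit).
LIST_GPU_ARCH = """\
compute_50
compute_52
compute_53
compute_60
compute_61
compute_62
compute_70
compute_72
compute_75
compute_80
compute_86
compute_87
compute_89
compute_90
"""


def listed_archs(output: str) -> List[str]:
    """Return the compute_XX entries printed by `nvcc --list-gpu-arch`."""
    return [ln.strip() for ln in output.splitlines() if ln.strip().startswith("compute_")]


def explicit_arch_flags(capability: str, archs: List[str]) -> List[str]:
    """nvcc flags targeting one compute capability, e.g. "9.0" for H100."""
    cc = capability.replace(".", "")  # "9.0" -> "90"
    if f"compute_{cc}" not in archs:
        raise ValueError(f"this nvcc does not list compute_{cc}")
    # Equivalent to the -arch=compute_90 -code=sm_90 pair mentioned above.
    return ["-gencode", f"arch=compute_{cc},code=sm_{cc}"]


def cmake_arch_value(capability: str) -> str:
    """Value for CMake's CMAKE_CUDA_ARCHITECTURES, e.g. "90"."""
    return capability.replace(".", "")


if __name__ == "__main__":
    archs = listed_archs(LIST_GPU_ARCH)
    print(explicit_arch_flags("9.0", archs))  # ['-gencode', 'arch=compute_90,code=sm_90']
    print(f"-DCMAKE_CUDA_ARCHITECTURES={cmake_arch_value('9.0')}")
```

Pinning the architecture only makes the build deterministic; it would not by itself make the driver expose the GPUs at runtime.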

Another warning that I'm not sure is relevant.

/root/source/ollama/ml/backend/ggml/ggml/src/ggml-cuda/rope.cu(151): warning #177-D: variable "sec_w" was declared but never referenced
      const int sec_w = sections.v[1] + sections.v[0];
                ^
          detected during:
            instantiation of "void rope_multi_cuda<forward,T>(const T *, T *, int, int, int, int, int, int, int, const int32_t *, float, float, float, float, rope_corr_dims, const float *, mrope_sections, cudaStream_t) [with forward=true, T=float]" at line 403
            instantiation of "void ggml_cuda_op_rope_impl<forward>(ggml_backend_cuda_context &, ggml_tensor *) [with forward=true]" at line 439

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

When running the built Ollama in debug mode, I get the same output as before. I tried running without CUDA_VISIBLE_DEVICES, setting it to other values, and setting it to a single value. I also tried setting nothing at all and just running ./ollama serve.

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 OLLAMA_LOAD_TIMEOUT=30m ./ollama serve

time=2025-11-11T16:05:46.936Z level=INFO source=routes.go:1578 msg="Listening on 127.0.0.1:11434 (version 0.0.0)"
time=2025-11-11T16:05:46.936Z level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2025-11-11T16:05:46.937Z level=INFO source=server.go:400 msg="starting runner" cmd="/root/source/ollama/ollama runner --ollama-engine --port 46723"
time=2025-11-11T16:06:16.937Z level=INFO source=runner.go:442 msg="failure during GPU discovery" OLLAMA_LIBRARY_PATH=[/root/source/ollama/build/lib/ollama] extra_envs=map[] error="failed to finish discovery before timeout"
time=2025-11-11T16:06:16.937Z level=INFO source=types.go:60 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="4031.4 GiB" available="4005.2 GiB"
time=2025-11-11T16:06:16.937Z level=INFO source=routes.go:1619 msg="entering low vram mode" "total vram"="0 B" threshold="20.0 GiB"

And for good measure here's my nvidia-smi output.

nvidia-smi
Tue Nov 11 16:12:37 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.57.08              Driver Version: 575.57.08      CUDA Version: 12.9     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H100 80GB HBM3          On  |   00000000:19:00.0 Off |                    0 |
| N/A   25C    P0             69W /  700W |       0MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA H100 80GB HBM3          On  |   00000000:3B:00.0 Off |                    0 |
| N/A   22C    P0             68W /  700W |       0MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA H100 80GB HBM3          On  |   00000000:4C:00.0 Off |                    0 |
| N/A   19C    P0             69W /  700W |       0MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA H100 80GB HBM3          On  |   00000000:5D:00.0 Off |                    0 |
| N/A   23C    P0             69W /  700W |       0MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA H100 80GB HBM3          On  |   00000000:9B:00.0 Off |                    0 |
| N/A   23C    P0             69W /  700W |       0MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA H100 80GB HBM3          On  |   00000000:BB:00.0 Off |                    0 |
| N/A   21C    P0             68W /  700W |       0MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA H100 80GB HBM3          On  |   00000000:CB:00.0 Off |                    0 |
| N/A   22C    P0             68W /  700W |       0MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA H100 80GB HBM3          On  |   00000000:DB:00.0 Off |                    0 |
| N/A   21C    P0             68W /  700W |       0MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
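
Since CUDA_VISIBLE_DEVICES was tried with several values, one quick cross-check is that every index it names exists. nvidia-smi can print the table above in machine-readable form with `nvidia-smi --query-gpu=index,name,pci.bus_id,persistence_mode --format=csv,noheader`; the sample rows in this sketch are reconstructed from the table, not captured output. Note that CUDA's own enumeration order can differ from nvidia-smi's unless CUDA_DEVICE_ORDER=PCI_BUS_ID is set, and UUID entries (the style the original poster used) are not handled here.

```python
import csv
import io
from typing import Dict, List

FIELDS = ["index", "name", "pci.bus_id", "persistence_mode"]

# Reconstructed from the nvidia-smi table above (Persistence-M "On" -> "Enabled");
# not captured command output.
SAMPLE = """\
0, NVIDIA H100 80GB HBM3, 00000000:19:00.0, Enabled
1, NVIDIA H100 80GB HBM3, 00000000:3B:00.0, Enabled
2, NVIDIA H100 80GB HBM3, 00000000:4C:00.0, Enabled
3, NVIDIA H100 80GB HBM3, 00000000:5D:00.0, Enabled
4, NVIDIA H100 80GB HBM3, 00000000:9B:00.0, Enabled
5, NVIDIA H100 80GB HBM3, 00000000:BB:00.0, Enabled
6, NVIDIA H100 80GB HBM3, 00000000:CB:00.0, Enabled
7, NVIDIA H100 80GB HBM3, 00000000:DB:00.0, Enabled
"""


def parse_query_gpu(csv_text: str) -> List[Dict[str, str]]:
    """Turn `--format=csv,noheader` rows into dicts keyed by FIELDS."""
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    return [dict(zip(FIELDS, row)) for row in reader if row]


def unknown_indices(cuda_visible_devices: str, gpus: List[Dict[str, str]]) -> List[str]:
    """Numeric CUDA_VISIBLE_DEVICES entries that match no listed GPU index."""
    known = {gpu["index"] for gpu in gpus}
    requested = [tok.strip() for tok in cuda_visible_devices.split(",") if tok.strip()]
    return [tok for tok in requested if tok not in known]


if __name__ == "__main__":
    gpus = parse_query_gpu(SAMPLE)
    print(len(gpus), "GPUs listed")  # 8 GPUs listed
    print("unknown:", unknown_indices("0,1,2,3,4,5,6,7", gpus))  # unknown: []
```

Here all eight requested indices exist, which is consistent with discovery failing for a reason other than a mistyped CUDA_VISIBLE_DEVICES.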

| Disabled | +-----------------------------------------+------------------------+----------------------+ | 7 NVIDIA H100 80GB HBM3 On | 00000000:DB:00.0 Off | 0 | | N/A 21C P0 68W / 700W | 0MiB / 81559MiB | 0% Default | | | | Disabled | +-----------------------------------------+------------------------+----------------------+ +-----------------------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=========================================================================================| | No running processes found | +-----------------------------------------------------------------------------------------+ ```

GiteaMirror commented

2026-04-12 20:19:57 -05:00

@dhiltgen commented on GitHub (Nov 11, 2025):

@incogno please run the server with OLLAMA_DEBUG=2 set and share the startup logs up to the point it reports "inference compute" so we can see why it's hitting a timeout trying to discover the GPU and falling back to CPU.
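A rough sketch for checking the part of the log being asked for here. This is a tiny Go filter that is hypothetical and not part of Ollama. It reads a saved server log and prints the `library=` value from each `msg="inference compute"` line, which makes it quick to see whether anything other than `cpu` was picked. The field names come from the log lines posted in this thread. The helper names (`inferenceLibraries`, `onlyCPU`) are made up for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
	"strings"
)

// libraryRe captures the value of the library= field in a log line.
var libraryRe = regexp.MustCompile(`\blibrary=(\S+)`)

// inferenceLibraries returns the library= value from every
// msg="inference compute" line (hypothetical helper).
func inferenceLibraries(lines []string) []string {
	var libs []string
	for _, line := range lines {
		if !strings.Contains(line, `msg="inference compute"`) {
			continue
		}
		if m := libraryRe.FindStringSubmatch(line); m != nil {
			libs = append(libs, m[1])
		}
	}
	return libs
}

// onlyCPU reports whether at least one library was found and all of them are "cpu".
func onlyCPU(libs []string) bool {
	for _, l := range libs {
		if l != "cpu" {
			return false
		}
	}
	return len(libs) > 0
}

func main() {
	var lines []string
	sc := bufio.NewScanner(os.Stdin)
	// Allow lines up to 1 MiB in case some log entries are very long.
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "read error:", err)
		os.Exit(1)
	}
	libs := inferenceLibraries(lines)
	fmt.Println("inference compute libraries:", libs)
	if onlyCPU(libs) {
		fmt.Println("only the CPU was selected; check the discovery lines before 'inference compute'")
	}
}
```

One way to use it, assuming the log is saved to a file first: `OLLAMA_DEBUG=2 ./ollama serve 2>&1 | tee serve.log`, stop the server once "inference compute" shows up, then `go run scanlog.go < serve.log` (the file name `scanlog.go` is arbitrary).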


GiteaMirror commented

2026-04-12 20:19:57 -05:00

@incogno commented on GitHub (Nov 11, 2025):

> @incogno please run the server with OLLAMA_DEBUG=2 set and share the startup logs up to the point it reports "inference compute" so we can see why it's hitting a timeout trying to discover the GPU and falling back to CPU.

Thanks @dhiltgen for the comment. Here's the output from doing so.

OLLAMA_DEBUG=2 ./ollama serve
time=2025-11-11T18:22:07.683Z level=INFO source=routes.go:1525 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:DEBUG-4 OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-11-11T18:22:07.683Z level=INFO source=images.go:522 msg="total blobs: 0"
time=2025-11-11T18:22:07.683Z level=INFO source=images.go:529 msg="total unused blobs removed: 0"
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.

[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
 - using env:   export GIN_MODE=release
 - using code:  gin.SetMode(gin.ReleaseMode)

[GIN-debug] HEAD   /                         --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
[GIN-debug] GET    /                         --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] HEAD   /api/version              --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func3 (5 handlers)
[GIN-debug] GET    /api/version              --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func4 (5 handlers)
[GIN-debug] POST   /api/pull                 --> github.com/ollama/ollama/server.(*Server).PullHandler-fm (5 handlers)
[GIN-debug] POST   /api/push                 --> github.com/ollama/ollama/server.(*Server).PushHandler-fm (5 handlers)
[GIN-debug] HEAD   /api/tags                 --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers)
[GIN-debug] GET    /api/tags                 --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers)
[GIN-debug] POST   /api/show                 --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (5 handlers)
[GIN-debug] DELETE /api/delete               --> github.com/ollama/ollama/server.(*Server).DeleteHandler-fm (5 handlers)
[GIN-debug] POST   /api/me                   --> github.com/ollama/ollama/server.(*Server).WhoamiHandler-fm (5 handlers)
[GIN-debug] POST   /api/signout              --> github.com/ollama/ollama/server.(*Server).SignoutHandler-fm (5 handlers)
[GIN-debug] DELETE /api/user/keys/:encodedKey --> github.com/ollama/ollama/server.(*Server).SignoutHandler-fm (5 handlers)
[GIN-debug] POST   /api/create               --> github.com/ollama/ollama/server.(*Server).CreateHandler-fm (5 handlers)
[GIN-debug] POST   /api/blobs/:digest        --> github.com/ollama/ollama/server.(*Server).CreateBlobHandler-fm (5 handlers)
[GIN-debug] HEAD   /api/blobs/:digest        --> github.com/ollama/ollama/server.(*Server).HeadBlobHandler-fm (5 handlers)
[GIN-debug] POST   /api/copy                 --> github.com/ollama/ollama/server.(*Server).CopyHandler-fm (5 handlers)
[GIN-debug] GET    /api/ps                   --> github.com/ollama/ollama/server.(*Server).PsHandler-fm (5 handlers)
[GIN-debug] POST   /api/generate             --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (5 handlers)
[GIN-debug] POST   /api/chat                 --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (5 handlers)
[GIN-debug] POST   /api/embed                --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (5 handlers)
[GIN-debug] POST   /api/embeddings           --> github.com/ollama/ollama/server.(*Server).EmbeddingsHandler-fm (5 handlers)
[GIN-debug] POST   /v1/chat/completions      --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (6 handlers)
[GIN-debug] POST   /v1/completions           --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (6 handlers)
[GIN-debug] POST   /v1/embeddings            --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (6 handlers)
[GIN-debug] GET    /v1/models                --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (6 handlers)
[GIN-debug] GET    /v1/models/:model         --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (6 handlers)
time=2025-11-11T18:22:07.684Z level=INFO source=routes.go:1578 msg="Listening on 127.0.0.1:11434 (version 0.0.0)"
time=2025-11-11T18:22:07.684Z level=DEBUG source=sched.go:120 msg="starting llm scheduler"
time=2025-11-11T18:22:07.684Z level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2025-11-11T18:22:07.684Z level=TRACE source=runner.go:418 msg="starting runner for device discovery" libDirs=[/root/source/ollama/build/lib/ollama] extraEnvs=map[]
time=2025-11-11T18:22:07.686Z level=INFO source=server.go:400 msg="starting runner" cmd="/root/source/ollama/ollama runner --ollama-engine --port 40325"
time=2025-11-11T18:22:07.686Z level=DEBUG source=server.go:401 msg=subprocess OLLAMA_DEBUG=2 GGML_CCACHE=OFF PATH=/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin LD_LIBRARY_PATH=/root/source/ollama/build/lib/ollama OLLAMA_LIBRARY_PATH=/root/source/ollama/build/lib/ollama
time=2025-11-11T18:22:07.700Z level=INFO source=runner.go:1349 msg="starting ollama engine"
time=2025-11-11T18:22:07.701Z level=INFO source=runner.go:1384 msg="Server listening on 127.0.0.1:40325"
time=2025-11-11T18:22:07.707Z level=DEBUG source=gguf.go:590 msg=general.architecture type=string
time=2025-11-11T18:22:07.707Z level=DEBUG source=gguf.go:590 msg=tokenizer.ggml.model type=string
time=2025-11-11T18:22:07.707Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-11-11T18:22:07.707Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-11-11T18:22:07.707Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.file_type default=0
time=2025-11-11T18:22:07.707Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.name default=""
time=2025-11-11T18:22:07.707Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.description default=""
time=2025-11-11T18:22:07.707Z level=INFO source=ggml.go:136 msg="" architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3
time=2025-11-11T18:22:07.707Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/root/source/ollama/build/lib/ollama
time=2025-11-11T18:22:37.714Z level=INFO source=runner.go:442 msg="failure during GPU discovery" OLLAMA_LIBRARY_PATH=[/root/source/ollama/build/lib/ollama] extra_envs=map[] error="failed to finish discovery before timeout"
time=2025-11-11T18:22:37.714Z level=TRACE source=runner.go:445 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH=[/root/source/ollama/build/lib/ollama] devices=[]
time=2025-11-11T18:22:37.714Z level=DEBUG source=runner.go:415 msg="bootstrap discovery took" duration=30.029907605s OLLAMA_LIBRARY_PATH=[/root/source/ollama/build/lib/ollama] extra_envs=map[]
time=2025-11-11T18:22:37.714Z level=DEBUG source=runner.go:113 msg="evluating which if any devices to filter out" initial_count=0
time=2025-11-11T18:22:37.714Z level=TRACE source=runner.go:153 msg="supported GPU library combinations before filtering" supported=map[]
time=2025-11-11T18:22:37.714Z level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=30.030129754s
time=2025-11-11T18:22:37.714Z level=INFO source=types.go:60 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="4031.4 GiB" available="4005.1 GiB"
time=2025-11-11T18:22:37.714Z level=INFO source=routes.go:1619 msg="entering low vram mode" "total vram"="0 B" threshold="20.0 GiB"


GiteaMirror commented

2026-04-12 20:19:59 -05:00

@dhiltgen commented on GitHub (Nov 11, 2025):

For some reason, loading the libraries is taking more than 30s and giving up. Since you're building from source, it might be interesting to increase the timeout to something much larger [here](https://github.com/ollama/ollama/blob/main/discover/runner.go#L81) and see if that at least gets us to a better error message.

GiteaMirror commented

2026-04-12 20:19:59 -05:00

@incogno commented on GitHub (Nov 11, 2025):

> For some reason, loading the libraries is taking more than 30s and giving up. Since you're building from source, it might be interesting to increase the timeout to something much larger here and see if that at least gets us to a better error message.

Thanks. I set it to a timeout of 1 hour.

./discover/runner.go:                   bootstrapTimeout := 3600 * time.Second

I didn't notice these build errors previously. Both of these are repeated dozens of times. Unsure if there were others besides these.

/bin/glslc -fshader-stage=compute --target-env=vulkan1.3 -O /root/source/ollama/ml/backend/ggml/ggml/src/ggml-vulkan/vulkan-shaders/mul_mm_cm2.comp -o /root/source/ollama/build/ml/backend/ggml/ggml/src/ggml-vulkan/vulkan-shaders.spv/matmul_id_subgroup_q6_k_f16_aligned_f16acc_cm2.spv -DACC_TYPE=float16_t -DACC_TYPE_MAX="float16_t(65504.0)" -DACC_TYPE_VEC2=f16vec2 -DALIGNED=1 -DB_TYPE=float16_t -DDATA_A_Q6_K=1 -DD_TYPE=float -DFLOAT16=1 -DFLOAT_TYPE=float16_t -DFLOAT_TYPE_VEC2=f16vec2 -DFLOAT_TYPE_VEC4=f16vec4 -DFLOAT_TYPE_VEC8=f16mat2x4 -DLOAD_VEC_A=1 -DLOAD_VEC_B=1 -DMUL_MAT_ID=1 -DMUL_MAT_ID_USE_SUBGROUPS=1

/root/source/ollama/ml/backend/ggml/ggml/src/ggml-vulkan/vulkan-shaders/mul_mm_cm2.comp:13: warning: '#extension' : extension not supported: GL_NV_cooperative_matrix2
/root/source/ollama/ml/backend/ggml/ggml/src/ggml-vulkan/vulkan-shaders/mul_mm_cm2.comp:263: error: 'tensorLayoutNV' : undeclared identifier
/root/source/ollama/ml/backend/ggml/ggml/src/ggml-vulkan/vulkan-shaders/mul_mm_cm2.comp:263: error: 'tensorLayoutA' : undeclared identifier
/root/source/ollama/ml/backend/ggml/ggml/src/ggml-vulkan/vulkan-shaders/mul_mm_cm2.comp:263: error: '>' :  wrong operand types: no operation '>' exists that takes a left-hand operand of type ' temp bool' and a right operand of type ' temp float' (or there is no acceptable conversion)
/root/source/ollama/ml/backend/ggml/ggml/src/ggml-vulkan/vulkan-shaders/mul_mm_cm2.comp:263: error: '' :  syntax error, unexpected EQUAL, expecting COMMA or SEMICOLON
1 warning and 4 errors generated.

cannot compile matmul_id_subgroup_q5_k_f16_aligned_f16acc_cm2


/bin/glslc -fshader-stage=compute --target-env=vulkan1.2 -O /root/source/ollama/ml/backend/ggml/ggml/src/ggml-vulkan/vulkan-shaders/mul_mmq.comp -o /root/source/ollama/build/ml/backend/ggml/ggml/src/ggml-vulkan/vulkan-shaders.spv/matmul_q4_1_q8_1_f16acc.spv -DACC_TYPE=float16_t -DACC_TYPE_MAX="float16_t(65504.0)" -DACC_TYPE_VEC2=f16vec2 -DDATA_A_Q4_1=1 -DD_TYPE=float -DFLOAT16=1 -DFLOAT_TYPE=float16_t -DFLOAT_TYPE_VEC2=f16vec2 -DFLOAT_TYPE_VEC4=f16vec4 -DFLOAT_TYPE_VEC8=f16mat2x4

/root/source/ollama/ml/backend/ggml/ggml/src/ggml-vulkan/vulkan-shaders/mul_mmq.comp:7: error: '#extension' : extension not supported: GL_EXT_integer_dot_product
/root/source/ollama/ml/backend/ggml/ggml/src/ggml-vulkan/vulkan-shaders/mul_mmq.comp:356: error: 'dotPacked4x8EXT' : no matching overloaded function found
/root/source/ollama/ml/backend/ggml/ggml/src/ggml-vulkan/vulkan-shaders/mul_mmq.comp:355: error: 'assign' :  cannot convert from ' const float' to ' temp highp int'
3 errors generated.

cannot compile matmul_q5_0_q8_1_f16acc

It did finish building though. Ran it as desired. The timeout value helped.

OLLAMA_DEBUG=2 ./ollama serve
time=2025-11-11T22:10:13.051Z level=INFO source=routes.go:1544 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:DEBUG-4 OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-11-11T22:10:13.052Z level=INFO source=images.go:522 msg="total blobs: 0"
time=2025-11-11T22:10:13.052Z level=INFO source=images.go:529 msg="total unused blobs removed: 0"
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.

[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
 - using env:   export GIN_MODE=release
 - using code:  gin.SetMode(gin.ReleaseMode)

[GIN-debug] HEAD   /                         --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
[GIN-debug] GET    /                         --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] HEAD   /api/version              --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func3 (5 handlers)
[GIN-debug] GET    /api/version              --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func4 (5 handlers)
[GIN-debug] POST   /api/pull                 --> github.com/ollama/ollama/server.(*Server).PullHandler-fm (5 handlers)
[GIN-debug] POST   /api/push                 --> github.com/ollama/ollama/server.(*Server).PushHandler-fm (5 handlers)
[GIN-debug] HEAD   /api/tags                 --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers)
[GIN-debug] GET    /api/tags                 --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers)
[GIN-debug] POST   /api/show                 --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (5 handlers)
[GIN-debug] DELETE /api/delete               --> github.com/ollama/ollama/server.(*Server).DeleteHandler-fm (5 handlers)
[GIN-debug] POST   /api/me                   --> github.com/ollama/ollama/server.(*Server).WhoamiHandler-fm (5 handlers)
[GIN-debug] POST   /api/signout              --> github.com/ollama/ollama/server.(*Server).SignoutHandler-fm (5 handlers)
[GIN-debug] DELETE /api/user/keys/:encodedKey --> github.com/ollama/ollama/server.(*Server).SignoutHandler-fm (5 handlers)
[GIN-debug] POST   /api/create               --> github.com/ollama/ollama/server.(*Server).CreateHandler-fm (5 handlers)
[GIN-debug] POST   /api/blobs/:digest        --> github.com/ollama/ollama/server.(*Server).CreateBlobHandler-fm (5 handlers)
[GIN-debug] HEAD   /api/blobs/:digest        --> github.com/ollama/ollama/server.(*Server).HeadBlobHandler-fm (5 handlers)
[GIN-debug] POST   /api/copy                 --> github.com/ollama/ollama/server.(*Server).CopyHandler-fm (5 handlers)
[GIN-debug] GET    /api/ps                   --> github.com/ollama/ollama/server.(*Server).PsHandler-fm (5 handlers)
[GIN-debug] POST   /api/generate             --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (5 handlers)
[GIN-debug] POST   /api/chat                 --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (5 handlers)
[GIN-debug] POST   /api/embed                --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (5 handlers)
[GIN-debug] POST   /api/embeddings           --> github.com/ollama/ollama/server.(*Server).EmbeddingsHandler-fm (5 handlers)
[GIN-debug] POST   /v1/chat/completions      --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (6 handlers)
[GIN-debug] POST   /v1/completions           --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (6 handlers)
[GIN-debug] POST   /v1/embeddings            --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (6 handlers)
[GIN-debug] GET    /v1/models                --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (6 handlers)
[GIN-debug] GET    /v1/models/:model         --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (6 handlers)
time=2025-11-11T22:10:13.052Z level=INFO source=routes.go:1597 msg="Listening on 127.0.0.1:11434 (version 0.0.0)"
time=2025-11-11T22:10:13.052Z level=DEBUG source=sched.go:120 msg="starting llm scheduler"
time=2025-11-11T22:10:13.052Z level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2025-11-11T22:10:13.052Z level=TRACE source=runner.go:418 msg="starting runner for device discovery" libDirs=[/root/source/ollama/build/lib/ollama] extraEnvs=map[]
time=2025-11-11T22:10:13.053Z level=INFO source=server.go:400 msg="starting runner" cmd="/root/source/ollama/ollama runner --ollama-engine --port 42681"
time=2025-11-11T22:10:13.053Z level=DEBUG source=server.go:401 msg=subprocess OLLAMA_DEBUG=2 PATH=/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin LD_LIBRARY_PATH=/root/source/ollama/build/lib/ollama OLLAMA_LIBRARY_PATH=/root/source/ollama/build/lib/ollama
time=2025-11-11T22:10:13.067Z level=INFO source=runner.go:1398 msg="starting ollama engine"
time=2025-11-11T22:10:13.067Z level=INFO source=runner.go:1433 msg="Server listening on 127.0.0.1:42681"
time=2025-11-11T22:10:13.074Z level=DEBUG source=gguf.go:590 msg=general.architecture type=string
time=2025-11-11T22:10:13.074Z level=DEBUG source=gguf.go:590 msg=tokenizer.ggml.model type=string
time=2025-11-11T22:10:13.074Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-11-11T22:10:13.074Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-11-11T22:10:13.074Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.file_type default=0
time=2025-11-11T22:10:13.074Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.name default=""
time=2025-11-11T22:10:13.074Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.description default=""
time=2025-11-11T22:10:13.074Z level=INFO source=ggml.go:136 msg="" architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3
time=2025-11-11T22:10:13.074Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/root/source/ollama/build/lib/ollama
ggml_cuda_init: failed to initialize CUDA: system not yet initialized
load_backend: loaded CUDA backend from /root/source/ollama/build/lib/ollama/libggml-cuda.so
load_backend: loaded CPU backend from /root/source/ollama/build/lib/ollama/libggml-cpu-icelake.so
time=2025-11-11T22:10:43.327Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
time=2025-11-11T22:10:43.328Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.block_count default=0
time=2025-11-11T22:10:43.328Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.pooling_type default=0
time=2025-11-11T22:10:43.328Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.expert_count default=0
time=2025-11-11T22:10:43.328Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.tokens default="&{size:0 values:[]}"
time=2025-11-11T22:10:43.328Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.scores default="&{size:0 values:[]}"
time=2025-11-11T22:10:43.328Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.token_type default="&{size:0 values:[]}"
time=2025-11-11T22:10:43.328Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.merges default="&{size:0 values:[]}"
time=2025-11-11T22:10:43.328Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=true
time=2025-11-11T22:10:43.328Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2025-11-11T22:10:43.328Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false
time=2025-11-11T22:10:43.328Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_id default=0
time=2025-11-11T22:10:43.328Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2025-11-11T22:10:43.328Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.pre default=""
time=2025-11-11T22:10:43.328Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.block_count default=0
time=2025-11-11T22:10:43.328Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.embedding_length default=0
time=2025-11-11T22:10:43.328Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.attention.head_count default=0
time=2025-11-11T22:10:43.328Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.attention.head_count_kv default=0
time=2025-11-11T22:10:43.328Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.attention.key_length default=0
time=2025-11-11T22:10:43.328Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.rope.dimension_count default=0
time=2025-11-11T22:10:43.328Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.attention.layer_norm_rms_epsilon default=0
time=2025-11-11T22:10:43.328Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.rope.freq_base default=100000
time=2025-11-11T22:10:43.328Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.rope.scaling.factor default=1
time=2025-11-11T22:10:43.328Z level=DEBUG source=runner.go:1373 msg="dummy model load took" duration=30.254099612s
time=2025-11-11T22:10:43.328Z level=DEBUG source=runner.go:1378 msg="gathering device infos took" duration=497ns
time=2025-11-11T22:10:43.338Z level=TRACE source=runner.go:445 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH=[/root/source/ollama/build/lib/ollama] devices=[]
time=2025-11-11T22:10:43.338Z level=DEBUG source=runner.go:415 msg="bootstrap discovery took" duration=30.28573514s OLLAMA_LIBRARY_PATH=[/root/source/ollama/build/lib/ollama] extra_envs=map[]
time=2025-11-11T22:10:43.338Z level=DEBUG source=runner.go:113 msg="evluating which if any devices to filter out" initial_count=0
time=2025-11-11T22:10:43.338Z level=TRACE source=runner.go:153 msg="supported GPU library combinations before filtering" supported=map[]
time=2025-11-11T22:10:43.338Z level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=30.286055923s
time=2025-11-11T22:10:43.338Z level=INFO source=types.go:60 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="4031.4 GiB" available="4006.9 GiB"
time=2025-11-11T22:10:43.338Z level=INFO source=routes.go:1638 msg="entering low vram mode" "total vram"="0 B" threshold="20.0 GiB"
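One quick way to tell from a startup log like the one above whether discovery found any GPU is to look at the "inference compute" lines. Here the only device reports library=cpu, and the server then enters low VRAM mode with 0 B of VRAM. Below is a minimal, unofficial sketch that assumes the logfmt-style key=value layout shown in this thread. The helper names (parse_logfmt, detected_devices, gpu_found) are hypothetical and not part of Ollama.

```python
import re

# Hypothetical helper, not an official Ollama parser: matches key=value pairs where the
# value is either bare (library=cpu) or double-quoted (description="NVIDIA H100 80GB HBM3").
_PAIR = re.compile(r'([A-Za-z_][\w.]*)=(?:"((?:[^"\\]|\\.)*)"|(\S*))')


def parse_logfmt(line):
    """Return a dict of key -> value for one log line (empty quoted values become "")."""
    return {key: (quoted or bare) for key, quoted, bare in _PAIR.findall(line)}


def detected_devices(log_lines):
    """Collect (library, description, total) from every 'inference compute' line."""
    devices = []
    for line in log_lines:
        fields = parse_logfmt(line)
        if fields.get("msg") == "inference compute":
            devices.append(
                (fields.get("library", ""), fields.get("description", ""), fields.get("total", ""))
            )
    return devices


def gpu_found(log_lines):
    """True if at least one inference compute device is not the CPU fallback."""
    return any(library.lower() != "cpu" for library, _, _ in detected_devices(log_lines))


# Shortened excerpts of log lines quoted in this thread.
failing = [
    'time=2025-11-11T22:10:43.338Z level=INFO source=types.go:60 msg="inference compute" '
    'id=cpu library=cpu compute="" name=cpu description=cpu total="4031.4 GiB" available="4006.9 GiB"',
]
working = [
    'time=2025-11-12T03:11:01.218Z level=INFO source=types.go:42 msg="inference compute" '
    'id=GPU-cdf3c132-675f-077e-3bf7-99d79213d062 library=CUDA compute=9.0 name=CUDA0 '
    'description="NVIDIA H100 80GB HBM3" driver=12.9 total="79.6 GiB" available="79.2 GiB"',
]

if __name__ == "__main__":
    print(gpu_found(failing))         # False
    print(gpu_found(working))         # True
    print(detected_devices(working))  # [('CUDA', 'NVIDIA H100 80GB HBM3', '79.6 GiB')]
```

This only reads what the server reports. It does not explain why a GPU is missing.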

GiteaMirror commented

2026-04-12 20:20:00 -05:00

@dhiltgen commented on GitHub (Nov 12, 2025):

ggml_cuda_init: failed to initialize CUDA: system not yet initialized

Searching online, I see some mentions that this can happen if you don't have NVLink/NVSwitch fabric manager set up correctly. Do other CUDA apps work correctly on your GPUs?
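You can answer the question above without involving Ollama by calling the CUDA driver API's cuInit directly. Below is a minimal, hypothetical probe using Python's ctypes. It assumes a Linux host where the NVIDIA driver installs libcuda.so.1. The function names are illustrative only.

As far as I know, CUDA_ERROR_SYSTEM_NOT_READY (802) is the driver result behind the "system not yet initialized" message quoted above. If the probe reports it, the failure sits below Ollama, in the driver stack. On NVSwitch-based systems, a missing or mismatched fabric manager is one reported cause.

```python
import ctypes

# Subset of CUresult codes from the CUDA driver API (cuda.h).
CU_RESULT_NAMES = {
    0: "CUDA_SUCCESS",
    100: "CUDA_ERROR_NO_DEVICE",
    802: "CUDA_ERROR_SYSTEM_NOT_READY",
}


def describe(code):
    """Map a CUresult value to its enum name, if it is one listed above."""
    return CU_RESULT_NAMES.get(code, f"CUresult {code}")


def probe_cuda(library="libcuda.so.1"):
    """Call cuInit(0) through the driver library, bypassing Ollama/ggml entirely.

    Assumes a Linux host where the NVIDIA driver provides libcuda.so.1.
    """
    try:
        cuda = ctypes.CDLL(library)
    except OSError as exc:
        return f"could not load {library}: {exc}"

    cuda.cuInit.argtypes = [ctypes.c_uint]
    cuda.cuInit.restype = ctypes.c_int
    rc = cuda.cuInit(0)
    if rc != 0:
        # cuGetErrorString fills in the driver's own description of the error code.
        cuda.cuGetErrorString.argtypes = [ctypes.c_int, ctypes.POINTER(ctypes.c_char_p)]
        cuda.cuGetErrorString.restype = ctypes.c_int
        text = ctypes.c_char_p()
        cuda.cuGetErrorString(rc, ctypes.byref(text))
        detail = text.value.decode() if text.value else "no description"
        return f"cuInit failed: {describe(rc)} ({rc}): {detail}"

    # Initialization worked; report how many devices the driver exposes.
    cuda.cuDeviceGetCount.argtypes = [ctypes.POINTER(ctypes.c_int)]
    cuda.cuDeviceGetCount.restype = ctypes.c_int
    count = ctypes.c_int(0)
    rc = cuda.cuDeviceGetCount(ctypes.byref(count))
    if rc != 0:
        return f"cuDeviceGetCount failed: {describe(rc)} ({rc})"
    return f"cuInit OK, {count.value} device(s) visible"


if __name__ == "__main__":
    print(probe_cuda())
```

Run it as the same user, and with the same environment (e.g. CUDA_VISIBLE_DEVICES), as the Ollama service so the result is comparable.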

GiteaMirror commented

2026-04-12 20:20:01 -05:00

@incogno commented on GitHub (Nov 12, 2025):

ggml_cuda_init: failed to initialize CUDA: system not yet initialized

Searching online, I see some mentions that this can happen if you don't have NVLink/NVSwitch fabric manager set up correctly. Do other CUDA apps work correctly on your GPUs?

Bingo. Thank you for suggesting that. I frankly hadn't thought to investigate the fabric manager. It turns out that was my issue, and also why the <580 drivers and <13.0 CUDA toolkit didn't seem to work. I had messed it up with all the troubleshooting and driver/toolkit switching I was doing, and overlooked the fabric manager version along the way. I purged everything associated with it, reinstalled the right version, and rebooted. Voilà. Check out this beautiful output:

time=2025-11-12T03:10:56.706Z level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2025-11-12T03:10:56.709Z level=INFO source=server.go:400 msg="starting runner" cmd="/root/source/ollama/ollama runner --ollama-engine --port 40335"
time=2025-11-12T03:11:01.218Z level=INFO source=types.go:42 msg="inference compute" id=GPU-cdf3c132-675f-077e-3bf7-99d79213d062 filter_id="" library=CUDA compute=9.0 name=CUDA0 description="NVIDIA H100 80GB HBM3" libdirs=ollama driver=12.9 pci_id=0000:19:00.0 type=discrete total="79.6 GiB" available="79.2 GiB"
time=2025-11-12T03:11:01.218Z level=INFO source=types.go:42 msg="inference compute" id=GPU-8507fda8-a792-4579-9f79-af49229e2e85 filter_id="" library=CUDA compute=9.0 name=CUDA1 description="NVIDIA H100 80GB HBM3" libdirs=ollama driver=12.9 pci_id=0000:3b:00.0 type=discrete total="79.6 GiB" available="78.7 GiB"
time=2025-11-12T03:11:01.218Z level=INFO source=types.go:42 msg="inference compute" id=GPU-d0743f0a-2abb-1d5c-4780-71936c03d52f filter_id="" library=CUDA compute=9.0 name=CUDA2 description="NVIDIA H100 80GB HBM3" libdirs=ollama driver=12.9 pci_id=0000:4c:00.0 type=discrete total="79.6 GiB" available="78.7 GiB"
time=2025-11-12T03:11:01.218Z level=INFO source=types.go:42 msg="inference compute" id=GPU-4546002c-99d1-db87-9e57-f2f7aeb4f6fe filter_id="" library=CUDA compute=9.0 name=CUDA3 description="NVIDIA H100 80GB HBM3" libdirs=ollama driver=12.9 pci_id=0000:5d:00.0 type=discrete total="79.6 GiB" available="78.7 GiB"
time=2025-11-12T03:11:01.218Z level=INFO source=types.go:42 msg="inference compute" id=GPU-aff31e0e-b6b3-93cf-690f-a911d78293a8 filter_id="" library=CUDA compute=9.0 name=CUDA4 description="NVIDIA H100 80GB HBM3" libdirs=ollama driver=12.9 pci_id=0000:9b:00.0 type=discrete total="79.6 GiB" available="78.7 GiB"
time=2025-11-12T03:11:01.218Z level=INFO source=types.go:42 msg="inference compute" id=GPU-3d386698-ff50-ba69-d9e9-2f8a982801c4 filter_id="" library=CUDA compute=9.0 name=CUDA5 description="NVIDIA H100 80GB HBM3" libdirs=ollama driver=12.9 pci_id=0000:bb:00.0 type=discrete total="79.6 GiB" available="78.7 GiB"
time=2025-11-12T03:11:01.218Z level=INFO source=types.go:42 msg="inference compute" id=GPU-c41daaf4-eb48-3327-dc2f-679e0aecc80c filter_id="" library=CUDA compute=9.0 name=CUDA6 description="NVIDIA H100 80GB HBM3" libdirs=ollama driver=12.9 pci_id=0000:cb:00.0 type=discrete total="79.6 GiB" available="78.7 GiB"
time=2025-11-12T03:11:01.218Z level=INFO source=types.go:42 msg="inference compute" id=GPU-e6f4a3c5-1306-4de0-d954-97a187d923b1 filter_id="" library=CUDA compute=9.0 name=CUDA7 description="NVIDIA H100 80GB HBM3" libdirs=ollama driver=12.9 pci_id=0000:db:00.0 type=discrete total="79.6 GiB" available="78.7 GiB"

NAME               ID              SIZE      PROCESSOR    CONTEXT    UNTIL
granite4:latest    4235724a127c    3.2 GB    100% GPU     4096       3 minutes from now

Thanks again, @dhiltgen and @justinclift, for the assistance! I guess this was more of a self-inflicted wound. This beefy system is a fresh loaner that I get to play around with for two weeks before I sadly have to part ways with it and send it back to the R&D department that owns it.

So that solves my problem, but @seedrick originally reported this issue for his own setup. I believe he still has a problem; I just happened to piggyback on this report.

GiteaMirror commented

2026-04-12 20:20:02 -05:00

@dhiltgen commented on GitHub (Nov 12, 2025):

I'm going to go ahead and close this one now. @seedrick, if you are still having trouble, please upgrade to the latest version. If that doesn't clear it up, please run the server with OLLAMA_DEBUG=2 for additional diagnostic information during GPU discovery and share the startup log; then I'll reopen and we'll investigate.
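An OLLAMA_DEBUG=2 startup log lets you check whether each directory the runner scans (the "ggml backend load all from path" lines) is followed by a "load_backend: loaded ... backend from" line for that directory. Below is a minimal sketch, assuming the log layout shown in this thread; the helper names are hypothetical. It groups loaded backends by directory and lists directories that yielded none.

The sample is built from shortened lines of the January 2026 log in this thread. There, the cuda_v12 directory is scanned but no backend is reported as loaded from it. That flags where to look, not why.

```python
import os
import re

# Patterns for the two kinds of lines seen in the logs in this thread (assumed layout).
SCAN = re.compile(r'msg="ggml backend load all from path" path=(\S+)')
LOADED = re.compile(r"load_backend: loaded (\S+) backend from (\S+)")


def backends_by_dir(log_lines):
    """Map each scanned directory to the backend names reported as loaded from it."""
    found = {}
    for line in log_lines:
        scan = SCAN.search(line)
        if scan:
            found.setdefault(scan.group(1), [])
            continue
        loaded = LOADED.search(line)
        if loaded:
            name, path = loaded.groups()
            found.setdefault(os.path.dirname(path), []).append(name)
    return found


def empty_dirs(log_lines):
    """Directories that were scanned but produced no loaded backend."""
    return [d for d, names in backends_by_dir(log_lines).items() if not names]


sample = [
    'time=2026-01-28T15:34:43.532-05:00 level=DEBUG source=ggml.go:94 '
    'msg="ggml backend load all from path" path=/var/www/ollama__2/lib/ollama',
    "load_backend: loaded CPU backend from /var/www/ollama__2/lib/ollama/libggml-cpu-haswell.so",
    'time=2026-01-28T15:34:43.537-05:00 level=DEBUG source=ggml.go:94 '
    'msg="ggml backend load all from path" path=/var/www/ollama__2/lib/ollama/cuda_v12',
    "time=2026-01-28T15:34:43.538-05:00 level=INFO source=ggml.go:104 msg=system compiler=cgo(gcc)",
]

if __name__ == "__main__":
    print(backends_by_dir(sample))
    print(empty_dirs(sample))  # ['/var/www/ollama__2/lib/ollama/cuda_v12']
```

Note the limitation: in the November 2025 log earlier in this thread, the CUDA backend is reported as loaded even though ggml_cuda_init failed. A clean result from this check therefore does not prove the GPU is usable.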

GiteaMirror commented

2026-04-12 20:20:03 -05:00

@seedrick commented on GitHub (Jan 28, 2026):

I'm going to go ahead and close this one now. @seedrick, if you are still having trouble, please upgrade to the latest version. If that doesn't clear it up, please run the server with OLLAMA_DEBUG=2 for additional diagnostic information during GPU discovery and share the startup log; then I'll reopen and we'll investigate.

@dhiltgen, finally got to revisit this. Here's the result:

time=2026-01-28T15:34:43.510-05:00 level=INFO source=images.go:473 msg="total blobs: 8"
time=2026-01-28T15:34:43.510-05:00 level=INFO source=images.go:480 msg="total unused blobs removed: 0"
time=2026-01-28T15:34:43.510-05:00 level=INFO source=routes.go:1684 msg="Listening on [::]:11434 (version 0.15.1)"
time=2026-01-28T15:34:43.510-05:00 level=DEBUG source=sched.go:121 msg="starting llm scheduler"
time=2026-01-28T15:34:43.510-05:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-01-28T15:34:43.510-05:00 level=TRACE source=runner.go:440 msg="starting runner for device discovery" libDirs="[/var/www/ollama__2/lib/ollama /var/www/ollama__2/lib/ollama/cuda_v12]" extraEnvs=map[]
time=2026-01-28T15:34:43.511-05:00 level=INFO source=server.go:429 msg="starting runner" cmd="/var/www/ollama__2/bin/ollama runner --ollama-engine --port 36667"
time=2026-01-28T15:34:43.511-05:00 level=DEBUG source=server.go:430 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_ORIGINS=https://ollama.REDACTED.io OLLAMA_HOST=http://0.0.0.0:11434 OLLAMA_MODELS=/home/REDACTED.app/ollama__2/models OLLAMA_DEBUG=2 LD_LIBRARY_PATH=/var/www/ollama__2/lib/ollama:/var/www/ollama__2/lib/ollama/cuda_v12 OLLAMA_LIBRARY_PATH=/var/www/ollama__2/lib/ollama:/var/www/ollama__2/lib/ollama/cuda_v12
time=2026-01-28T15:34:43.521-05:00 level=INFO source=runner.go:1405 msg="starting ollama engine"
time=2026-01-28T15:34:43.521-05:00 level=INFO source=runner.go:1440 msg="Server listening on 127.0.0.1:36667"
time=2026-01-28T15:34:43.532-05:00 level=DEBUG source=gguf.go:589 msg=general.architecture type=string
time=2026-01-28T15:34:43.532-05:00 level=DEBUG source=gguf.go:589 msg=tokenizer.ggml.model type=string
time=2026-01-28T15:34:43.532-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=general.alignment default=32
time=2026-01-28T15:34:43.532-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=general.alignment default=32
time=2026-01-28T15:34:43.532-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=general.file_type default=0
time=2026-01-28T15:34:43.532-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=general.name default=""
time=2026-01-28T15:34:43.532-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=general.description default=""
time=2026-01-28T15:34:43.532-05:00 level=INFO source=ggml.go:136 msg="" architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3
time=2026-01-28T15:34:43.532-05:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/var/www/ollama__2/lib/ollama
load_backend: loaded CPU backend from /var/www/ollama__2/lib/ollama/libggml-cpu-haswell.so
time=2026-01-28T15:34:43.537-05:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/var/www/ollama__2/lib/ollama/cuda_v12
time=2026-01-28T15:34:43.538-05:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
time=2026-01-28T15:34:43.538-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=llama.block_count default=0
time=2026-01-28T15:34:43.538-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=llama.pooling_type default=0
time=2026-01-28T15:34:43.538-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=llama.expert_count default=0
time=2026-01-28T15:34:43.538-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=tokenizer.ggml.tokens default="&{size:0 values:[]}"
time=2026-01-28T15:34:43.538-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=tokenizer.ggml.scores default="&{size:0 values:[]}"
time=2026-01-28T15:34:43.538-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=tokenizer.ggml.token_type default="&{size:0 values:[]}"
time=2026-01-28T15:34:43.538-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=tokenizer.ggml.merges default="&{size:0 values:[]}"
time=2026-01-28T15:34:43.538-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=true
time=2026-01-28T15:34:43.538-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2026-01-28T15:34:43.538-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false
time=2026-01-28T15:34:43.538-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=tokenizer.ggml.eos_token_id default=0
time=2026-01-28T15:34:43.538-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2026-01-28T15:34:43.538-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=tokenizer.ggml.pre default=""
time=2026-01-28T15:34:43.538-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=llama.block_count default=0
time=2026-01-28T15:34:43.538-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=llama.embedding_length default=0
time=2026-01-28T15:34:43.538-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=llama.attention.head_count default=0
time=2026-01-28T15:34:43.538-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=llama.attention.head_count_kv default=0
time=2026-01-28T15:34:43.538-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=llama.attention.key_length default=0
time=2026-01-28T15:34:43.538-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=llama.rope.dimension_count default=0
time=2026-01-28T15:34:43.538-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=llama.attention.layer_norm_rms_epsilon default=0
time=2026-01-28T15:34:43.538-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=llama.rope.freq_base default=100000
time=2026-01-28T15:34:43.538-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=llama.rope.scaling.factor default=1
time=2026-01-28T15:34:43.538-05:00 level=DEBUG source=runner.go:1380 msg="dummy model load took" duration=6.229167ms
time=2026-01-28T15:34:43.538-05:00 level=DEBUG source=runner.go:1385 msg="gathering device infos took" duration=468ns
time=2026-01-28T15:34:43.538-05:00 level=TRACE source=runner.go:467 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH="[/var/www/ollama__2/lib/ollama /var/www/ollama__2/lib/ollama/cuda_v12]" devices=[]
time=2026-01-28T15:34:43.538-05:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=28.353307ms OLLAMA_LIBRARY_PATH="[/var/www/ollama__2/lib/ollama /var/www/ollama__2/lib/ollama/cuda_v12]" extra_envs=map[]
time=2026-01-28T15:34:43.538-05:00 level=TRACE source=runner.go:440 msg="starting runner for device discovery" libDirs="[/var/www/ollama__2/lib/ollama /var/www/ollama__2/lib/ollama/cuda_v13]" extraEnvs=map[]
time=2026-01-28T15:34:43.539-05:00 level=INFO source=server.go:429 msg="starting runner" cmd="/var/www/ollama__2/bin/ollama runner --ollama-engine --port 33591"
time=2026-01-28T15:34:43.539-05:00 level=DEBUG source=server.go:430 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_ORIGINS=https://ollama.REDACTED.io OLLAMA_HOST=http://0.0.0.0:11434 OLLAMA_MODELS=/home/REDACTED.app/ollama__2/models OLLAMA_DEBUG=2 LD_LIBRARY_PATH=/var/www/ollama__2/lib/ollama:/var/www/ollama__2/lib/ollama/cuda_v13 OLLAMA_LIBRARY_PATH=/var/www/ollama__2/lib/ollama:/var/www/ollama__2/lib/ollama/cuda_v13
time=2026-01-28T15:34:43.549-05:00 level=INFO source=runner.go:1405 msg="starting ollama engine"
time=2026-01-28T15:34:43.550-05:00 level=INFO source=runner.go:1440 msg="Server listening on 127.0.0.1:33591"
time=2026-01-28T15:34:43.560-05:00 level=DEBUG source=gguf.go:589 msg=general.architecture type=string
time=2026-01-28T15:34:43.560-05:00 level=DEBUG source=gguf.go:589 msg=tokenizer.ggml.model type=string
time=2026-01-28T15:34:43.560-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=general.alignment default=32
time=2026-01-28T15:34:43.560-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=general.alignment default=32
time=2026-01-28T15:34:43.560-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=general.file_type default=0
time=2026-01-28T15:34:43.560-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=general.name default=""
time=2026-01-28T15:34:43.560-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=general.description default=""
time=2026-01-28T15:34:43.560-05:00 level=INFO source=ggml.go:136 msg="" architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3
time=2026-01-28T15:34:43.560-05:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/var/www/ollama__2/lib/ollama
load_backend: loaded CPU backend from /var/www/ollama__2/lib/ollama/libggml-cpu-haswell.so
time=2026-01-28T15:34:43.566-05:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/var/www/ollama__2/lib/ollama/cuda_v13
time=2026-01-28T15:34:43.566-05:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
time=2026-01-28T15:34:43.566-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=llama.block_count default=0
time=2026-01-28T15:34:43.566-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=llama.pooling_type default=0
time=2026-01-28T15:34:43.566-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=llama.expert_count default=0
time=2026-01-28T15:34:43.566-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=tokenizer.ggml.tokens default="&{size:0 values:[]}"
time=2026-01-28T15:34:43.566-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=tokenizer.ggml.scores default="&{size:0 values:[]}"
time=2026-01-28T15:34:43.566-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=tokenizer.ggml.token_type default="&{size:0 values:[]}"
time=2026-01-28T15:34:43.566-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=tokenizer.ggml.merges default="&{size:0 values:[]}"
time=2026-01-28T15:34:43.566-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=true
time=2026-01-28T15:34:43.566-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2026-01-28T15:34:43.566-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false
time=2026-01-28T15:34:43.566-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=tokenizer.ggml.eos_token_id default=0
time=2026-01-28T15:34:43.566-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2026-01-28T15:34:43.566-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=tokenizer.ggml.pre default=""
time=2026-01-28T15:34:43.566-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=llama.block_count default=0
time=2026-01-28T15:34:43.566-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=llama.embedding_length default=0
time=2026-01-28T15:34:43.566-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=llama.attention.head_count default=0
time=2026-01-28T15:34:43.566-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=llama.attention.head_count_kv default=0
time=2026-01-28T15:34:43.566-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=llama.attention.key_length default=0
time=2026-01-28T15:34:43.566-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=llama.rope.dimension_count default=0
time=2026-01-28T15:34:43.566-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=llama.attention.layer_norm_rms_epsilon default=0
time=2026-01-28T15:34:43.566-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=llama.rope.freq_base default=100000
time=2026-01-28T15:34:43.566-05:00 level=DEBUG source=ggml.go:298 msg="key with type not found" key=llama.rope.scaling.factor default=1
time=2026-01-28T15:34:43.566-05:00 level=DEBUG source=runner.go:1380 msg="dummy model load took" duration=6.308208ms
time=2026-01-28T15:34:43.566-05:00 level=DEBUG source=runner.go:1385 msg="gathering device infos took" duration=492ns
time=2026-01-28T15:34:43.567-05:00 level=TRACE source=runner.go:467 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH="[/var/www/ollama__2/lib/ollama /var/www/ollama__2/lib/ollama/cuda_v13]" devices=[]
time=2026-01-28T15:34:43.567-05:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=28.104184ms OLLAMA_LIBRARY_PATH="[/var/www/ollama__2/lib/ollama /var/www/ollama__2/lib/ollama/cuda_v13]" extra_envs=map[]
time=2026-01-28T15:34:43.567-05:00 level=INFO source=runner.go:106 msg="experimental Vulkan support disabled.  To enable, set OLLAMA_VULKAN=1"
time=2026-01-28T15:34:43.567-05:00 level=DEBUG source=runner.go:124 msg="evaluating which, if any, devices to filter out" initial_count=0
time=2026-01-28T15:34:43.567-05:00 level=TRACE source=runner.go:174 msg="supported GPU library combinations before filtering" supported=map[]
time=2026-01-28T15:34:43.567-05:00 level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=56.737007ms
time=2026-01-28T15:34:43.567-05:00 level=INFO source=types.go:60 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="15.4 GiB" available="7.0 GiB"
time=2026-01-28T15:34:43.567-05:00 level=INFO source=routes.go:1725 msg="entering low vram mode" "total vram"="0 B" threshold="20.0 GiB"
```
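In the log above, both discovery runners (one for the bundled `cuda_v12` directory and one for `cuda_v13`) report `devices=[]`. The only `load_backend` lines are for the CPU backend, and no `load_backend` line follows either "ggml backend load all from path" line for the CUDA directories. The server then falls back to `library=cpu` and "low vram mode" with `"total vram"="0 B"`. Below is a minimal sketch for pulling those facts out of a long `OLLAMA_DEBUG=2` startup log. The regular expressions are assumptions based only on the log lines shown in this issue, not a documented log format. `summarize` and `report` are hypothetical helper names.

```python
import re
import sys

# Assumed patterns, taken only from the log lines shown in this issue
# (OLLAMA_DEBUG=2, Ollama 0.15.1); they are not a documented log format.
ENUM_RE = re.compile(
    r'msg="runner enumerated devices" '
    r'OLLAMA_LIBRARY_PATH="\[([^\]]*)\]" devices=\[([^\]]*)\]'
)
BACKEND_RE = re.compile(r"load_backend: loaded (\S+) backend from (\S+)")
VRAM_RE = re.compile(r'"total vram"="([^"]+)" threshold="([^"]+)"')


def summarize(log_text):
    """Collect per-runner discovery results, loaded ggml backends and the VRAM line."""
    runners = []
    for lib_path, devices in ENUM_RE.findall(log_text):
        # In this log the last directory of each runner is the GPU-specific one,
        # e.g. .../lib/ollama/cuda_v12 or .../lib/ollama/cuda_v13.
        dirs = lib_path.split()
        runners.append({"gpu_dir": dirs[-1] if dirs else "", "devices": devices.strip()})
    vram = VRAM_RE.search(log_text)
    return {
        "runners": runners,
        "backends": BACKEND_RE.findall(log_text),
        "total_vram": vram.group(1) if vram else None,
        "threshold": vram.group(2) if vram else None,
    }


def report(log_text):
    """Render the summary as short human-readable lines."""
    s = summarize(log_text)
    lines = [f'{r["gpu_dir"]}: devices=[{r["devices"]}]' for r in s["runners"]]
    lines += [f"loaded {name} backend from {path}" for name, path in s["backends"]]
    # If every loaded backend is the CPU one (or none loaded), no GPU backend came up.
    if all(name == "CPU" for name, _ in s["backends"]):
        lines.append("no GPU backend library was loaded")
    if s["total_vram"] is not None:
        lines.append(f'total vram={s["total_vram"]} (low vram threshold {s["threshold"]})')
    return lines


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python summarize_discovery.py server.log
    with open(sys.argv[1], encoding="utf-8") as f:
        print("\n".join(report(f.read())))
```

On the log in this issue, the sketch should report `devices=[]` for both `cuda_v12` and `cuda_v13`, one CPU backend line per runner, and the "no GPU backend library was loaded" note. That suggests the next thing to check is whether the CUDA backend library in those directories is present and loadable, rather than the `CUDA_VISIBLE_DEVICES` setting.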




pdevine/logging

parth/sample-correctness-fix

parth/sample-fix-sorting

jmorgan/sample-fix-sorting-extras

jmorganca/temp-0-images

brucemacd/parallel-embed-models

brucemacd/shim-grammar

jmorganca/fix-gguf-error

bmizerany/nameswork

jmorganca/faster-releases

bmizerany/validatenames

brucemacd/err-no-vocab

brucemacd/rope-config

brucemacd/err-hint

brucemacd/qwen2_5

brucemacd/logprobs

brucemacd/new_runner_graph_bench

progress-flicker

brucemacd/forward-test

brucemacd/go_qwen2

pdevine/gemma2

jmorganca/add-missing-symlink-eval

mxyng/next-debug

parth/set-context-size-openai

brucemacd/next-bpe-bench

brucemacd/next-bpe-test

brucemacd/new_runner_e2e

brucemacd/new_runner_qwen2

pdevine/convert-cohere2

brucemacd/convert-cli

parth/log-probs

mxyng/next-mlx

mxyng/cmd-history

parth/templating

parth/tokenize-detokenize

brucemacd/check-key-register

bmizerany/grammar

jmorganca/vendor-081b29bd

mxyng/func-checks

jmorganca/fix-null-format

parth/fix-default-to-warn-json

jmorganca/qwen2vl

jmorganca/no-concat

parth/cmd-cleanup-SO

brucemacd/check-key-register-structured-err

parth/openai-stream-usage

parth/fix-referencing-so

stream-tools-stop

jmorganca/degin-1

brucemacd/install-path-clean

brucemacd/push-name-validation

brucemacd/browser-key-register

jmorganca/openai-fix-first-message

jmorganca/fix-proxy

jessegross/sample

parth/disallow-streaming-tools

dhiltgen/remove_submodule

jmorganca/ga

jmorganca/mllama

pdevine/newlines

pdevine/geems-2b

jmorganca/llama-bump

mxyng/modelname-7

mxyng/gin-slog

mxyng/modelname-6

jyan/convert-prog

jyan/quant5

paligemma-support

pdevine/import-docs

jmorganca/openai-context

jyan/paligemma

jyan/p2

jyan/palitest

bmizerany/embedspeedup

jmorganca/llama-vit

brucemacd/allow-ollama

royh/ep-methods

royh/whisper

mxyng/api-models

mxyng/fix-memory

jyan/q4_4/8

jyan/ollama-v

royh/stream-tools

roy-embed-parallel

bmizerany/hrm

revert-5963-revert-5924-mxyng/llama3.1-rope

royh/embed-viz

jyan/local2

jyan/auth

jyan/local

jyan/parse-temp

jmorganca/template-mistral

jyan/reord-g

royh-openai-suffixdocs

royh-imgembed

royh-embed-parallel

jyan/quant4

royh-precision

jyan/progress

pdevine/fix-template

jyan/quant3

pdevine/ggla

mxyng/update-registry-domain

jmorganca/ggml-static

mxyng/create-context

jyan/v0.146

mxyng/layers-from-files

build_dist

bmizerany/noseek

royh-ls

royh-name

timeout

mxyng/server-timestamp

bmizerany/nosillyggufslurps

royh-params

jmorganca/llama-cpp-7c26775

royh-openai-delete

royh-show-rigid

jmorganca/enable-fa

jmorganca/no-error-template

jyan/format

royh-testdelete

bmizerany/fastverify

language_support

pdevine/ps-glitches

brucemacd/tokenize

bruce/iq-quants

bmizerany/filepathwithcoloninhost

mxyng/split-bin

bmizerany/client-registry

jmorganca/if-none-match

native

jmorganca/native

jmorganca/batch-embeddings

jmorganca/initcmake

jmorganca/mm

pdevine/showggmlinfo

modenameenforcealphanum

bmizerany/modenameenforcealphanum

jmorganca/done-reason

jmorganca/llama-cpp-8960fe8

ollama.com

bmizerany/filepathnobuild

bmizerany/types/model/defaultfix

rmdisplaylong

nogogen

bmizerany/x

modelfile-readme

bmizerany/replacecolon

jmorganca/limit

jmorganca/execstack

jmorganca/replace-assets

mxyng/tune-concurrency

jmorganca/testing

whitespace-detection

jmorganca/options

upgrade-all

scratch

cuda-search

mattw/airenamer

mattw/allmodelsonhuggingface

mattw/quantcontext

mattw/whatneedstorun

brucemacd/llama-mem-calc

mattw/faq-context

mattw/communitylinks

mattw/noprune

mattw/python-functioncalling

rename

mxyng/install

pulse

remove-first

editor

mattw/selfqueryingretrieval

cgo

mattw/howtoquant

api

matt/streamingapi

format-config

mxyng/extra-args

shell

update-nous-hermes

cp-model

upload-progress

fix-unknown-model

fix-model-names

delete-fix

insecure-registry

ls

deletemodels

progressbar

readme-updates

license-layers

skip-list

list-models

modelpath

matt/examplemodelfiles

distribution

go-opts

Reference: github-starred/ollama#8065