[GH-ISSUE #15178] Not able to run ollama on top of RTX PRO 4500 Server Edition + GPU mig enabled + driver 590.48.01 + Cuda13.1 #9716


GiteaMirror · 2026-04-12T22:35:53-05:00

GiteaMirror commented

2026-04-12 22:35:53 -05:00

Originally created by @g21chen on GitHub (Mar 31, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15178

What is the issue?

Ollama cannot use the GPU on an RTX PRO 4500 Blackwell Server Edition with MIG enabled, driver 590.48.01, and CUDA 13.1. The pod starts, but GPU discovery filters the device out and Ollama falls back to CPU.

Environment details and logs follow.

On the host side:

[root@nvidia-driver-daemonset-9 drivers]# nvidia-smi
Tue Mar 31 16:50:24 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 590.48.01 Driver Version: 590.48.01 CUDA Version: 13.1 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA RTX PRO 4500 Blac... On | 00000000:0B:00.0 Off | On |
| N/A 29C P8 17W / 165W | 116MiB / 32623MiB | N/A Default |
| | | Enabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| MIG devices: |
+------------------+----------------------------------+-----------+-----------------------+
| GPU GI CI MIG | Shared Memory-Usage | Vol| Shared |
| ID ID Dev | Shared BAR1-Usage | SM Unc| CE ENC DEC OFA JPG |
| | | ECC| |
|==================+==================================+===========+=======================|
| 0 1 0 0 | 58MiB / 16032MiB | 42 0 | 1 1 1 0 1 |
| | 0MiB / 16383MiB | | |
+------------------+----------------------------------+-----------+-----------------------+
| 0 2 0 1 | 58MiB / 16032MiB | 42 0 | 1 1 1 0 1 |
| | 0MiB / 16383MiB | | |
+------------------+----------------------------------+-----------+-----------------------+

Inside the ollama pod:

root@ollama-0:/# nvidia-smi
Tue Mar 31 16:52:04 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 590.48.01 Driver Version: 590.48.01 CUDA Version: 13.1 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA RTX PRO 4500 Blac... On | 00000000:0B:00.0 Off | On |
| N/A 29C P8 17W / 165W | N/A | N/A Default |
| | | Enabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| MIG devices: |
+------------------+----------------------------------+-----------+-----------------------+
| GPU GI CI MIG | Shared Memory-Usage | Vol| Shared |
| ID ID Dev | Shared BAR1-Usage | SM Unc| CE ENC DEC OFA JPG |
| | | ECC| |
|==================+==================================+===========+=======================|
| 0 1 0 0 | 58MiB / 16032MiB | 42 0 | 1 1 1 0 1 |
| | 0MiB / 16383MiB | | |
+------------------+----------------------------------+-----------+-----------------------+

Pod spec used to deploy ollama:

[core@quanta-tmo-4500-cran mig]$ cat ollama-deployment.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: ollama
  namespace: open-webui
spec:
  serviceName: "ollama"
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
      - name: ollama
        image: docker.io/ollama/ollama:latest
        imagePullPolicy: IfNotPresent
        env:
        - name: OLLAMA_DEBUG
          value: "1"
        ports:
        - containerPort: 11434
        resources:
          requests:
            nvidia.com/gpu: "1"
          limits:
            nvidia.com/gpu: "1"

GPU resources reported by the node on the host side (MIG policy is "single"):

[core@XXXX]$ oc describe node |grep -i "nvidia.com/gpu:" -B 8
Capacity:
cpu: 128
ephemeral-storage: 1873933640Ki
hugepages-1Gi: 50Gi
hugepages-2Mi: 0
memory: 263630500Ki
nvidia.com/gpu: 2

The pod is running:

[core@xxxx]$ oc get pods -n open-webui
NAME READY STATUS RESTARTS AGE
ollama-0 1/1 Running 0 61s

Ollama logs (the device is filtered out during discovery and only CPU is reported):

[core@xxxxx]$ oc logs ollama-0 -n open-webui
Couldn't find '/root/.ollama/id_ed25519'. Generating new private key.
Your new public key is:

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIDfkDTv/WVAHAEBTB3xxg80OJRJCi/IsOMNJUfVSyn+f

time=2026-03-31T16:54:49.030Z level=INFO source=routes.go:1742 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:0 OLLAMA_DEBUG:DEBUG OLLAMA_DEBUG_LOG_REQUESTS:false OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NO_CLOUD:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2026-03-31T16:54:49.030Z level=INFO source=routes.go:1744 msg="Ollama cloud disabled: false"
time=2026-03-31T16:54:49.030Z level=INFO source=images.go:477 msg="total blobs: 0"
time=2026-03-31T16:54:49.030Z level=INFO source=images.go:484 msg="total unused blobs removed: 0"
time=2026-03-31T16:54:49.030Z level=INFO source=routes.go:1800 msg="Listening on [::]:11434 (version 0.19.0)"
time=2026-03-31T16:54:49.030Z level=DEBUG source=sched.go:145 msg="starting llm scheduler"
time=2026-03-31T16:54:49.031Z level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-03-31T16:54:49.033Z level=INFO source=server.go:432 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 42887"
time=2026-03-31T16:54:49.033Z level=DEBUG source=server.go:433 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=1 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12
time=2026-03-31T16:54:49.413Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=382.320152ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" extra_envs=map[]
time=2026-03-31T16:54:49.414Z level=INFO source=server.go:432 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 38815"
time=2026-03-31T16:54:49.414Z level=DEBUG source=server.go:433 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=1 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13
time=2026-03-31T16:54:49.659Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=245.184387ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v13]" extra_envs=map[]
time=2026-03-31T16:54:49.659Z level=INFO source=runner.go:106 msg="experimental Vulkan support disabled. To enable, set OLLAMA_VULKAN=1"
time=2026-03-31T16:54:49.659Z level=DEBUG source=runner.go:124 msg="evaluating which, if any, devices to filter out" initial_count=2
time=2026-03-31T16:54:49.659Z level=DEBUG source=runner.go:146 msg="verifying if device is supported" library=/usr/lib/ollama/cuda_v12 description="NVIDIA RTX PRO 4500 Blackwell Server Edition MIG 1g.16gb" compute=12.0 id=GPU-01991e90-02de-d9b8-d852-cd5265aefb02 pci_id=0000:0b:00.0
time=2026-03-31T16:54:49.659Z level=DEBUG source=runner.go:146 msg="verifying if device is supported" library=/usr/lib/ollama/cuda_v13 description="NVIDIA RTX PRO 4500 Blackwell Server Edition MIG 1g.16gb" compute=12.0 id=GPU-01991e90-02de-d9b8-d852-cd5265aefb02 pci_id=0000:0b:00.0
time=2026-03-31T16:54:49.659Z level=INFO source=server.go:432 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 34101"
time=2026-03-31T16:54:49.659Z level=DEBUG source=server.go:433 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=1 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12 GGML_CUDA_INIT=1 CUDA_VISIBLE_DEVICES=GPU-01991e90-02de-d9b8-d852-cd5265aefb02
time=2026-03-31T16:54:49.659Z level=INFO source=server.go:432 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 34747"
time=2026-03-31T16:54:49.659Z level=DEBUG source=server.go:433 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=1 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13 GGML_CUDA_INIT=1 CUDA_VISIBLE_DEVICES=GPU-01991e90-02de-d9b8-d852-cd5265aefb02
time=2026-03-31T16:54:49.758Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=99.18846ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" extra_envs="map[CUDA_VISIBLE_DEVICES:GPU-01991e90-02de-d9b8-d852-cd5265aefb02 GGML_CUDA_INIT:1]"
time=2026-03-31T16:54:49.758Z level=DEBUG source=runner.go:153 msg="filtering device which didn't fully initialize" id=GPU-01991e90-02de-d9b8-d852-cd5265aefb02 libdir=/usr/lib/ollama/cuda_v12 pci_id=0000:0b:00.0 library=CUDA
time=2026-03-31T16:54:49.765Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=105.845603ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v13]" extra_envs="map[CUDA_VISIBLE_DEVICES:GPU-01991e90-02de-d9b8-d852-cd5265aefb02 GGML_CUDA_INIT:1]"
time=2026-03-31T16:54:49.765Z level=DEBUG source=runner.go:153 msg="filtering device which didn't fully initialize" id=GPU-01991e90-02de-d9b8-d852-cd5265aefb02 libdir=/usr/lib/ollama/cuda_v13 pci_id=0000:0b:00.0 library=CUDA
time=2026-03-31T16:54:49.765Z level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=733.993471ms
time=2026-03-31T16:54:49.765Z level=INFO source=types.go:60 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="251.4 GiB" available="251.3 GiB"
time=2026-03-31T16:54:49.765Z level=INFO source=routes.go:1850 msg="vram-based default context" total_vram="0 B" default_num_ctx=4096

Relevant log output

OS

No response

GPU

No response

CPU

No response

Ollama version

No response


GiteaMirror added the bug label 2026-04-12 22:35:53 -05:00

GiteaMirror commented

2026-04-12 22:35:55 -05:00

@g21chen commented on GitHub (Mar 31, 2026):

Adding more output from inside the ollama pod:

root@ollama-0:/# nvidia-smi -L
GPU 0: NVIDIA RTX PRO 4500 Blackwell Server Edition (UUID: GPU-01991e90-02de-d9b8-d852-cd5265aefb02)
  MIG 1g.16gb     Device  0: (UUID: MIG-fc57ef1d-199f-5ca5-91fa-d48bb00d3467)
root@ollama-0:/#
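
One detail visible in this output: the MIG instance has its own UUID (MIG-fc57ef1d-...), while the Ollama discovery log earlier in this thread identifies the device by the parent GPU UUID (GPU-01991e90-...) and passes that value in CUDA_VISIBLE_DEVICES. Whether this difference is related to the failure is not established in this thread. The sketch below only separates the two kinds of UUID from `nvidia-smi -L` output so they can be compared with the id in the log. The function name `list_uuids` is hypothetical.

```python
import re

# Captures the UUID printed in parentheses by `nvidia-smi -L`.
UUID = re.compile(r"\(UUID:\s*((?:GPU|MIG)-[0-9a-fA-F-]+)\)")


def list_uuids(smi_output):
    """Split the UUIDs in `nvidia-smi -L` output into (gpu_uuids, mig_uuids)."""
    gpus, migs = [], []
    for line in smi_output.splitlines():
        match = UUID.search(line)
        if match is None:
            continue
        uuid = match.group(1)
        # MIG instances are listed with a "MIG-" prefix, full GPUs with "GPU-".
        (migs if uuid.startswith("MIG-") else gpus).append(uuid)
    return gpus, migs
```

With the output above, the first list holds the single GPU UUID and the second holds the single MIG UUID.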


GiteaMirror commented

2026-04-12 22:35:56 -05:00

@rick-github commented on GitHub (Mar 31, 2026):

Set OLLAMA_DEBUG=2 in the environment of the ollama server, restart it and post the log.


GiteaMirror commented

2026-04-12 22:35:56 -05:00

@g21chen commented on GitHub (Mar 31, 2026):

Hi Rick,

Thanks for the quick response. The logs with OLLAMA_DEBUG=2 are pasted below:

[core@xxxxx]$ oc logs ollama-0 -n open-webui
Couldn't find '/root/.ollama/id_ed25519'. Generating new private key.
Your new public key is:

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIJcD1FiWHjKDIts9zuGUVwhSHN++ahXD188YeHJ6Ro0J

time=2026-03-31T18:21:56.150Z level=INFO source=routes.go:1742 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:0 OLLAMA_DEBUG:DEBUG-4 OLLAMA_DEBUG_LOG_REQUESTS:false OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NO_CLOUD:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2026-03-31T18:21:56.150Z level=INFO source=routes.go:1744 msg="Ollama cloud disabled: false"
time=2026-03-31T18:21:56.150Z level=INFO source=images.go:477 msg="total blobs: 0"
time=2026-03-31T18:21:56.150Z level=INFO source=images.go:484 msg="total unused blobs removed: 0"
time=2026-03-31T18:21:56.150Z level=INFO source=routes.go:1800 msg="Listening on [::]:11434 (version 0.19.0)"
time=2026-03-31T18:21:56.151Z level=DEBUG source=sched.go:145 msg="starting llm scheduler"
time=2026-03-31T18:21:56.152Z level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-03-31T18:21:56.152Z level=TRACE source=runner.go:440 msg="starting runner for device discovery" libDirs="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" extraEnvs=map[]
time=2026-03-31T18:21:56.153Z level=INFO source=server.go:432 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 36249"
time=2026-03-31T18:21:56.153Z level=DEBUG source=server.go:433 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=2 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12
time=2026-03-31T18:21:56.169Z level=INFO source=runner.go:1411 msg="starting ollama engine"
time=2026-03-31T18:21:56.169Z level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:36249"
time=2026-03-31T18:21:56.174Z level=DEBUG source=gguf.go:604 msg=general.architecture type=string
time=2026-03-31T18:21:56.174Z level=DEBUG source=gguf.go:604 msg=tokenizer.ggml.model type=string
time=2026-03-31T18:21:56.174Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-31T18:21:56.174Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-31T18:21:56.174Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.file_type default=0
time=2026-03-31T18:21:56.174Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.name default=""
time=2026-03-31T18:21:56.174Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.description default=""
time=2026-03-31T18:21:56.174Z level=INFO source=ggml.go:136 msg="" architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3
time=2026-03-31T18:21:56.174Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-icelake.so
time=2026-03-31T18:21:56.180Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama/cuda_v12
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA RTX PRO 4500 Blackwell Server Edition MIG 1g.16gb, compute capability 12.0, VMM: yes, ID: GPU-01991e90-02de-d9b8-d852-cd5265aefb02
load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v12/libggml-cuda.so
time=2026-03-31T18:21:56.351Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,520,600,610,700,750,800,860,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.block_count default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.pooling_type default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.expert_count default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.tokens default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.scores default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.token_type default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.merges default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=true
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_id default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.pre default=""
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.block_count default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.embedding_length default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.head_count default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.head_count_kv default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.key_length default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.dimension_count default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.layer_norm_rms_epsilon default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.freq_base default=100000
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.scaling.factor default=1
time=2026-03-31T18:21:56.351Z level=DEBUG source=runner.go:1386 msg="dummy model load took" duration=177.508363ms
time=2026-03-31T18:21:56.473Z level=DEBUG source=runner.go:1391 msg="gathering device infos took" duration=121.404308ms
time=2026-03-31T18:21:56.473Z level=TRACE source=runner.go:467 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" devices="[{DeviceID:{ID:GPU-01991e90-02de-d9b8-d852-cd5265aefb02 Library:CUDA} Name:CUDA0 Description:NVIDIA RTX PRO 4500 Blackwell Server Edition MIG 1g.16gb FilterID: Integrated:false PCIID:0000:0b:00.0 TotalMemory:16810770432 FreeMemory:16614490112 ComputeMajor:12 ComputeMinor:0 DriverMajor:13 DriverMinor:1 LibraryPath:[/usr/lib/ollama /usr/lib/ollama/cuda_v12]}]"
time=2026-03-31T18:21:56.473Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=320.951467ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" extra_envs=map[]
time=2026-03-31T18:21:56.473Z level=TRACE source=runner.go:440 msg="starting runner for device discovery" libDirs="[/usr/lib/ollama /usr/lib/ollama/cuda_v13]" extraEnvs=map[]
time=2026-03-31T18:21:56.473Z level=INFO source=server.go:432 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 43961"
time=2026-03-31T18:21:56.473Z level=DEBUG source=server.go:433 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=2 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13
time=2026-03-31T18:21:56.488Z level=INFO source=runner.go:1411 msg="starting ollama engine"
time=2026-03-31T18:21:56.489Z level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:43961"
time=2026-03-31T18:21:56.495Z level=DEBUG source=gguf.go:604 msg=general.architecture type=string
time=2026-03-31T18:21:56.495Z level=DEBUG source=gguf.go:604 msg=tokenizer.ggml.model type=string
time=2026-03-31T18:21:56.495Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-31T18:21:56.495Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-31T18:21:56.495Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.file_type default=0
time=2026-03-31T18:21:56.495Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.name default=""
time=2026-03-31T18:21:56.495Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.description default=""
time=2026-03-31T18:21:56.495Z level=INFO source=ggml.go:136 msg="" architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3
time=2026-03-31T18:21:56.495Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-icelake.so
time=2026-03-31T18:21:56.500Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama/cuda_v13
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA RTX PRO 4500 Blackwell Server Edition MIG 1g.16gb, compute capability 12.0, VMM: yes, ID: GPU-01991e90-02de-d9b8-d852-cd5265aefb02
load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v13/libggml-cuda.so
time=2026-03-31T18:21:56.607Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=750,800,860,870,890,900,1000,1030,1100,1200,1210 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.block_count default=0
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.pooling_type default=0
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.expert_count default=0
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.tokens default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.scores default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.token_type default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.merges default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=true
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_id default=0
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.pre default=""
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.block_count default=0
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.embedding_length default=0
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.head_count default=0
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.head_count_kv default=0
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.key_length default=0
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.dimension_count default=0
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.layer_norm_rms_epsilon default=0
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.freq_base default=100000
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.scaling.factor default=1
time=2026-03-31T18:21:56.607Z level=DEBUG source=runner.go:1386 msg="dummy model load took" duration=112.192034ms
time=2026-03-31T18:21:56.712Z level=DEBUG source=runner.go:1391 msg="gathering device infos took" duration=104.541956ms
time=2026-03-31T18:21:56.712Z level=TRACE source=runner.go:467 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v13]" devices="[{DeviceID:{ID:GPU-01991e90-02de-d9b8-d852-cd5265aefb02 Library:CUDA} Name:CUDA0 Description:NVIDIA RTX PRO 4500 Blackwell Server Edition MIG 1g.16gb FilterID: Integrated:false PCIID:0000:0b:00.0 TotalMemory:16810770432 FreeMemory:16614490112 ComputeMajor:12 ComputeMinor:0 DriverMajor:13 DriverMinor:1 LibraryPath:[/usr/lib/ollama /usr/lib/ollama/cuda_v13]}]"
time=2026-03-31T18:21:56.712Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=238.913755ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v13]" extra_envs=map[]
time=2026-03-31T18:21:56.712Z level=INFO source=runner.go:106 msg="experimental Vulkan support disabled.  To enable, set OLLAMA_VULKAN=1"
time=2026-03-31T18:21:56.712Z level=DEBUG source=runner.go:124 msg="evaluating which, if any, devices to filter out" initial_count=2
time=2026-03-31T18:21:56.712Z level=DEBUG source=runner.go:146 msg="verifying if device is supported" library=/usr/lib/ollama/cuda_v12 description="NVIDIA RTX PRO 4500 Blackwell Server Edition MIG 1g.16gb" compute=12.0 id=GPU-01991e90-02de-d9b8-d852-cd5265aefb02 pci_id=0000:0b:00.0
time=2026-03-31T18:21:56.712Z level=DEBUG source=runner.go:146 msg="verifying if device is supported" library=/usr/lib/ollama/cuda_v13 description="NVIDIA RTX PRO 4500 Blackwell Server Edition MIG 1g.16gb" compute=12.0 id=GPU-01991e90-02de-d9b8-d852-cd5265aefb02 pci_id=0000:0b:00.0
time=2026-03-31T18:21:56.712Z level=TRACE source=runner.go:440 msg="starting runner for device discovery" libDirs="[/usr/lib/ollama /usr/lib/ollama/cuda_v13]" extraEnvs="map[CUDA_VISIBLE_DEVICES:GPU-01991e90-02de-d9b8-d852-cd5265aefb02 GGML_CUDA_INIT:1]"
time=2026-03-31T18:21:56.712Z level=TRACE source=runner.go:440 msg="starting runner for device discovery" libDirs="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" extraEnvs="map[CUDA_VISIBLE_DEVICES:GPU-01991e90-02de-d9b8-d852-cd5265aefb02 GGML_CUDA_INIT:1]"
time=2026-03-31T18:21:56.712Z level=INFO source=server.go:432 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 41427"
time=2026-03-31T18:21:56.712Z level=DEBUG source=server.go:433 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=2 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13 CUDA_VISIBLE_DEVICES=GPU-01991e90-02de-d9b8-d852-cd5265aefb02 GGML_CUDA_INIT=1
time=2026-03-31T18:21:56.713Z level=INFO source=server.go:432 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 43293"
time=2026-03-31T18:21:56.713Z level=DEBUG source=server.go:433 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=2 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12 CUDA_VISIBLE_DEVICES=GPU-01991e90-02de-d9b8-d852-cd5265aefb02 GGML_CUDA_INIT=1
time=2026-03-31T18:21:56.725Z level=INFO source=runner.go:1411 msg="starting ollama engine"
time=2026-03-31T18:21:56.725Z level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:41427"
time=2026-03-31T18:21:56.725Z level=INFO source=runner.go:1411 msg="starting ollama engine"
time=2026-03-31T18:21:56.726Z level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:43293"
time=2026-03-31T18:21:56.734Z level=DEBUG source=gguf.go:604 msg=general.architecture type=string
time=2026-03-31T18:21:56.734Z level=DEBUG source=gguf.go:604 msg=tokenizer.ggml.model type=string
time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.file_type default=0
time=2026-03-31T18:21:56.734Z level=DEBUG source=gguf.go:604 msg=general.architecture type=string
time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.name default=""
time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.description default=""
time=2026-03-31T18:21:56.734Z level=DEBUG source=gguf.go:604 msg=tokenizer.ggml.model type=string
time=2026-03-31T18:21:56.734Z level=INFO source=ggml.go:136 msg="" architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3
time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama
time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.file_type default=0
time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.name default=""
time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.description default=""
time=2026-03-31T18:21:56.734Z level=INFO source=ggml.go:136 msg="" architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3
time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-icelake.so
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-icelake.so
time=2026-03-31T18:21:56.739Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama/cuda_v13
time=2026-03-31T18:21:56.739Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama/cuda_v12
ggml_cuda_init: failed to initialize CUDA: no CUDA-capable device is detected
load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v12/libggml-cuda.so
time=2026-03-31T18:21:56.810Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.block_count default=0
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.pooling_type default=0
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.expert_count default=0
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.tokens default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.scores default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.token_type default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.merges default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=true
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_id default=0
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.pre default=""
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.block_count default=0
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.embedding_length default=0
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.head_count default=0
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.head_count_kv default=0
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.key_length default=0
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.dimension_count default=0
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.layer_norm_rms_epsilon default=0
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.freq_base default=100000
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.scaling.factor default=1
time=2026-03-31T18:21:56.811Z level=DEBUG source=runner.go:1386 msg="dummy model load took" duration=77.553908ms
time=2026-03-31T18:21:56.812Z level=DEBUG source=runner.go:1391 msg="gathering device infos took" duration=882ns
time=2026-03-31T18:21:56.812Z level=TRACE source=runner.go:467 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" devices=[]
time=2026-03-31T18:21:56.812Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=99.5831ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" extra_envs="map[CUDA_VISIBLE_DEVICES:GPU-01991e90-02de-d9b8-d852-cd5265aefb02 GGML_CUDA_INIT:1]"
time=2026-03-31T18:21:56.812Z level=DEBUG source=runner.go:153 msg="filtering device which didn't fully initialize" id=GPU-01991e90-02de-d9b8-d852-cd5265aefb02 libdir=/usr/lib/ollama/cuda_v12 pci_id=0000:0b:00.0 library=CUDA
ggml_cuda_init: failed to initialize CUDA: no CUDA-capable device is detected
load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v13/libggml-cuda.so
time=2026-03-31T18:21:56.814Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
time=2026-03-31T18:21:56.814Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.block_count default=0
time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.pooling_type default=0
time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.expert_count default=0
time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.tokens default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.scores default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.token_type default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.merges default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=true
time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false
time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_id default=0
time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.pre default=""
time=2026-03-31T18:21:56.816Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.block_count default=0
time=2026-03-31T18:21:56.816Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.embedding_length default=0
time=2026-03-31T18:21:56.816Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.head_count default=0
time=2026-03-31T18:21:56.816Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.head_count_kv default=0
time=2026-03-31T18:21:56.816Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.key_length default=0
time=2026-03-31T18:21:56.816Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.dimension_count default=0
time=2026-03-31T18:21:56.816Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.layer_norm_rms_epsilon default=0
time=2026-03-31T18:21:56.816Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.freq_base default=100000
time=2026-03-31T18:21:56.816Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.scaling.factor default=1
time=2026-03-31T18:21:56.816Z level=DEBUG source=runner.go:1386 msg="dummy model load took" duration=81.982831ms
time=2026-03-31T18:21:56.816Z level=DEBUG source=runner.go:1391 msg="gathering device infos took" duration=726ns
time=2026-03-31T18:21:56.816Z level=TRACE source=runner.go:467 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v13]" devices=[]
time=2026-03-31T18:21:56.816Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=104.013646ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v13]" extra_envs="map[CUDA_VISIBLE_DEVICES:GPU-01991e90-02de-d9b8-d852-cd5265aefb02 GGML_CUDA_INIT:1]"
time=2026-03-31T18:21:56.816Z level=DEBUG source=runner.go:153 msg="filtering device which didn't fully initialize" id=GPU-01991e90-02de-d9b8-d852-cd5265aefb02 libdir=/usr/lib/ollama/cuda_v13 pci_id=0000:0b:00.0 library=CUDA
time=2026-03-31T18:21:56.816Z level=TRACE source=runner.go:174 msg="supported GPU library combinations before filtering" supported=map[]
time=2026-03-31T18:21:56.816Z level=TRACE source=runner.go:183 msg="removing unsupported or overlapping GPU combination" libDir=/usr/lib/ollama/cuda_v12 description="NVIDIA RTX PRO 4500 Blackwell Server Edition MIG 1g.16gb" compute=12.0 pci_id=0000:0b:00.0
time=2026-03-31T18:21:56.816Z level=TRACE source=runner.go:183 msg="removing unsupported or overlapping GPU combination" libDir=/usr/lib/ollama/cuda_v13 description="NVIDIA RTX PRO 4500 Blackwell Server Edition MIG 1g.16gb" compute=12.0 pci_id=0000:0b:00.0
time=2026-03-31T18:21:56.816Z level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=665.6487ms
time=2026-03-31T18:21:56.816Z level=INFO source=types.go:60 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="251.4 GiB" available="251.2 GiB"
time=2026-03-31T18:21:56.816Z level=INFO source=routes.go:1850 msg="vram-based default context" total_vram="0 B" default_num_ctx=4096
[core@quanta-tmo-4500-cran mig]$

Gang
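
For anyone triaging a log like the one above: the two discovery passes that run without `CUDA_VISIBLE_DEVICES` each find the MIG 1g.16gb device, while the two passes launched with `CUDA_VISIBLE_DEVICES=GPU-01991e90-02de-d9b8-d852-cd5265aefb02` both report `no CUDA-capable device is detected`, and Ollama then falls back to CPU. Below is a minimal Python sketch for pulling that pattern out of a saved log. The helper names (`parse_line`, `enumeration_results`, `filtered_devices`) are hypothetical and not part of Ollama, and the regex only assumes the `key=value` layout seen in these lines.

```python
import re

# key=value or key="quoted value" pairs, as seen in the log lines above.
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_line(line):
    """Return the key/value pairs of one log line ({} if it has none)."""
    return {key: value.strip('"') for key, value in PAIR.findall(line)}


def enumeration_results(log_text):
    """List (library path, found_any_device) for each discovery pass."""
    results = []
    for line in log_text.splitlines():
        fields = parse_line(line)
        if fields.get("msg") == "runner enumerated devices":
            found = fields.get("devices", "[]") != "[]"
            results.append((fields.get("OLLAMA_LIBRARY_PATH"), found))
    return results


def filtered_devices(log_text):
    """List (libdir, device id) for each device dropped during discovery."""
    dropped = []
    for line in log_text.splitlines():
        fields = parse_line(line)
        if fields.get("msg") == "filtering device which didn't fully initialize":
            dropped.append((fields.get("libdir"), fields.get("id")))
    return dropped
```

Feeding the pod log text to `enumeration_results` shows which library directories enumerated a device, and `filtered_devices` shows which ones were dropped afterwards. One thing that may be worth checking, as an assumption rather than a confirmed diagnosis, is whether the `GPU-...` ID passed in `CUDA_VISIBLE_DEVICES` is the form CUDA expects for a MIG instance; `nvidia-smi -L` lists MIG devices with their own `MIG-...` UUIDs.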

@g21chen commented on GitHub (Mar 31, 2026):

Hi Rick

Thanks for the quick response. logs are pasted below:

```
[core@xxxxx]$ oc logs ollama-0 -n open-webui
Couldn't find '/root/.ollama/id_ed25519'. Generating new private key.
Your new public key is:
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIJcD1FiWHjKDIts9zuGUVwhSHN++ahXD188YeHJ6Ro0J
time=2026-03-31T18:21:56.150Z level=INFO source=routes.go:1742 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:0 OLLAMA_DEBUG:DEBUG-4 OLLAMA_DEBUG_LOG_REQUESTS:false OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NO_CLOUD:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2026-03-31T18:21:56.150Z level=INFO source=routes.go:1744 msg="Ollama cloud disabled: false"
time=2026-03-31T18:21:56.150Z level=INFO source=images.go:477 msg="total blobs: 0"
time=2026-03-31T18:21:56.150Z level=INFO source=images.go:484 msg="total unused blobs removed: 0"
time=2026-03-31T18:21:56.150Z level=INFO source=routes.go:1800 msg="Listening on [::]:11434 (version 0.19.0)"
time=2026-03-31T18:21:56.151Z level=DEBUG source=sched.go:145 msg="starting llm scheduler"
time=2026-03-31T18:21:56.152Z level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-03-31T18:21:56.152Z level=TRACE source=runner.go:440 msg="starting runner for device discovery" libDirs="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" extraEnvs=map[]
time=2026-03-31T18:21:56.153Z level=INFO source=server.go:432 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 36249"
time=2026-03-31T18:21:56.153Z level=DEBUG source=server.go:433 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=2 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12
time=2026-03-31T18:21:56.169Z level=INFO source=runner.go:1411 msg="starting ollama engine"
time=2026-03-31T18:21:56.169Z level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:36249"
time=2026-03-31T18:21:56.174Z level=DEBUG source=gguf.go:604 msg=general.architecture type=string
time=2026-03-31T18:21:56.174Z level=DEBUG source=gguf.go:604 msg=tokenizer.ggml.model type=string
time=2026-03-31T18:21:56.174Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-31T18:21:56.174Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-31T18:21:56.174Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.file_type default=0
time=2026-03-31T18:21:56.174Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.name default=""
time=2026-03-31T18:21:56.174Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.description default=""
time=2026-03-31T18:21:56.174Z level=INFO source=ggml.go:136 msg="" architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3
time=2026-03-31T18:21:56.174Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-icelake.so
time=2026-03-31T18:21:56.180Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama/cuda_v12
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA RTX PRO 4500 Blackwell Server Edition MIG 1g.16gb, compute capability 12.0, VMM: yes, ID: GPU-01991e90-02de-d9b8-d852-cd5265aefb02
load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v12/libggml-cuda.so
time=2026-03-31T18:21:56.351Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,520,600,610,700,750,800,860,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.block_count default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.pooling_type default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.expert_count default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.tokens default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.scores default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.token_type default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.merges default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=true
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_id default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.pre default=""
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.block_count default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.embedding_length default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.head_count default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.head_count_kv default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.key_length default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.dimension_count default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.layer_norm_rms_epsilon default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.freq_base default=100000
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.scaling.factor default=1
time=2026-03-31T18:21:56.351Z level=DEBUG source=runner.go:1386 msg="dummy model load took" duration=177.508363ms
time=2026-03-31T18:21:56.473Z level=DEBUG source=runner.go:1391 msg="gathering device infos took" duration=121.404308ms
time=2026-03-31T18:21:56.473Z level=TRACE source=runner.go:467 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" devices="[{DeviceID:{ID:GPU-01991e90-02de-d9b8-d852-cd5265aefb02 Library:CUDA} Name:CUDA0 Description:NVIDIA RTX PRO 4500 Blackwell Server Edition MIG 1g.16gb FilterID: Integrated:false PCIID:0000:0b:00.0 TotalMemory:16810770432 FreeMemory:16614490112 ComputeMajor:12 ComputeMinor:0 DriverMajor:13 DriverMinor:1 LibraryPath:[/usr/lib/ollama /usr/lib/ollama/cuda_v12]}]"
time=2026-03-31T18:21:56.473Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=320.951467ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" extra_envs=map[]
time=2026-03-31T18:21:56.473Z level=TRACE source=runner.go:440 msg="starting runner for device discovery" libDirs="[/usr/lib/ollama /usr/lib/ollama/cuda_v13]" extraEnvs=map[]
time=2026-03-31T18:21:56.473Z level=INFO source=server.go:432 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 43961"
time=2026-03-31T18:21:56.473Z level=DEBUG source=server.go:433 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=2 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13
time=2026-03-31T18:21:56.488Z level=INFO source=runner.go:1411 msg="starting ollama engine"
time=2026-03-31T18:21:56.489Z level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:43961"
time=2026-03-31T18:21:56.495Z level=DEBUG source=gguf.go:604 msg=general.architecture type=string
time=2026-03-31T18:21:56.495Z level=DEBUG source=gguf.go:604 msg=tokenizer.ggml.model type=string
time=2026-03-31T18:21:56.495Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-31T18:21:56.495Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-31T18:21:56.495Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.file_type default=0
time=2026-03-31T18:21:56.495Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.name default=""
time=2026-03-31T18:21:56.495Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.description default=""
time=2026-03-31T18:21:56.495Z level=INFO source=ggml.go:136 msg="" architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3
time=2026-03-31T18:21:56.495Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-icelake.so
time=2026-03-31T18:21:56.500Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama/cuda_v13
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA RTX PRO 4500 Blackwell Server Edition MIG 1g.16gb, compute capability 12.0, VMM: yes, ID: GPU-01991e90-02de-d9b8-d852-cd5265aefb02
load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v13/libggml-cuda.so
time=2026-03-31T18:21:56.607Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=750,800,860,870,890,900,1000,1030,1100,1200,1210 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.block_count default=0
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.pooling_type default=0
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.expert_count default=0
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.tokens default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.scores default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.token_type default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.merges default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=true
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_id default=0
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.pre default=""
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.block_count default=0
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.embedding_length default=0
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.head_count default=0
time=2026-03-31T18:21:56.607Z 
level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.head_count_kv default=0 time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.key_length default=0 time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.dimension_count default=0 time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.layer_norm_rms_epsilon default=0 time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.freq_base default=100000 time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.scaling.factor default=1 time=2026-03-31T18:21:56.607Z level=DEBUG source=runner.go:1386 msg="dummy model load took" duration=112.192034ms time=2026-03-31T18:21:56.712Z level=DEBUG source=runner.go:1391 msg="gathering device infos took" duration=104.541956ms time=2026-03-31T18:21:56.712Z level=TRACE source=runner.go:467 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v13]" devices="[{DeviceID:{ID:GPU-01991e90-02de-d9b8-d852-cd5265aefb02 Library:CUDA} Name:CUDA0 Description:NVIDIA RTX PRO 4500 Blackwell Server Edition MIG 1g.16gb FilterID: Integrated:false PCIID:0000:0b:00.0 TotalMemory:16810770432 FreeMemory:16614490112 ComputeMajor:12 ComputeMinor:0 DriverMajor:13 DriverMinor:1 LibraryPath:[/usr/lib/ollama /usr/lib/ollama/cuda_v13]}]" time=2026-03-31T18:21:56.712Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=238.913755ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v13]" extra_envs=map[] time=2026-03-31T18:21:56.712Z level=INFO source=runner.go:106 msg="experimental Vulkan support disabled. 
To enable, set OLLAMA_VULKAN=1" time=2026-03-31T18:21:56.712Z level=DEBUG source=runner.go:124 msg="evaluating which, if any, devices to filter out" initial_count=2 time=2026-03-31T18:21:56.712Z level=DEBUG source=runner.go:146 msg="verifying if device is supported" library=/usr/lib/ollama/cuda_v12 description="NVIDIA RTX PRO 4500 Blackwell Server Edition MIG 1g.16gb" compute=12.0 id=GPU-01991e90-02de-d9b8-d852-cd5265aefb02 pci_id=0000:0b:00.0 time=2026-03-31T18:21:56.712Z level=DEBUG source=runner.go:146 msg="verifying if device is supported" library=/usr/lib/ollama/cuda_v13 description="NVIDIA RTX PRO 4500 Blackwell Server Edition MIG 1g.16gb" compute=12.0 id=GPU-01991e90-02de-d9b8-d852-cd5265aefb02 pci_id=0000:0b:00.0 time=2026-03-31T18:21:56.712Z level=TRACE source=runner.go:440 msg="starting runner for device discovery" libDirs="[/usr/lib/ollama /usr/lib/ollama/cuda_v13]" extraEnvs="map[CUDA_VISIBLE_DEVICES:GPU-01991e90-02de-d9b8-d852-cd5265aefb02 GGML_CUDA_INIT:1]" time=2026-03-31T18:21:56.712Z level=TRACE source=runner.go:440 msg="starting runner for device discovery" libDirs="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" extraEnvs="map[CUDA_VISIBLE_DEVICES:GPU-01991e90-02de-d9b8-d852-cd5265aefb02 GGML_CUDA_INIT:1]" time=2026-03-31T18:21:56.712Z level=INFO source=server.go:432 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 41427" time=2026-03-31T18:21:56.712Z level=DEBUG source=server.go:433 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=2 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13 CUDA_VISIBLE_DEVICES=GPU-01991e90-02de-d9b8-d852-cd5265aefb02 GGML_CUDA_INIT=1 time=2026-03-31T18:21:56.713Z level=INFO source=server.go:432 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 43293" time=2026-03-31T18:21:56.713Z level=DEBUG 
source=server.go:433 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=2 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12 CUDA_VISIBLE_DEVICES=GPU-01991e90-02de-d9b8-d852-cd5265aefb02 GGML_CUDA_INIT=1 time=2026-03-31T18:21:56.725Z level=INFO source=runner.go:1411 msg="starting ollama engine" time=2026-03-31T18:21:56.725Z level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:41427" time=2026-03-31T18:21:56.725Z level=INFO source=runner.go:1411 msg="starting ollama engine" time=2026-03-31T18:21:56.726Z level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:43293" time=2026-03-31T18:21:56.734Z level=DEBUG source=gguf.go:604 msg=general.architecture type=string time=2026-03-31T18:21:56.734Z level=DEBUG source=gguf.go:604 msg=tokenizer.ggml.model type=string time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32 time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32 time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.file_type default=0 time=2026-03-31T18:21:56.734Z level=DEBUG source=gguf.go:604 msg=general.architecture type=string time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.name default="" time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.description default="" time=2026-03-31T18:21:56.734Z level=DEBUG source=gguf.go:604 msg=tokenizer.ggml.model type=string time=2026-03-31T18:21:56.734Z level=INFO source=ggml.go:136 msg="" architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3 time=2026-03-31T18:21:56.734Z level=DEBUG 
source=ggml.go:324 msg="key with type not found" key=general.alignment default=32 time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32 time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.file_type default=0 time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.name default="" time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.description default="" time=2026-03-31T18:21:56.734Z level=INFO source=ggml.go:136 msg="" architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3 time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-icelake.so load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-icelake.so time=2026-03-31T18:21:56.739Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama/cuda_v13 time=2026-03-31T18:21:56.739Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama/cuda_v12 ggml_cuda_init: failed to initialize CUDA: no CUDA-capable device is detected load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v12/libggml-cuda.so time=2026-03-31T18:21:56.810Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc) time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.block_count default=0 time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 
msg="key with type not found" key=llama.pooling_type default=0 time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.expert_count default=0 time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.tokens default="&{size:0 values:[]}" time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.scores default="&{size:0 values:[]}" time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.token_type default="&{size:0 values:[]}" time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.merges default="&{size:0 values:[]}" time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=true time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0 time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_id default=0 time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}" time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.pre default="" time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.block_count default=0 time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.embedding_length default=0 time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.head_count default=0 
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.head_count_kv default=0 time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.key_length default=0 time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.dimension_count default=0 time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.layer_norm_rms_epsilon default=0 time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.freq_base default=100000 time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.scaling.factor default=1 time=2026-03-31T18:21:56.811Z level=DEBUG source=runner.go:1386 msg="dummy model load took" duration=77.553908ms time=2026-03-31T18:21:56.812Z level=DEBUG source=runner.go:1391 msg="gathering device infos took" duration=882ns time=2026-03-31T18:21:56.812Z level=TRACE source=runner.go:467 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" devices=[] time=2026-03-31T18:21:56.812Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=99.5831ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" extra_envs="map[CUDA_VISIBLE_DEVICES:GPU-01991e90-02de-d9b8-d852-cd5265aefb02 GGML_CUDA_INIT:1]" time=2026-03-31T18:21:56.812Z level=DEBUG source=runner.go:153 msg="filtering device which didn't fully initialize" id=GPU-01991e90-02de-d9b8-d852-cd5265aefb02 libdir=/usr/lib/ollama/cuda_v12 pci_id=0000:0b:00.0 library=CUDA ggml_cuda_init: failed to initialize CUDA: no CUDA-capable device is detected load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v13/libggml-cuda.so time=2026-03-31T18:21:56.814Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 
CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc) time=2026-03-31T18:21:56.814Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.block_count default=0 time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.pooling_type default=0 time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.expert_count default=0 time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.tokens default="&{size:0 values:[]}" time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.scores default="&{size:0 values:[]}" time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.token_type default="&{size:0 values:[]}" time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.merges default="&{size:0 values:[]}" time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=true time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0 time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_id default=0 time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}" time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.pre default="" time=2026-03-31T18:21:56.816Z level=DEBUG source=ggml.go:324 msg="key with type 
not found" key=llama.block_count default=0 time=2026-03-31T18:21:56.816Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.embedding_length default=0 time=2026-03-31T18:21:56.816Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.head_count default=0 time=2026-03-31T18:21:56.816Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.head_count_kv default=0 time=2026-03-31T18:21:56.816Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.key_length default=0 time=2026-03-31T18:21:56.816Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.dimension_count default=0 time=2026-03-31T18:21:56.816Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.layer_norm_rms_epsilon default=0 time=2026-03-31T18:21:56.816Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.freq_base default=100000 time=2026-03-31T18:21:56.816Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.scaling.factor default=1 time=2026-03-31T18:21:56.816Z level=DEBUG source=runner.go:1386 msg="dummy model load took" duration=81.982831ms time=2026-03-31T18:21:56.816Z level=DEBUG source=runner.go:1391 msg="gathering device infos took" duration=726ns time=2026-03-31T18:21:56.816Z level=TRACE source=runner.go:467 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v13]" devices=[] time=2026-03-31T18:21:56.816Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=104.013646ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v13]" extra_envs="map[CUDA_VISIBLE_DEVICES:GPU-01991e90-02de-d9b8-d852-cd5265aefb02 GGML_CUDA_INIT:1]" time=2026-03-31T18:21:56.816Z level=DEBUG source=runner.go:153 msg="filtering device which didn't fully initialize" id=GPU-01991e90-02de-d9b8-d852-cd5265aefb02 libdir=/usr/lib/ollama/cuda_v13 pci_id=0000:0b:00.0 
library=CUDA time=2026-03-31T18:21:56.816Z level=TRACE source=runner.go:174 msg="supported GPU library combinations before filtering" supported=map[] time=2026-03-31T18:21:56.816Z level=TRACE source=runner.go:183 msg="removing unsupported or overlapping GPU combination" libDir=/usr/lib/ollama/cuda_v12 description="NVIDIA RTX PRO 4500 Blackwell Server Edition MIG 1g.16gb" compute=12.0 pci_id=0000:0b:00.0 time=2026-03-31T18:21:56.816Z level=TRACE source=runner.go:183 msg="removing unsupported or overlapping GPU combination" libDir=/usr/lib/ollama/cuda_v13 description="NVIDIA RTX PRO 4500 Blackwell Server Edition MIG 1g.16gb" compute=12.0 pci_id=0000:0b:00.0 time=2026-03-31T18:21:56.816Z level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=665.6487ms time=2026-03-31T18:21:56.816Z level=INFO source=types.go:60 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="251.4 GiB" available="251.2 GiB" time=2026-03-31T18:21:56.816Z level=INFO source=routes.go:1850 msg="vram-based default context" total_vram="0 B" default_num_ctx=4096 [core@quanta-tmo-4500-cran mig]$ ``` Gang
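For anyone reading the log above: the line breaks were lost, so the few entries that matter are hard to spot. Here is a minimal sketch that splits the flattened text back into entries and keeps only those that mention a discovery outcome. The function names `split_entries` and `key_entries` and the pattern list are hypothetical helpers chosen for illustration; they are not part of Ollama. The sketch assumes every structured entry starts with a `time=<ISO timestamp>` field, as in the log above.

```python
import re

# Substrings copied from messages that appear in the log above.
PATTERNS = (
    "failed to initialize CUDA",
    "filtering device which didn't fully initialize",
    "runner enumerated devices",
    "inference compute",
)


def split_entries(text):
    """Split flattened log text into entries, one per `time=` stamp."""
    # The zero-width lookahead keeps each `time=` prefix with its own entry.
    # Raw library output without a stamp (e.g. `ggml_cuda_init: ...`) stays
    # attached to the entry that precedes it.
    parts = re.split(r"(?=time=\d{4}-\d{2}-\d{2}T)", text)
    return [p.strip() for p in parts if p.strip()]


def key_entries(text, patterns=PATTERNS):
    """Return only the entries that contain one of the given substrings."""
    return [e for e in split_entries(text) if any(p in e for p in patterns)]


if __name__ == "__main__":
    import sys

    # Usage: python filter_log.py < ollama.log
    for entry in key_entries(sys.stdin.read()):
        print(entry)
        print()
```

Run against the log above, this should surface the two `failed to initialize CUDA: no CUDA-capable device is detected` messages, which appear only in the runs started with `CUDA_VISIBLE_DEVICES=GPU-01991e90-02de-d9b8-d852-cd5265aefb02`, and the final `inference compute` entry that reports `library=cpu`.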

GiteaMirror commented

2026-04-12 22:35:57 -05:00

@rick-github commented on GitHub (Mar 31, 2026):

What's the output of

nvidia-smi -q


GiteaMirror commented

2026-04-12 22:35:57 -05:00

@g21chen commented on GitHub (Mar 31, 2026):

Output from running the command inside the ollama pod:

root@ollama-0:/# nvidia-smi -q

==============NVSMI LOG==============

Timestamp                                              : Tue Mar 31 18:31:27 2026
Driver Version                                         : 590.48.01
CUDA Version                                           : 13.1

Attached GPUs                                          : 1
GPU 00000000:0B:00.0
    Product Name                                       : NVIDIA RTX PRO 4500 Blackwell Server Edition
    Product Brand                                      : NVIDIA
    Product Architecture                               : Blackwell
    Display Mode                                       : Requested functionality has been deprecated
    Display Attached                                   : No
    Display Active                                     : Disabled
    Persistence Mode                                   : Enabled
    Addressing Mode                                    : HMM
    MIG Mode
        Current                                        : Enabled
        Pending                                        : Enabled
    MIG Device
        Index                                          : 0
        GPU Instance ID                                : 1
        Compute Instance ID                            : 0
        Device Attributes
            Shared
                Multiprocessor count                   : 42
                Copy Engine count                      : 1
                Encoder count                          : 1
                Decoder count                          : 1
                OFA count                              : 0
                JPG count                              : 1
        ECC Errors
            Volatile
                SRAM Uncorrectable                     : 0
        Shared FB Memory Usage
            Total                                      : 16032 MiB
            Reserved                                   : 0 MiB
            Used                                       : 58 MiB
            Free                                       : 15975 MiB
        Shared BAR1 Memory
            Total                                      : 16383 MiB
            Used                                       : 0 MiB
            Free                                       : 16383 MiB
    Accounting Mode                                    : Disabled
    Accounting Mode Buffer Size                        : 4000
    Driver Model
        Current                                        : N/A
        Pending                                        : N/A
    Serial Number                                      : 1795125089725
    GPU UUID                                           : GPU-01991e90-02de-d9b8-d852-cd5265aefb02
    GPU PDI                                            : 0x648fae272cf1d560
    Minor Number                                       : 0
    VBIOS Version                                      : 98.03.B1.00.01
    MultiGPU Board                                     : No
    Board ID                                           : 0xb00
    Board Part Number                                  : 900-2G147-0000-000
    GPU Part Number                                    : 2C3A-895-A1
    FRU Part Number                                    : N/A
    Platform Info
        Chassis Serial Number                          :
        Slot Number                                    : 0
        Tray Index                                     : 0
        Host ID                                        : 1
        Peer Type                                      : Direct Connected
        Module Id                                      : 1
        GPU Fabric GUID                                : 0x0000000000000000
    Inforom Version
        Image Version                                  : G147.0210.00.01
        OEM Object                                     : 2.1
        ECC Object                                     : 7.16
        Power Management Object                        : N/A
    Inforom BBX Object Flush
        Latest Timestamp                               : 2026/03/30 19:01:47.841
        Latest Duration                                : 48934 us
    GPU Operation Mode
        Current                                        : N/A
        Pending                                        : N/A
    GPU C2C Mode                                       : Disabled
    GPU Virtualization Mode
        Virtualization Mode                            : None
        Host VGPU Mode                                 : N/A
        vGPU Heterogeneous Mode                        : N/A
    GPU Recovery Action                                : None
    GSP Firmware Version                               : 590.48.01
    IBMNPU
        Relaxed Ordering Mode                          : N/A
    PCI
        Bus                                            : 0x0B
        Device                                         : 0x00
        Domain                                         : 0x0000
        Base Classcode                                 : 0x3
        Sub Classcode                                  : 0x2
        Device Id                                      : 0x2C3A10DE
        Bus Id                                         : 00000000:0B:00.0
        Sub System Id                                  : 0x21F410DE
        GPU Link Info
            PCIe Generation
                Max                                    : 5
                Current                                : 1
                Device Current                         : 1
                Device Max                             : 5
                Host Max                               : 5
            Link Width
                Max                                    : 16x
                Current                                : 16x
        Bridge Chip
            Type                                       : N/A
            Firmware                                   : N/A
        Replays Since Reset                            : 0
        Replay Number Rollovers                        : 0
        Tx Throughput                                  : 481 KB/s
        Rx Throughput                                  : 426 KB/s
        Atomic Caps Outbound                           : N/A
        Atomic Caps Inbound                            : FETCHADD_32 FETCHADD_64 SWAP_32 SWAP_64 CAS_32 CAS_64
    Fan Speed                                          : N/A
    Performance State                                  : P8
    Clocks Event Reasons
        Idle                                           : Active
        Applications Clocks Setting                    : Not Active
        SW Power Cap                                   : Not Active
        HW Slowdown                                    : Not Active
            HW Thermal Slowdown                        : Not Active
            HW Power Brake Slowdown                    : Not Active
        Sync Boost                                     : Not Active
        SW Thermal Slowdown                            : Not Active
        Display Clock Setting                          : Not Active
    Clocks Event Reasons Counters
        SW Power Capping                               : 384727561 us
        Sync Boost                                     : 0 us
        SW Thermal Slowdown                            : 0 us
        HW Thermal Slowdown                            : 0 us
        HW Power Braking                               : 0 us
    Sparse Operation Mode                              : N/A
    FB Memory Usage
        Total                                          : Insufficient Permissions
        Reserved                                       : Insufficient Permissions
        Used                                           : Insufficient Permissions
        Free                                           : Insufficient Permissions
    BAR1 Memory Usage
        Total                                          : Insufficient Permissions
        Used                                           : Insufficient Permissions
        Free                                           : Insufficient Permissions
    Conf Compute Protected Memory Usage
        Total                                          : Insufficient Permissions
        Used                                           : Insufficient Permissions
        Free                                           : Insufficient Permissions
    Compute Mode                                       : Default
    Utilization
        GPU                                            : N/A
        Memory                                         : N/A
        Encoder                                        : N/A
        Decoder                                        : N/A
        JPEG                                           : N/A
        OFA                                            : N/A
    Encoder Stats
        Active Sessions                                : 0
        Average FPS                                    : 0
        Average Latency                                : 0
    FBC Stats
        Active Sessions                                : 0
        Average FPS                                    : 0
        Average Latency                                : 0
    DRAM Encryption Mode
        Current                                        : Disabled
        Pending                                        : Disabled
    ECC Mode
        Current                                        : Enabled
        Pending                                        : Enabled
    ECC Errors
        Volatile
            SRAM Correctable                           : 0
            SRAM Uncorrectable Parity                  : 0
            SRAM Uncorrectable SEC-DED                 : 0
            DRAM Correctable                           : 0
            DRAM Uncorrectable                         : 0
        Aggregate
            SRAM Correctable                           : 0
            SRAM Uncorrectable Parity                  : 0
            SRAM Uncorrectable SEC-DED                 : 0
            DRAM Correctable                           : 0
            DRAM Uncorrectable                         : 0
            SRAM Threshold Exceeded                    : No
        Aggregate Uncorrectable SRAM Sources
            SRAM L2                                    : 0
            SRAM SM                                    : 0
            SRAM Microcontroller                       : 0
            SRAM PCIE                                  : 0
            SRAM Other                                 : 0
        Channel Repair Pending                         : No
        TPC Repair Pending                             : No
        Unrepairable Memory                            : No
    Retired Pages
        Single Bit ECC                                 : N/A
        Double Bit ECC                                 : N/A
        Pending Page Blacklist                         : N/A
    Remapped Rows                                      : N/A
    Temperature
        GPU Current Temp                               : 34 C
        GPU T.Limit Temp                               : 58 C
        GPU Shutdown T.Limit Temp                      : -5 C
        GPU Slowdown T.Limit Temp                      : -2 C
        GPU Max Operating T.Limit Temp                 : 0 C
        GPU Target Temperature                         : N/A
        Memory Current Temp                            : N/A
        Memory Max Operating T.Limit Temp              : N/A
    GPU Power Readings
        Average Power Draw                             : 17.86 W
        Instantaneous Power Draw                       : 17.27 W
        Current Power Limit                            : 165.00 W
        Requested Power Limit                          : 165.00 W
        Default Power Limit                            : 165.00 W
        Min Power Limit                                : 100.00 W
        Max Power Limit                                : 165.00 W
    GPU Memory Power Readings
        Average Power Draw                             : N/A
        Instantaneous Power Draw                       : N/A
    Module Power Readings
        Average Power Draw                             : N/A
        Instantaneous Power Draw                       : N/A
        Current Power Limit                            : N/A
        Requested Power Limit                          : N/A
        Default Power Limit                            : N/A
        Min Power Limit                                : N/A
        Max Power Limit                                : N/A
    Power Smoothing                                    : N/A
    Workload Power Profiles
        Requested Profiles                             : N/A
        Enforced Profiles                              : N/A
    EDPp Multiplier                                    : N/A
    Clocks
        Graphics                                       : 180 MHz
        SM                                             : 180 MHz
        Memory                                         : 405 MHz
        Video                                          : 600 MHz
    Applications Clocks
        Graphics                                       : Requested functionality has been deprecated
        Memory                                         : Requested functionality has been deprecated
    Default Applications Clocks
        Graphics                                       : Requested functionality has been deprecated
        Memory                                         : Requested functionality has been deprecated
    Deferred Clocks
        Memory                                         : N/A
    Max Clocks
        Graphics                                       : 2415 MHz
        SM                                             : 2415 MHz
        Memory                                         : 12501 MHz
        Video                                          : 2100 MHz
    Max Customer Boost Clocks
        Graphics                                       : 2415 MHz
    Clock Policy
        Auto Boost                                     : N/A
        Auto Boost Default                             : N/A
    Fabric
        State                                          : N/A
        Status                                         : N/A
        CliqueId                                       : N/A
        ClusterUUID                                    : N/A
        Health
            Summary                                    : N/A
            Bandwidth                                  : N/A
            Route Recovery in progress                 : N/A
            Route Unhealthy                            : N/A
            Access Timeout Recovery                    : N/A
            Incorrect Configuration                    : N/A
    Processes                                          : None
    Capabilities
        EGM                                            : disabled

root@ollama-0:/#
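
The `nvidia-smi -q` output above is an indented `key : value` tree, so it can be checked programmatically, for example to list every field that reports `Insufficient Permissions` inside the pod. A minimal Python sketch, assuming the output has been captured as text; `parse_smi` and `find_values` are hypothetical helper names, not part of any NVIDIA tool, and `SAMPLE` is a short excerpt in the same layout:

```python
import re

# A leaf line looks like "Key name      : value"; section headers have no " :".
LEAF = re.compile(r"^(.*?)\s+:(?:\s+(.*))?$")

SAMPLE = """\
    FB Memory Usage
        Total                                          : Insufficient Permissions
        Used                                           : Insufficient Permissions
    ECC Mode
        Current                                        : Enabled
    Temperature
        GPU Current Temp                               : 34 C
"""


def parse_smi(text):
    """Return a list of (path, value) pairs from indented 'key : value' text."""
    entries = []
    stack = []  # (indent, section name) for each enclosing section
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("="):
            continue  # skip blank lines and the "====NVSMI LOG====" banner
        indent = len(raw) - len(raw.lstrip())
        # Leave any section that is not shallower than the current line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        match = LEAF.match(line)
        if match:
            path = " / ".join([name for _, name in stack] + [match.group(1)])
            entries.append((path, match.group(2) or ""))
        else:
            stack.append((indent, line))
    return entries


def find_values(entries, needle):
    """Return the paths of all fields whose value equals `needle`."""
    return [path for path, value in entries if value == needle]


if __name__ == "__main__":
    for path in find_values(parse_smi(SAMPLE), "Insufficient Permissions"):
        print(path)
```

Running the same two functions on the host-side and pod-side output and comparing the results is one way to see which fields differ between the two views.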


GiteaMirror commented

2026-04-12 22:35:58 -05:00

@g21chen commented on GitHub (Mar 31, 2026):

FYI, this works with vLLM instead of Ollama on the same server with the same MIG configuration.

1. Pod spec with vLLM

[core@xxxxx]$ cat vll-model-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-verify
  namespace: open-webui
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm-verify
  template:
    metadata:
      labels:
        app: vllm-verify
    spec:
      containers:
      - name: vllm-container
        image: vllm/vllm-openai:latest
        env:
        - name: NVIDIA_VISIBLE_DEVICES
          # Use the specific UUID for Device 1
          value: "MIG-12ea3b46-39a7-5d8c-9dd6-7d8482d5e3f8"
        - name: NVIDIA_DRIVER_CAPABILITIES
          value: "all"
        args: [
          "--model", "unsloth/Llama-3.2-3B-Instruct",
          "--max-model-len", "4096",
          "--gpu-memory-utilization", "0.9"
        ]
        resources:
          limits:
            nvidia.com/gpu: "1"
          requests:
            nvidia.com/gpu: "1"

2. Query results

[core@xxxxxxxxxx]$ curl http://10.128.1.92:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "unsloth/Llama-3.2-3B-Instruct",
"messages": [
{"role": "user", "content": "List the top 30 countries by population."}
],
"max_tokens": 1000
}'
{"id":"chatcmpl-88d9bbbe3705f88e","object":"chat.completion","created":1774982640,"model":"unsloth/Llama-3.2-3B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"Here's a list of the top 30 countries by population based on available data up to 2023:\n\n1. China - 1.449 billion\n2. India - 1.393 billion\n3. United States - 334 million\n4. Indonesia - 283 million\n5. Pakistan - 229 million\n6. Brazil - 215 million\n7. Nigeria - 213 million\n8. Bangladesh - 166 million\n9. Russia - 146 million\n10. Japan - 128 million\n11. Mexico - 127 million\n12. Ethiopia - 125 million\n13. Philippines - 115 million\n14. Vietnam - 99 million\n15. Egypt - 98 million\n16. Democratic Republic of the Congo - 92 million\n17. Turkey - 84 million\n18. Iran - 83 million\n19. Thailand - 69 million\n20. Germany - 68 million\n21. South Korea - 68 million\n22. Iran - 67 million\n23. Italy - 64 million\n24. France - 63 million\n25. United Kingdom - 62 million\n26. Myanmar - 58 million\n27. Tanzania - 58 million\n28. Kenya - 56 million\n29. Algeria - 55 million\n30. Colombia - 54 million","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning":null},"logprobs":null,"finish_reason":"stop","stop_reason":null,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":44,"total_tokens":317,"completion_tokens":273,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}[core@quanta-tmo-4500-cran mig]$
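
The same check can be scripted rather than run by hand. A minimal Python sketch using only the standard library, assuming the vLLM pod is reachable at the address used in the `curl` call above; `build_request` and `summarize` are hypothetical helper names introduced for illustration:

```python
import json
import urllib.request

# Address taken from the curl call above; adjust for another pod.
URL = "http://10.128.1.92:8000/v1/chat/completions"


def build_request(prompt, model="unsloth/Llama-3.2-3B-Instruct", max_tokens=1000):
    """Build the same POST request that the curl command sends."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(response):
    """Pull the finish reason and token counts out of a chat completion response."""
    return {
        "finish_reason": response["choices"][0]["finish_reason"],
        "completion_tokens": response["usage"]["completion_tokens"],
        "total_tokens": response["usage"]["total_tokens"],
    }


if __name__ == "__main__":
    request = build_request("List the top 30 countries by population.")
    with urllib.request.urlopen(request, timeout=120) as resp:
        print(summarize(json.load(resp)))
```

For the response shown above, `summarize` would report a finish reason of `stop`, 273 completion tokens, and 317 total tokens.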


GiteaMirror commented

2026-04-12 22:35:59 -05:00

@g21chen commented on GitHub (Apr 3, 2026):

May I ask if there is any update? Thanks!


GiteaMirror referenced this issue

2026-04-22 13:03:42 -05:00

[GH-ISSUE #9716] server panic when run gemma-3-27b-it-GGUF #32109

GiteaMirror referenced this issue

2026-04-29 01:13:31 -05:00

[GH-ISSUE #9716] server panic when run gemma-3-27b-it-GGUF #52861

GiteaMirror referenced this issue

2026-05-04 13:51:07 -05:00

[GH-ISSUE #9716] server panic when run gemma-3-27b-it-GGUF #68406

GiteaMirror referenced this issue

2026-05-09 19:45:38 -05:00

[GH-ISSUE #9716] server panic when run gemma-3-27b-it-GGUF #84034


Reference: github-starred/ollama#9716