[GH-ISSUE #15178] Not able to run ollama on top of RTX PRO 4500 Server Edition + GPU mig enabled + driver 590.48.01 + Cuda13.1 #9716

Open
opened 2026-04-12 22:35:53 -05:00 by GiteaMirror · 7 comments

Originally created by @g21chen on GitHub (Mar 31, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15178

What is the issue?

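Ollama cannot use the GPU on an RTX PRO 4500 Blackwell Server Edition with MIG enabled, driver 590.48.01, and CUDA 13.1. The pod starts, but GPU discovery filters out the MIG device and Ollama falls back to CPU.

Details and error logs follow.

On the host server: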

[root@nvidia-driver-daemonset-9 drivers]# nvidia-smi
Tue Mar 31 16:50:24 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 590.48.01              Driver Version: 590.48.01      CUDA Version: 13.1     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX PRO 4500 Blac...    On  |   00000000:0B:00.0 Off |                   On |
| N/A   29C    P8             17W /  165W |     116MiB /  32623MiB |     N/A      Default |
|                                         |                        |              Enabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| MIG devices:                                                                            |
+------------------+----------------------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |              Shared Memory-Usage |        Vol|        Shared         |
|      ID  ID  Dev |                Shared BAR1-Usage | SM     Unc| CE ENC  DEC  OFA  JPG |
|                  |                                  |        ECC|                       |
|==================+==================================+===========+=======================|
|  0    1   0   0  |                 58MiB / 16032MiB | 42      0 |  1   1    1    0    1 |
|                  |                  0MiB / 16383MiB |           |                       |
+------------------+----------------------------------+-----------+-----------------------+
|  0    2   0   1  |                 58MiB / 16032MiB | 42      0 |  1   1    1    0    1 |
|                  |                  0MiB / 16383MiB |           |                       |
+------------------+----------------------------------+-----------+-----------------------+

In the ollama pod:

root@ollama-0:/# nvidia-smi
Tue Mar 31 16:52:04 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 590.48.01              Driver Version: 590.48.01      CUDA Version: 13.1     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX PRO 4500 Blac...    On  |   00000000:0B:00.0 Off |                   On |
| N/A   29C    P8             17W /  165W |                    N/A |     N/A      Default |
|                                         |                        |              Enabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| MIG devices:                                                                            |
+------------------+----------------------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |              Shared Memory-Usage |        Vol|        Shared         |
|      ID  ID  Dev |                Shared BAR1-Usage | SM     Unc| CE ENC  DEC  OFA  JPG |
|                  |                                  |        ECC|                       |
|==================+==================================+===========+=======================|
|  0    1   0   0  |                 58MiB / 16032MiB | 42      0 |  1   1    1    0    1 |
|                  |                  0MiB / 16383MiB |           |                       |
+------------------+----------------------------------+-----------+-----------------------+

Pod spec used to deploy ollama:

[core@quanta-tmo-4500-cran mig]$ cat ollama-deployment.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: ollama
  namespace: open-webui
spec:
  serviceName: "ollama"
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
      - name: ollama
        image: docker.io/ollama/ollama:latest
        imagePullPolicy: IfNotPresent
        env:
        - name: OLLAMA_DEBUG
          value: "1"
        ports:
        - containerPort: 11434
        resources:
          requests:
            nvidia.com/gpu: "1"
          limits:
            nvidia.com/gpu: "1"

GPU resource pool on the host side (MIG policy is "single"):

[core@XXXX]$ oc describe node |grep -i "nvidia.com/gpu:" -B 8
Capacity:
  cpu: 128
  ephemeral-storage: 1873933640Ki
  hugepages-1Gi: 50Gi
  hugepages-2Mi: 0
  memory: 263630500Ki
  nvidia.com/gpu: 2

The pod is running:

[core@xxxx]$ oc get pods -n open-webui
NAME READY STATUS RESTARTS AGE
ollama-0 1/1 Running 0 61s

Error logs:

[core@xxxxx]$ oc logs ollama-0 -n open-webui
Couldn't find '/root/.ollama/id_ed25519'. Generating new private key.
Your new public key is:

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIDfkDTv/WVAHAEBTB3xxg80OJRJCi/IsOMNJUfVSyn+f

time=2026-03-31T16:54:49.030Z level=INFO source=routes.go:1742 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:0 OLLAMA_DEBUG:DEBUG OLLAMA_DEBUG_LOG_REQUESTS:false OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NO_CLOUD:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2026-03-31T16:54:49.030Z level=INFO source=routes.go:1744 msg="Ollama cloud disabled: false"
time=2026-03-31T16:54:49.030Z level=INFO source=images.go:477 msg="total blobs: 0"
time=2026-03-31T16:54:49.030Z level=INFO source=images.go:484 msg="total unused blobs removed: 0"
time=2026-03-31T16:54:49.030Z level=INFO source=routes.go:1800 msg="Listening on [::]:11434 (version 0.19.0)"
time=2026-03-31T16:54:49.030Z level=DEBUG source=sched.go:145 msg="starting llm scheduler"
time=2026-03-31T16:54:49.031Z level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-03-31T16:54:49.033Z level=INFO source=server.go:432 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 42887"
time=2026-03-31T16:54:49.033Z level=DEBUG source=server.go:433 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=1 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12
time=2026-03-31T16:54:49.413Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=382.320152ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" extra_envs=map[]
time=2026-03-31T16:54:49.414Z level=INFO source=server.go:432 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 38815"
time=2026-03-31T16:54:49.414Z level=DEBUG source=server.go:433 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=1 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13
time=2026-03-31T16:54:49.659Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=245.184387ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v13]" extra_envs=map[]
time=2026-03-31T16:54:49.659Z level=INFO source=runner.go:106 msg="experimental Vulkan support disabled. To enable, set OLLAMA_VULKAN=1"
time=2026-03-31T16:54:49.659Z level=DEBUG source=runner.go:124 msg="evaluating which, if any, devices to filter out" initial_count=2
time=2026-03-31T16:54:49.659Z level=DEBUG source=runner.go:146 msg="verifying if device is supported" library=/usr/lib/ollama/cuda_v12 description="NVIDIA RTX PRO 4500 Blackwell Server Edition MIG 1g.16gb" compute=12.0 id=GPU-01991e90-02de-d9b8-d852-cd5265aefb02 pci_id=0000:0b:00.0
time=2026-03-31T16:54:49.659Z level=DEBUG source=runner.go:146 msg="verifying if device is supported" library=/usr/lib/ollama/cuda_v13 description="NVIDIA RTX PRO 4500 Blackwell Server Edition MIG 1g.16gb" compute=12.0 id=GPU-01991e90-02de-d9b8-d852-cd5265aefb02 pci_id=0000:0b:00.0
time=2026-03-31T16:54:49.659Z level=INFO source=server.go:432 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 34101"
time=2026-03-31T16:54:49.659Z level=DEBUG source=server.go:433 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=1 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12 GGML_CUDA_INIT=1 CUDA_VISIBLE_DEVICES=GPU-01991e90-02de-d9b8-d852-cd5265aefb02
time=2026-03-31T16:54:49.659Z level=INFO source=server.go:432 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 34747"
time=2026-03-31T16:54:49.659Z level=DEBUG source=server.go:433 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=1 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13 GGML_CUDA_INIT=1 CUDA_VISIBLE_DEVICES=GPU-01991e90-02de-d9b8-d852-cd5265aefb02
time=2026-03-31T16:54:49.758Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=99.18846ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" extra_envs="map[CUDA_VISIBLE_DEVICES:GPU-01991e90-02de-d9b8-d852-cd5265aefb02 GGML_CUDA_INIT:1]"
time=2026-03-31T16:54:49.758Z level=DEBUG source=runner.go:153 msg="filtering device which didn't fully initialize" id=GPU-01991e90-02de-d9b8-d852-cd5265aefb02 libdir=/usr/lib/ollama/cuda_v12 pci_id=0000:0b:00.0 library=CUDA
time=2026-03-31T16:54:49.765Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=105.845603ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v13]" extra_envs="map[CUDA_VISIBLE_DEVICES:GPU-01991e90-02de-d9b8-d852-cd5265aefb02 GGML_CUDA_INIT:1]"
time=2026-03-31T16:54:49.765Z level=DEBUG source=runner.go:153 msg="filtering device which didn't fully initialize" id=GPU-01991e90-02de-d9b8-d852-cd5265aefb02 libdir=/usr/lib/ollama/cuda_v13 pci_id=0000:0b:00.0 library=CUDA
time=2026-03-31T16:54:49.765Z level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=733.993471ms
time=2026-03-31T16:54:49.765Z level=INFO source=types.go:60 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="251.4 GiB" available="251.3 GiB"
time=2026-03-31T16:54:49.765Z level=INFO source=routes.go:1850 msg="vram-based default context" total_vram="0 B" default_num_ctx=4096
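
For anyone reading a long log like the one above, the discovery outcome can be pulled out with a few lines of Python. This is only a rough sketch: it assumes the log has been saved to a file (for example with `oc logs ollama-0 -n open-webui > ollama.log`), and the helper names `parse_line` and `filtered_devices` are made up for illustration, not part of Ollama.

```python
import re

# Matches key=value pairs in the logfmt-style lines; values may be double-quoted.
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_line(line):
    """Return a dict of the key/value pairs found in one log line."""
    return {key: value.strip('"') for key, value in PAIR.findall(line)}


def filtered_devices(log_text):
    """List (id, libdir) for every device dropped during GPU discovery."""
    dropped = []
    for line in log_text.splitlines():
        fields = parse_line(line)
        # Hypothetical filter: keys off the exact message text seen in this log.
        if fields.get("msg") == "filtering device which didn't fully initialize":
            dropped.append((fields.get("id"), fields.get("libdir")))
    return dropped
```

Calling `filtered_devices(open("ollama.log").read())` on the log above should list the same GPU id once per library directory (cuda_v12 and cuda_v13), which matches the CPU-only "inference compute" line at the end.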

Relevant log output


OS

No response

GPU

No response

CPU

No response

Ollama version

No response

GiteaMirror added the bug label 2026-04-12 22:35:53 -05:00

@g21chen commented on GitHub (Mar 31, 2026):

Adding more logs from the ollama pod:

root@ollama-0:/# nvidia-smi -L
GPU 0: NVIDIA RTX PRO 4500 Blackwell Server Edition (UUID: GPU-01991e90-02de-d9b8-d852-cd5265aefb02)
  MIG 1g.16gb     Device  0: (UUID: MIG-fc57ef1d-199f-5ca5-91fa-d48bb00d3467)
root@ollama-0:/#
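
To compare these UUIDs with the `id=` values in the Ollama log, the `nvidia-smi -L` output can be split into GPU and MIG entries. A minimal sketch, assuming the output format shown above; `parse_nvidia_smi_list` is a hypothetical helper name, and the regular expressions are only fitted to this sample.

```python
import re

# Patterns fitted to the `nvidia-smi -L` sample above (an assumption, not a spec).
GPU_RE = re.compile(r'^GPU (\d+): (.+?) \(UUID: (GPU-[0-9a-f-]+)\)')
MIG_RE = re.compile(r'^\s+MIG (\S+)\s+Device\s+(\d+): \(UUID: (MIG-[0-9a-f-]+)\)')


def parse_nvidia_smi_list(text):
    """Return one dict per GPU, each with a list of its MIG devices."""
    gpus = []
    for line in text.splitlines():
        gpu = GPU_RE.match(line)
        if gpu:
            gpus.append({"index": int(gpu.group(1)), "name": gpu.group(2),
                         "uuid": gpu.group(3), "mig": []})
            continue
        mig = MIG_RE.match(line)
        if mig and gpus:
            # MIG lines belong to the most recently listed GPU.
            gpus[-1]["mig"].append({"profile": mig.group(1),
                                    "index": int(mig.group(2)),
                                    "uuid": mig.group(3)})
    return gpus
```

For the output above this yields one GPU entry with a `GPU-...` UUID and one `1g.16gb` MIG entry with a `MIG-...` UUID, which makes it easy to see which of the two identifiers appears in a given log line.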

@rick-github commented on GitHub (Mar 31, 2026):

Set OLLAMA_DEBUG=2 in the environment of the ollama server, restart it and post the log.


@g21chen commented on GitHub (Mar 31, 2026):

Hi Rick,

Thanks for the quick response. The logs are pasted below:

[core@xxxxx]$ oc logs ollama-0 -n open-webui
Couldn't find '/root/.ollama/id_ed25519'. Generating new private key.
Your new public key is:

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIJcD1FiWHjKDIts9zuGUVwhSHN++ahXD188YeHJ6Ro0J

time=2026-03-31T18:21:56.150Z level=INFO source=routes.go:1742 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:0 OLLAMA_DEBUG:DEBUG-4 OLLAMA_DEBUG_LOG_REQUESTS:false OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NO_CLOUD:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2026-03-31T18:21:56.150Z level=INFO source=routes.go:1744 msg="Ollama cloud disabled: false"
time=2026-03-31T18:21:56.150Z level=INFO source=images.go:477 msg="total blobs: 0"
time=2026-03-31T18:21:56.150Z level=INFO source=images.go:484 msg="total unused blobs removed: 0"
time=2026-03-31T18:21:56.150Z level=INFO source=routes.go:1800 msg="Listening on [::]:11434 (version 0.19.0)"
time=2026-03-31T18:21:56.151Z level=DEBUG source=sched.go:145 msg="starting llm scheduler"
time=2026-03-31T18:21:56.152Z level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-03-31T18:21:56.152Z level=TRACE source=runner.go:440 msg="starting runner for device discovery" libDirs="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" extraEnvs=map[]
time=2026-03-31T18:21:56.153Z level=INFO source=server.go:432 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 36249"
time=2026-03-31T18:21:56.153Z level=DEBUG source=server.go:433 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=2 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12
time=2026-03-31T18:21:56.169Z level=INFO source=runner.go:1411 msg="starting ollama engine"
time=2026-03-31T18:21:56.169Z level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:36249"
time=2026-03-31T18:21:56.174Z level=DEBUG source=gguf.go:604 msg=general.architecture type=string
time=2026-03-31T18:21:56.174Z level=DEBUG source=gguf.go:604 msg=tokenizer.ggml.model type=string
time=2026-03-31T18:21:56.174Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-31T18:21:56.174Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-31T18:21:56.174Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.file_type default=0
time=2026-03-31T18:21:56.174Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.name default=""
time=2026-03-31T18:21:56.174Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.description default=""
time=2026-03-31T18:21:56.174Z level=INFO source=ggml.go:136 msg="" architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3
time=2026-03-31T18:21:56.174Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-icelake.so
time=2026-03-31T18:21:56.180Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama/cuda_v12
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA RTX PRO 4500 Blackwell Server Edition MIG 1g.16gb, compute capability 12.0, VMM: yes, ID: GPU-01991e90-02de-d9b8-d852-cd5265aefb02
load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v12/libggml-cuda.so
time=2026-03-31T18:21:56.351Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,520,600,610,700,750,800,860,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.block_count default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.pooling_type default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.expert_count default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.tokens default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.scores default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.token_type default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.merges default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=true
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_id default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.pre default=""
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.block_count default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.embedding_length default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.head_count default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.head_count_kv default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.key_length default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.dimension_count default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.layer_norm_rms_epsilon default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.freq_base default=100000
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.scaling.factor default=1
time=2026-03-31T18:21:56.351Z level=DEBUG source=runner.go:1386 msg="dummy model load took" duration=177.508363ms
time=2026-03-31T18:21:56.473Z level=DEBUG source=runner.go:1391 msg="gathering device infos took" duration=121.404308ms
time=2026-03-31T18:21:56.473Z level=TRACE source=runner.go:467 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" devices="[{DeviceID:{ID:GPU-01991e90-02de-d9b8-d852-cd5265aefb02 Library:CUDA} Name:CUDA0 Description:NVIDIA RTX PRO 4500 Blackwell Server Edition MIG 1g.16gb FilterID: Integrated:false PCIID:0000:0b:00.0 TotalMemory:16810770432 FreeMemory:16614490112 ComputeMajor:12 ComputeMinor:0 DriverMajor:13 DriverMinor:1 LibraryPath:[/usr/lib/ollama /usr/lib/ollama/cuda_v12]}]"
time=2026-03-31T18:21:56.473Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=320.951467ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" extra_envs=map[]
time=2026-03-31T18:21:56.473Z level=TRACE source=runner.go:440 msg="starting runner for device discovery" libDirs="[/usr/lib/ollama /usr/lib/ollama/cuda_v13]" extraEnvs=map[]
time=2026-03-31T18:21:56.473Z level=INFO source=server.go:432 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 43961"
time=2026-03-31T18:21:56.473Z level=DEBUG source=server.go:433 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=2 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13
time=2026-03-31T18:21:56.488Z level=INFO source=runner.go:1411 msg="starting ollama engine"
time=2026-03-31T18:21:56.489Z level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:43961"
time=2026-03-31T18:21:56.495Z level=DEBUG source=gguf.go:604 msg=general.architecture type=string
time=2026-03-31T18:21:56.495Z level=DEBUG source=gguf.go:604 msg=tokenizer.ggml.model type=string
time=2026-03-31T18:21:56.495Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-31T18:21:56.495Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-31T18:21:56.495Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.file_type default=0
time=2026-03-31T18:21:56.495Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.name default=""
time=2026-03-31T18:21:56.495Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.description default=""
time=2026-03-31T18:21:56.495Z level=INFO source=ggml.go:136 msg="" architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3
time=2026-03-31T18:21:56.495Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-icelake.so
time=2026-03-31T18:21:56.500Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama/cuda_v13
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA RTX PRO 4500 Blackwell Server Edition MIG 1g.16gb, compute capability 12.0, VMM: yes, ID: GPU-01991e90-02de-d9b8-d852-cd5265aefb02
load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v13/libggml-cuda.so
time=2026-03-31T18:21:56.607Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=750,800,860,870,890,900,1000,1030,1100,1200,1210 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.block_count default=0
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.pooling_type default=0
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.expert_count default=0
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.tokens default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.scores default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.token_type default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.merges default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=true
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_id default=0
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.pre default=""
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.block_count default=0
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.embedding_length default=0
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.head_count default=0
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.head_count_kv default=0
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.key_length default=0
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.dimension_count default=0
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.layer_norm_rms_epsilon default=0
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.freq_base default=100000
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.scaling.factor default=1
time=2026-03-31T18:21:56.607Z level=DEBUG source=runner.go:1386 msg="dummy model load took" duration=112.192034ms
time=2026-03-31T18:21:56.712Z level=DEBUG source=runner.go:1391 msg="gathering device infos took" duration=104.541956ms
time=2026-03-31T18:21:56.712Z level=TRACE source=runner.go:467 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v13]" devices="[{DeviceID:{ID:GPU-01991e90-02de-d9b8-d852-cd5265aefb02 Library:CUDA} Name:CUDA0 Description:NVIDIA RTX PRO 4500 Blackwell Server Edition MIG 1g.16gb FilterID: Integrated:false PCIID:0000:0b:00.0 TotalMemory:16810770432 FreeMemory:16614490112 ComputeMajor:12 ComputeMinor:0 DriverMajor:13 DriverMinor:1 LibraryPath:[/usr/lib/ollama /usr/lib/ollama/cuda_v13]}]"
time=2026-03-31T18:21:56.712Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=238.913755ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v13]" extra_envs=map[]
time=2026-03-31T18:21:56.712Z level=INFO source=runner.go:106 msg="experimental Vulkan support disabled.  To enable, set OLLAMA_VULKAN=1"
time=2026-03-31T18:21:56.712Z level=DEBUG source=runner.go:124 msg="evaluating which, if any, devices to filter out" initial_count=2
time=2026-03-31T18:21:56.712Z level=DEBUG source=runner.go:146 msg="verifying if device is supported" library=/usr/lib/ollama/cuda_v12 description="NVIDIA RTX PRO 4500 Blackwell Server Edition MIG 1g.16gb" compute=12.0 id=GPU-01991e90-02de-d9b8-d852-cd5265aefb02 pci_id=0000:0b:00.0
time=2026-03-31T18:21:56.712Z level=DEBUG source=runner.go:146 msg="verifying if device is supported" library=/usr/lib/ollama/cuda_v13 description="NVIDIA RTX PRO 4500 Blackwell Server Edition MIG 1g.16gb" compute=12.0 id=GPU-01991e90-02de-d9b8-d852-cd5265aefb02 pci_id=0000:0b:00.0
time=2026-03-31T18:21:56.712Z level=TRACE source=runner.go:440 msg="starting runner for device discovery" libDirs="[/usr/lib/ollama /usr/lib/ollama/cuda_v13]" extraEnvs="map[CUDA_VISIBLE_DEVICES:GPU-01991e90-02de-d9b8-d852-cd5265aefb02 GGML_CUDA_INIT:1]"
time=2026-03-31T18:21:56.712Z level=TRACE source=runner.go:440 msg="starting runner for device discovery" libDirs="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" extraEnvs="map[CUDA_VISIBLE_DEVICES:GPU-01991e90-02de-d9b8-d852-cd5265aefb02 GGML_CUDA_INIT:1]"
time=2026-03-31T18:21:56.712Z level=INFO source=server.go:432 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 41427"
time=2026-03-31T18:21:56.712Z level=DEBUG source=server.go:433 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=2 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13 CUDA_VISIBLE_DEVICES=GPU-01991e90-02de-d9b8-d852-cd5265aefb02 GGML_CUDA_INIT=1
time=2026-03-31T18:21:56.713Z level=INFO source=server.go:432 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 43293"
time=2026-03-31T18:21:56.713Z level=DEBUG source=server.go:433 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=2 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12 CUDA_VISIBLE_DEVICES=GPU-01991e90-02de-d9b8-d852-cd5265aefb02 GGML_CUDA_INIT=1
time=2026-03-31T18:21:56.725Z level=INFO source=runner.go:1411 msg="starting ollama engine"
time=2026-03-31T18:21:56.725Z level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:41427"
time=2026-03-31T18:21:56.725Z level=INFO source=runner.go:1411 msg="starting ollama engine"
time=2026-03-31T18:21:56.726Z level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:43293"
time=2026-03-31T18:21:56.734Z level=DEBUG source=gguf.go:604 msg=general.architecture type=string
time=2026-03-31T18:21:56.734Z level=DEBUG source=gguf.go:604 msg=tokenizer.ggml.model type=string
time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.file_type default=0
time=2026-03-31T18:21:56.734Z level=DEBUG source=gguf.go:604 msg=general.architecture type=string
time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.name default=""
time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.description default=""
time=2026-03-31T18:21:56.734Z level=DEBUG source=gguf.go:604 msg=tokenizer.ggml.model type=string
time=2026-03-31T18:21:56.734Z level=INFO source=ggml.go:136 msg="" architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3
time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama
time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.file_type default=0
time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.name default=""
time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.description default=""
time=2026-03-31T18:21:56.734Z level=INFO source=ggml.go:136 msg="" architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3
time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-icelake.so
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-icelake.so
time=2026-03-31T18:21:56.739Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama/cuda_v13
time=2026-03-31T18:21:56.739Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama/cuda_v12
ggml_cuda_init: failed to initialize CUDA: no CUDA-capable device is detected
load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v12/libggml-cuda.so
time=2026-03-31T18:21:56.810Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.block_count default=0
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.pooling_type default=0
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.expert_count default=0
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.tokens default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.scores default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.token_type default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.merges default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=true
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_id default=0
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.pre default=""
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.block_count default=0
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.embedding_length default=0
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.head_count default=0
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.head_count_kv default=0
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.key_length default=0
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.dimension_count default=0
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.layer_norm_rms_epsilon default=0
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.freq_base default=100000
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.scaling.factor default=1
time=2026-03-31T18:21:56.811Z level=DEBUG source=runner.go:1386 msg="dummy model load took" duration=77.553908ms
time=2026-03-31T18:21:56.812Z level=DEBUG source=runner.go:1391 msg="gathering device infos took" duration=882ns
time=2026-03-31T18:21:56.812Z level=TRACE source=runner.go:467 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" devices=[]
time=2026-03-31T18:21:56.812Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=99.5831ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" extra_envs="map[CUDA_VISIBLE_DEVICES:GPU-01991e90-02de-d9b8-d852-cd5265aefb02 GGML_CUDA_INIT:1]"
time=2026-03-31T18:21:56.812Z level=DEBUG source=runner.go:153 msg="filtering device which didn't fully initialize" id=GPU-01991e90-02de-d9b8-d852-cd5265aefb02 libdir=/usr/lib/ollama/cuda_v12 pci_id=0000:0b:00.0 library=CUDA
ggml_cuda_init: failed to initialize CUDA: no CUDA-capable device is detected
load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v13/libggml-cuda.so
time=2026-03-31T18:21:56.814Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
time=2026-03-31T18:21:56.814Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.block_count default=0
time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.pooling_type default=0
time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.expert_count default=0
time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.tokens default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.scores default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.token_type default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.merges default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=true
time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false
time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_id default=0
time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.pre default=""
time=2026-03-31T18:21:56.816Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.block_count default=0
time=2026-03-31T18:21:56.816Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.embedding_length default=0
time=2026-03-31T18:21:56.816Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.head_count default=0
time=2026-03-31T18:21:56.816Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.head_count_kv default=0
time=2026-03-31T18:21:56.816Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.key_length default=0
time=2026-03-31T18:21:56.816Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.dimension_count default=0
time=2026-03-31T18:21:56.816Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.layer_norm_rms_epsilon default=0
time=2026-03-31T18:21:56.816Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.freq_base default=100000
time=2026-03-31T18:21:56.816Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.scaling.factor default=1
time=2026-03-31T18:21:56.816Z level=DEBUG source=runner.go:1386 msg="dummy model load took" duration=81.982831ms
time=2026-03-31T18:21:56.816Z level=DEBUG source=runner.go:1391 msg="gathering device infos took" duration=726ns
time=2026-03-31T18:21:56.816Z level=TRACE source=runner.go:467 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v13]" devices=[]
time=2026-03-31T18:21:56.816Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=104.013646ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v13]" extra_envs="map[CUDA_VISIBLE_DEVICES:GPU-01991e90-02de-d9b8-d852-cd5265aefb02 GGML_CUDA_INIT:1]"
time=2026-03-31T18:21:56.816Z level=DEBUG source=runner.go:153 msg="filtering device which didn't fully initialize" id=GPU-01991e90-02de-d9b8-d852-cd5265aefb02 libdir=/usr/lib/ollama/cuda_v13 pci_id=0000:0b:00.0 library=CUDA
time=2026-03-31T18:21:56.816Z level=TRACE source=runner.go:174 msg="supported GPU library combinations before filtering" supported=map[]
time=2026-03-31T18:21:56.816Z level=TRACE source=runner.go:183 msg="removing unsupported or overlapping GPU combination" libDir=/usr/lib/ollama/cuda_v12 description="NVIDIA RTX PRO 4500 Blackwell Server Edition MIG 1g.16gb" compute=12.0 pci_id=0000:0b:00.0
time=2026-03-31T18:21:56.816Z level=TRACE source=runner.go:183 msg="removing unsupported or overlapping GPU combination" libDir=/usr/lib/ollama/cuda_v13 description="NVIDIA RTX PRO 4500 Blackwell Server Edition MIG 1g.16gb" compute=12.0 pci_id=0000:0b:00.0
time=2026-03-31T18:21:56.816Z level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=665.6487ms
time=2026-03-31T18:21:56.816Z level=INFO source=types.go:60 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="251.4 GiB" available="251.2 GiB"
time=2026-03-31T18:21:56.816Z level=INFO source=routes.go:1850 msg="vram-based default context" total_vram="0 B" default_num_ctx=4096
[core@quanta-tmo-4500-cran mig]$

Gang
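
For anyone triaging the log above, here is a minimal Python sketch that pulls out which library directories had the device dropped during discovery. It is not part of Ollama: the helper names `parse_logfmt` and `filtered_devices` are hypothetical, and it only assumes the `key=value` line format visible in the log. The sample lines are copied from the log above.

```python
import re

# key=value pairs; a value is either a double-quoted string or a bare token.
PAIR = re.compile(r'([\w.]+)=("(?:[^"\\]|\\.)*"|\S*)')


def parse_logfmt(line):
    """Return the key/value pairs of one log line as a dict (hypothetical helper)."""
    fields = {}
    for key, value in PAIR.findall(line):
        # Strip the surrounding quotes from quoted values.
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]
        fields[key] = value
    return fields


def filtered_devices(lines):
    """Collect (libdir, device id) for every 'filtering device' log entry."""
    dropped = []
    for line in lines:
        fields = parse_logfmt(line)
        if fields.get("msg", "").startswith("filtering device"):
            dropped.append((fields.get("libdir"), fields.get("id")))
    return dropped


# Three lines taken verbatim from the log above.
SAMPLE = '''
ggml_cuda_init: failed to initialize CUDA: no CUDA-capable device is detected
time=2026-03-31T18:21:56.812Z level=DEBUG source=runner.go:153 msg="filtering device which didn't fully initialize" id=GPU-01991e90-02de-d9b8-d852-cd5265aefb02 libdir=/usr/lib/ollama/cuda_v12 pci_id=0000:0b:00.0 library=CUDA
time=2026-03-31T18:21:56.816Z level=DEBUG source=runner.go:153 msg="filtering device which didn't fully initialize" id=GPU-01991e90-02de-d9b8-d852-cd5265aefb02 libdir=/usr/lib/ollama/cuda_v13 pci_id=0000:0b:00.0 library=CUDA
'''

for libdir, device_id in filtered_devices(SAMPLE.splitlines()):
    print(libdir, device_id)
```

On these sample lines it reports both `cuda_v12` and `cuda_v13` for the same device ID, matching the two `filtering device which didn't fully initialize` entries in the log. Lines that are not in `key=value` form, such as the `ggml_cuda_init:` messages, are simply skipped.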

<!-- gh-comment-id:4164552707 --> @g21chen commented on GitHub (Mar 31, 2026):

Hi Rick,

Thanks for the quick response. Logs are pasted below:

```
[core@xxxxx]$ oc logs ollama-0 -n open-webui
Couldn't find '/root/.ollama/id_ed25519'. Generating new private key.
Your new public key is:
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIJcD1FiWHjKDIts9zuGUVwhSHN++ahXD188YeHJ6Ro0J
time=2026-03-31T18:21:56.150Z level=INFO source=routes.go:1742 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:0 OLLAMA_DEBUG:DEBUG-4 OLLAMA_DEBUG_LOG_REQUESTS:false OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NO_CLOUD:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2026-03-31T18:21:56.150Z level=INFO source=routes.go:1744 msg="Ollama cloud disabled: false"
time=2026-03-31T18:21:56.150Z level=INFO source=images.go:477 msg="total blobs: 0"
time=2026-03-31T18:21:56.150Z level=INFO source=images.go:484 msg="total unused blobs removed: 0"
time=2026-03-31T18:21:56.150Z level=INFO source=routes.go:1800 msg="Listening on [::]:11434 (version 0.19.0)"
time=2026-03-31T18:21:56.151Z level=DEBUG source=sched.go:145 msg="starting llm scheduler"
time=2026-03-31T18:21:56.152Z level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-03-31T18:21:56.152Z level=TRACE source=runner.go:440 msg="starting runner for device discovery" libDirs="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" extraEnvs=map[]
time=2026-03-31T18:21:56.153Z level=INFO source=server.go:432 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 36249"
time=2026-03-31T18:21:56.153Z level=DEBUG source=server.go:433 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=2 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12
time=2026-03-31T18:21:56.169Z level=INFO source=runner.go:1411 msg="starting ollama engine"
time=2026-03-31T18:21:56.169Z level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:36249"
time=2026-03-31T18:21:56.174Z level=DEBUG source=gguf.go:604 msg=general.architecture type=string
time=2026-03-31T18:21:56.174Z level=DEBUG source=gguf.go:604 msg=tokenizer.ggml.model type=string
time=2026-03-31T18:21:56.174Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-31T18:21:56.174Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-31T18:21:56.174Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.file_type default=0
time=2026-03-31T18:21:56.174Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.name default=""
time=2026-03-31T18:21:56.174Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.description default=""
time=2026-03-31T18:21:56.174Z level=INFO source=ggml.go:136 msg="" architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3
time=2026-03-31T18:21:56.174Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-icelake.so
time=2026-03-31T18:21:56.180Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama/cuda_v12
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA RTX PRO 4500 Blackwell Server Edition MIG 1g.16gb, compute capability 12.0, VMM: yes, ID: GPU-01991e90-02de-d9b8-d852-cd5265aefb02
load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v12/libggml-cuda.so
time=2026-03-31T18:21:56.351Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,520,600,610,700,750,800,860,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.block_count default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.pooling_type default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.expert_count default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.tokens default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.scores default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.token_type default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.merges default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=true
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_id default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.pre default=""
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.block_count default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.embedding_length default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.head_count default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.head_count_kv default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.key_length default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.dimension_count default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.layer_norm_rms_epsilon default=0
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.freq_base default=100000
time=2026-03-31T18:21:56.351Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.scaling.factor default=1
time=2026-03-31T18:21:56.351Z level=DEBUG source=runner.go:1386 msg="dummy model load took" duration=177.508363ms
time=2026-03-31T18:21:56.473Z level=DEBUG source=runner.go:1391 msg="gathering device infos took" duration=121.404308ms
time=2026-03-31T18:21:56.473Z level=TRACE source=runner.go:467 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" devices="[{DeviceID:{ID:GPU-01991e90-02de-d9b8-d852-cd5265aefb02 Library:CUDA} Name:CUDA0 Description:NVIDIA RTX PRO 4500 Blackwell Server Edition MIG 1g.16gb FilterID: Integrated:false PCIID:0000:0b:00.0 TotalMemory:16810770432 FreeMemory:16614490112 ComputeMajor:12 ComputeMinor:0 DriverMajor:13 DriverMinor:1 LibraryPath:[/usr/lib/ollama /usr/lib/ollama/cuda_v12]}]"
time=2026-03-31T18:21:56.473Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=320.951467ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" extra_envs=map[]
time=2026-03-31T18:21:56.473Z level=TRACE source=runner.go:440 msg="starting runner for device discovery" libDirs="[/usr/lib/ollama /usr/lib/ollama/cuda_v13]" extraEnvs=map[]
time=2026-03-31T18:21:56.473Z level=INFO source=server.go:432 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 43961"
time=2026-03-31T18:21:56.473Z level=DEBUG source=server.go:433 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=2 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13
time=2026-03-31T18:21:56.488Z level=INFO source=runner.go:1411 msg="starting ollama engine"
time=2026-03-31T18:21:56.489Z level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:43961"
time=2026-03-31T18:21:56.495Z level=DEBUG source=gguf.go:604 msg=general.architecture type=string
time=2026-03-31T18:21:56.495Z level=DEBUG source=gguf.go:604 msg=tokenizer.ggml.model type=string
time=2026-03-31T18:21:56.495Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-31T18:21:56.495Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-31T18:21:56.495Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.file_type default=0
time=2026-03-31T18:21:56.495Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.name default=""
time=2026-03-31T18:21:56.495Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.description default=""
time=2026-03-31T18:21:56.495Z level=INFO source=ggml.go:136 msg="" architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3
time=2026-03-31T18:21:56.495Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-icelake.so
time=2026-03-31T18:21:56.500Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama/cuda_v13
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA RTX PRO 4500 Blackwell Server Edition MIG 1g.16gb, compute capability 12.0, VMM: yes, ID: GPU-01991e90-02de-d9b8-d852-cd5265aefb02
load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v13/libggml-cuda.so
time=2026-03-31T18:21:56.607Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=750,800,860,870,890,900,1000,1030,1100,1200,1210 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.block_count default=0
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.pooling_type default=0
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.expert_count default=0
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.tokens default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.scores default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.token_type default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.merges default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=true
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_id default=0
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.pre default=""
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.block_count default=0
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.embedding_length default=0
time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not 
found" key=llama.attention.head_count default=0 time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.head_count_kv default=0 time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.key_length default=0 time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.dimension_count default=0 time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.layer_norm_rms_epsilon default=0 time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.freq_base default=100000 time=2026-03-31T18:21:56.607Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.scaling.factor default=1 time=2026-03-31T18:21:56.607Z level=DEBUG source=runner.go:1386 msg="dummy model load took" duration=112.192034ms time=2026-03-31T18:21:56.712Z level=DEBUG source=runner.go:1391 msg="gathering device infos took" duration=104.541956ms time=2026-03-31T18:21:56.712Z level=TRACE source=runner.go:467 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v13]" devices="[{DeviceID:{ID:GPU-01991e90-02de-d9b8-d852-cd5265aefb02 Library:CUDA} Name:CUDA0 Description:NVIDIA RTX PRO 4500 Blackwell Server Edition MIG 1g.16gb FilterID: Integrated:false PCIID:0000:0b:00.0 TotalMemory:16810770432 FreeMemory:16614490112 ComputeMajor:12 ComputeMinor:0 DriverMajor:13 DriverMinor:1 LibraryPath:[/usr/lib/ollama /usr/lib/ollama/cuda_v13]}]" time=2026-03-31T18:21:56.712Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=238.913755ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v13]" extra_envs=map[] time=2026-03-31T18:21:56.712Z level=INFO source=runner.go:106 msg="experimental Vulkan support disabled. 
To enable, set OLLAMA_VULKAN=1" time=2026-03-31T18:21:56.712Z level=DEBUG source=runner.go:124 msg="evaluating which, if any, devices to filter out" initial_count=2 time=2026-03-31T18:21:56.712Z level=DEBUG source=runner.go:146 msg="verifying if device is supported" library=/usr/lib/ollama/cuda_v12 description="NVIDIA RTX PRO 4500 Blackwell Server Edition MIG 1g.16gb" compute=12.0 id=GPU-01991e90-02de-d9b8-d852-cd5265aefb02 pci_id=0000:0b:00.0 time=2026-03-31T18:21:56.712Z level=DEBUG source=runner.go:146 msg="verifying if device is supported" library=/usr/lib/ollama/cuda_v13 description="NVIDIA RTX PRO 4500 Blackwell Server Edition MIG 1g.16gb" compute=12.0 id=GPU-01991e90-02de-d9b8-d852-cd5265aefb02 pci_id=0000:0b:00.0 time=2026-03-31T18:21:56.712Z level=TRACE source=runner.go:440 msg="starting runner for device discovery" libDirs="[/usr/lib/ollama /usr/lib/ollama/cuda_v13]" extraEnvs="map[CUDA_VISIBLE_DEVICES:GPU-01991e90-02de-d9b8-d852-cd5265aefb02 GGML_CUDA_INIT:1]" time=2026-03-31T18:21:56.712Z level=TRACE source=runner.go:440 msg="starting runner for device discovery" libDirs="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" extraEnvs="map[CUDA_VISIBLE_DEVICES:GPU-01991e90-02de-d9b8-d852-cd5265aefb02 GGML_CUDA_INIT:1]" time=2026-03-31T18:21:56.712Z level=INFO source=server.go:432 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 41427" time=2026-03-31T18:21:56.712Z level=DEBUG source=server.go:433 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=2 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13 CUDA_VISIBLE_DEVICES=GPU-01991e90-02de-d9b8-d852-cd5265aefb02 GGML_CUDA_INIT=1 time=2026-03-31T18:21:56.713Z level=INFO source=server.go:432 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 43293" time=2026-03-31T18:21:56.713Z level=DEBUG 
source=server.go:433 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=2 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12 CUDA_VISIBLE_DEVICES=GPU-01991e90-02de-d9b8-d852-cd5265aefb02 GGML_CUDA_INIT=1 time=2026-03-31T18:21:56.725Z level=INFO source=runner.go:1411 msg="starting ollama engine" time=2026-03-31T18:21:56.725Z level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:41427" time=2026-03-31T18:21:56.725Z level=INFO source=runner.go:1411 msg="starting ollama engine" time=2026-03-31T18:21:56.726Z level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:43293" time=2026-03-31T18:21:56.734Z level=DEBUG source=gguf.go:604 msg=general.architecture type=string time=2026-03-31T18:21:56.734Z level=DEBUG source=gguf.go:604 msg=tokenizer.ggml.model type=string time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32 time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32 time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.file_type default=0 time=2026-03-31T18:21:56.734Z level=DEBUG source=gguf.go:604 msg=general.architecture type=string time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.name default="" time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.description default="" time=2026-03-31T18:21:56.734Z level=DEBUG source=gguf.go:604 msg=tokenizer.ggml.model type=string time=2026-03-31T18:21:56.734Z level=INFO source=ggml.go:136 msg="" architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3 time=2026-03-31T18:21:56.734Z level=DEBUG 
source=ggml.go:324 msg="key with type not found" key=general.alignment default=32 time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32 time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.file_type default=0 time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.name default="" time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.description default="" time=2026-03-31T18:21:56.734Z level=INFO source=ggml.go:136 msg="" architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3 time=2026-03-31T18:21:56.734Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-icelake.so load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-icelake.so time=2026-03-31T18:21:56.739Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama/cuda_v13 time=2026-03-31T18:21:56.739Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama/cuda_v12 ggml_cuda_init: failed to initialize CUDA: no CUDA-capable device is detected load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v12/libggml-cuda.so time=2026-03-31T18:21:56.810Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc) time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.block_count default=0 time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 
msg="key with type not found" key=llama.pooling_type default=0 time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.expert_count default=0 time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.tokens default="&{size:0 values:[]}" time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.scores default="&{size:0 values:[]}" time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.token_type default="&{size:0 values:[]}" time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.merges default="&{size:0 values:[]}" time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=true time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0 time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_id default=0 time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}" time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.pre default="" time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.block_count default=0 time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.embedding_length default=0 time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.head_count default=0 
time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.head_count_kv default=0 time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.key_length default=0 time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.dimension_count default=0 time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.layer_norm_rms_epsilon default=0 time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.freq_base default=100000 time=2026-03-31T18:21:56.811Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.scaling.factor default=1 time=2026-03-31T18:21:56.811Z level=DEBUG source=runner.go:1386 msg="dummy model load took" duration=77.553908ms time=2026-03-31T18:21:56.812Z level=DEBUG source=runner.go:1391 msg="gathering device infos took" duration=882ns time=2026-03-31T18:21:56.812Z level=TRACE source=runner.go:467 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" devices=[] time=2026-03-31T18:21:56.812Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=99.5831ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" extra_envs="map[CUDA_VISIBLE_DEVICES:GPU-01991e90-02de-d9b8-d852-cd5265aefb02 GGML_CUDA_INIT:1]" time=2026-03-31T18:21:56.812Z level=DEBUG source=runner.go:153 msg="filtering device which didn't fully initialize" id=GPU-01991e90-02de-d9b8-d852-cd5265aefb02 libdir=/usr/lib/ollama/cuda_v12 pci_id=0000:0b:00.0 library=CUDA ggml_cuda_init: failed to initialize CUDA: no CUDA-capable device is detected load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v13/libggml-cuda.so time=2026-03-31T18:21:56.814Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 
CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc) time=2026-03-31T18:21:56.814Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.block_count default=0 time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.pooling_type default=0 time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.expert_count default=0 time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.tokens default="&{size:0 values:[]}" time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.scores default="&{size:0 values:[]}" time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.token_type default="&{size:0 values:[]}" time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.merges default="&{size:0 values:[]}" time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=true time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0 time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_id default=0 time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}" time=2026-03-31T18:21:56.815Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.pre default="" time=2026-03-31T18:21:56.816Z level=DEBUG source=ggml.go:324 msg="key with type 
not found" key=llama.block_count default=0 time=2026-03-31T18:21:56.816Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.embedding_length default=0 time=2026-03-31T18:21:56.816Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.head_count default=0 time=2026-03-31T18:21:56.816Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.head_count_kv default=0 time=2026-03-31T18:21:56.816Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.key_length default=0 time=2026-03-31T18:21:56.816Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.dimension_count default=0 time=2026-03-31T18:21:56.816Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.layer_norm_rms_epsilon default=0 time=2026-03-31T18:21:56.816Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.freq_base default=100000 time=2026-03-31T18:21:56.816Z level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.scaling.factor default=1 time=2026-03-31T18:21:56.816Z level=DEBUG source=runner.go:1386 msg="dummy model load took" duration=81.982831ms time=2026-03-31T18:21:56.816Z level=DEBUG source=runner.go:1391 msg="gathering device infos took" duration=726ns time=2026-03-31T18:21:56.816Z level=TRACE source=runner.go:467 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v13]" devices=[] time=2026-03-31T18:21:56.816Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=104.013646ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v13]" extra_envs="map[CUDA_VISIBLE_DEVICES:GPU-01991e90-02de-d9b8-d852-cd5265aefb02 GGML_CUDA_INIT:1]" time=2026-03-31T18:21:56.816Z level=DEBUG source=runner.go:153 msg="filtering device which didn't fully initialize" id=GPU-01991e90-02de-d9b8-d852-cd5265aefb02 libdir=/usr/lib/ollama/cuda_v13 pci_id=0000:0b:00.0 
library=CUDA
time=2026-03-31T18:21:56.816Z level=TRACE source=runner.go:174 msg="supported GPU library combinations before filtering" supported=map[]
time=2026-03-31T18:21:56.816Z level=TRACE source=runner.go:183 msg="removing unsupported or overlapping GPU combination" libDir=/usr/lib/ollama/cuda_v12 description="NVIDIA RTX PRO 4500 Blackwell Server Edition MIG 1g.16gb" compute=12.0 pci_id=0000:0b:00.0
time=2026-03-31T18:21:56.816Z level=TRACE source=runner.go:183 msg="removing unsupported or overlapping GPU combination" libDir=/usr/lib/ollama/cuda_v13 description="NVIDIA RTX PRO 4500 Blackwell Server Edition MIG 1g.16gb" compute=12.0 pci_id=0000:0b:00.0
time=2026-03-31T18:21:56.816Z level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=665.6487ms
time=2026-03-31T18:21:56.816Z level=INFO source=types.go:60 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="251.4 GiB" available="251.2 GiB"
time=2026-03-31T18:21:56.816Z level=INFO source=routes.go:1850 msg="vram-based default context" total_vram="0 B" default_num_ctx=4096
[core@quanta-tmo-4500-cran mig]$
```

Gang
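For anyone triaging a similar log, the entries that matter are the `filtering device which didn't fully initialize` lines, which show the device being dropped for both `cuda_v12` and `cuda_v13` before ollama falls back to CPU. A minimal Python sketch for pulling those out of a pasted log follows. The function name `find_filtered_devices` and the sample string are illustrative only, not part of ollama, and the `key=value` layout is assumed from this log rather than from any documented format.

```python
import re

# Matches the DEBUG entry seen in the log above when a device is dropped
# during discovery. The field order (id=..., libdir=...) is an assumption
# taken from this one log, not a documented ollama format.
FILTERED = re.compile(
    r"filtering device which didn't fully initialize\W+"
    r"id=(?P<id>\S+)\s+libdir=(?P<libdir>\S+)"
)


def find_filtered_devices(log_text):
    """Return (device_id, libdir) pairs for every filtered-device entry."""
    # Pasted logs are often re-wrapped mid-entry, so collapse all whitespace
    # into single spaces before matching.
    flat = " ".join(log_text.split())
    return [(m.group("id"), m.group("libdir")) for m in FILTERED.finditer(flat)]


# Hypothetical sample built from one entry of the log in this issue.
sample = (
    'time=2026-03-31T18:21:56.812Z level=DEBUG source=runner.go:153 '
    'msg="filtering device which didn\'t fully initialize" '
    'id=GPU-01991e90-02de-d9b8-d852-cd5265aefb02 '
    'libdir=/usr/lib/ollama/cuda_v12 pci_id=0000:0b:00.0 library=CUDA'
)

for device_id, libdir in find_filtered_devices(sample):
    print(f"{device_id} filtered for {libdir}")
```

Running it over the full log should list one entry per CUDA library directory that failed the second-stage check, which makes it easier to see that the device was enumerated first and only rejected once `CUDA_VISIBLE_DEVICES` was set to the GPU UUID.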

@rick-github commented on GitHub (Mar 31, 2026):

What's the output of

nvidia-smi -q

@g21chen commented on GitHub (Mar 31, 2026):

Output from running the command inside the ollama pod:

root@ollama-0:/# nvidia-smi -q

==============NVSMI LOG==============

Timestamp                                              : Tue Mar 31 18:31:27 2026
Driver Version                                         : 590.48.01
CUDA Version                                           : 13.1

Attached GPUs                                          : 1
GPU 00000000:0B:00.0
    Product Name                                       : NVIDIA RTX PRO 4500 Blackwell Server Edition
    Product Brand                                      : NVIDIA
    Product Architecture                               : Blackwell
    Display Mode                                       : Requested functionality has been deprecated
    Display Attached                                   : No
    Display Active                                     : Disabled
    Persistence Mode                                   : Enabled
    Addressing Mode                                    : HMM
    MIG Mode
        Current                                        : Enabled
        Pending                                        : Enabled
    MIG Device
        Index                                          : 0
        GPU Instance ID                                : 1
        Compute Instance ID                            : 0
        Device Attributes
            Shared
                Multiprocessor count                   : 42
                Copy Engine count                      : 1
                Encoder count                          : 1
                Decoder count                          : 1
                OFA count                              : 0
                JPG count                              : 1
        ECC Errors
            Volatile
                SRAM Uncorrectable                     : 0
        Shared FB Memory Usage
            Total                                      : 16032 MiB
            Reserved                                   : 0 MiB
            Used                                       : 58 MiB
            Free                                       : 15975 MiB
        Shared BAR1 Memory
            Total                                      : 16383 MiB
            Used                                       : 0 MiB
            Free                                       : 16383 MiB
    Accounting Mode                                    : Disabled
    Accounting Mode Buffer Size                        : 4000
    Driver Model
        Current                                        : N/A
        Pending                                        : N/A
    Serial Number                                      : 1795125089725
    GPU UUID                                           : GPU-01991e90-02de-d9b8-d852-cd5265aefb02
    GPU PDI                                            : 0x648fae272cf1d560
    Minor Number                                       : 0
    VBIOS Version                                      : 98.03.B1.00.01
    MultiGPU Board                                     : No
    Board ID                                           : 0xb00
    Board Part Number                                  : 900-2G147-0000-000
    GPU Part Number                                    : 2C3A-895-A1
    FRU Part Number                                    : N/A
    Platform Info
        Chassis Serial Number                          :
        Slot Number                                    : 0
        Tray Index                                     : 0
        Host ID                                        : 1
        Peer Type                                      : Direct Connected
        Module Id                                      : 1
        GPU Fabric GUID                                : 0x0000000000000000
    Inforom Version
        Image Version                                  : G147.0210.00.01
        OEM Object                                     : 2.1
        ECC Object                                     : 7.16
        Power Management Object                        : N/A
    Inforom BBX Object Flush
        Latest Timestamp                               : 2026/03/30 19:01:47.841
        Latest Duration                                : 48934 us
    GPU Operation Mode
        Current                                        : N/A
        Pending                                        : N/A
    GPU C2C Mode                                       : Disabled
    GPU Virtualization Mode
        Virtualization Mode                            : None
        Host VGPU Mode                                 : N/A
        vGPU Heterogeneous Mode                        : N/A
    GPU Recovery Action                                : None
    GSP Firmware Version                               : 590.48.01
    IBMNPU
        Relaxed Ordering Mode                          : N/A
    PCI
        Bus                                            : 0x0B
        Device                                         : 0x00
        Domain                                         : 0x0000
        Base Classcode                                 : 0x3
        Sub Classcode                                  : 0x2
        Device Id                                      : 0x2C3A10DE
        Bus Id                                         : 00000000:0B:00.0
        Sub System Id                                  : 0x21F410DE
        GPU Link Info
            PCIe Generation
                Max                                    : 5
                Current                                : 1
                Device Current                         : 1
                Device Max                             : 5
                Host Max                               : 5
            Link Width
                Max                                    : 16x
                Current                                : 16x
        Bridge Chip
            Type                                       : N/A
            Firmware                                   : N/A
        Replays Since Reset                            : 0
        Replay Number Rollovers                        : 0
        Tx Throughput                                  : 481 KB/s
        Rx Throughput                                  : 426 KB/s
        Atomic Caps Outbound                           : N/A
        Atomic Caps Inbound                            : FETCHADD_32 FETCHADD_64 SWAP_32 SWAP_64 CAS_32 CAS_64
    Fan Speed                                          : N/A
    Performance State                                  : P8
    Clocks Event Reasons
        Idle                                           : Active
        Applications Clocks Setting                    : Not Active
        SW Power Cap                                   : Not Active
        HW Slowdown                                    : Not Active
            HW Thermal Slowdown                        : Not Active
            HW Power Brake Slowdown                    : Not Active
        Sync Boost                                     : Not Active
        SW Thermal Slowdown                            : Not Active
        Display Clock Setting                          : Not Active
    Clocks Event Reasons Counters
        SW Power Capping                               : 384727561 us
        Sync Boost                                     : 0 us
        SW Thermal Slowdown                            : 0 us
        HW Thermal Slowdown                            : 0 us
        HW Power Braking                               : 0 us
    Sparse Operation Mode                              : N/A
    FB Memory Usage
        Total                                          : Insufficient Permissions
        Reserved                                       : Insufficient Permissions
        Used                                           : Insufficient Permissions
        Free                                           : Insufficient Permissions
    BAR1 Memory Usage
        Total                                          : Insufficient Permissions
        Used                                           : Insufficient Permissions
        Free                                           : Insufficient Permissions
    Conf Compute Protected Memory Usage
        Total                                          : Insufficient Permissions
        Used                                           : Insufficient Permissions
        Free                                           : Insufficient Permissions
    Compute Mode                                       : Default
    Utilization
        GPU                                            : N/A
        Memory                                         : N/A
        Encoder                                        : N/A
        Decoder                                        : N/A
        JPEG                                           : N/A
        OFA                                            : N/A
    Encoder Stats
        Active Sessions                                : 0
        Average FPS                                    : 0
        Average Latency                                : 0
    FBC Stats
        Active Sessions                                : 0
        Average FPS                                    : 0
        Average Latency                                : 0
    DRAM Encryption Mode
        Current                                        : Disabled
        Pending                                        : Disabled
    ECC Mode
        Current                                        : Enabled
        Pending                                        : Enabled
    ECC Errors
        Volatile
            SRAM Correctable                           : 0
            SRAM Uncorrectable Parity                  : 0
            SRAM Uncorrectable SEC-DED                 : 0
            DRAM Correctable                           : 0
            DRAM Uncorrectable                         : 0
        Aggregate
            SRAM Correctable                           : 0
            SRAM Uncorrectable Parity                  : 0
            SRAM Uncorrectable SEC-DED                 : 0
            DRAM Correctable                           : 0
            DRAM Uncorrectable                         : 0
            SRAM Threshold Exceeded                    : No
        Aggregate Uncorrectable SRAM Sources
            SRAM L2                                    : 0
            SRAM SM                                    : 0
            SRAM Microcontroller                       : 0
            SRAM PCIE                                  : 0
            SRAM Other                                 : 0
        Channel Repair Pending                         : No
        TPC Repair Pending                             : No
        Unrepairable Memory                            : No
    Retired Pages
        Single Bit ECC                                 : N/A
        Double Bit ECC                                 : N/A
        Pending Page Blacklist                         : N/A
    Remapped Rows                                      : N/A
    Temperature
        GPU Current Temp                               : 34 C
        GPU T.Limit Temp                               : 58 C
        GPU Shutdown T.Limit Temp                      : -5 C
        GPU Slowdown T.Limit Temp                      : -2 C
        GPU Max Operating T.Limit Temp                 : 0 C
        GPU Target Temperature                         : N/A
        Memory Current Temp                            : N/A
        Memory Max Operating T.Limit Temp              : N/A
    GPU Power Readings
        Average Power Draw                             : 17.86 W
        Instantaneous Power Draw                       : 17.27 W
        Current Power Limit                            : 165.00 W
        Requested Power Limit                          : 165.00 W
        Default Power Limit                            : 165.00 W
        Min Power Limit                                : 100.00 W
        Max Power Limit                                : 165.00 W
    GPU Memory Power Readings
        Average Power Draw                             : N/A
        Instantaneous Power Draw                       : N/A
    Module Power Readings
        Average Power Draw                             : N/A
        Instantaneous Power Draw                       : N/A
        Current Power Limit                            : N/A
        Requested Power Limit                          : N/A
        Default Power Limit                            : N/A
        Min Power Limit                                : N/A
        Max Power Limit                                : N/A
    Power Smoothing                                    : N/A
    Workload Power Profiles
        Requested Profiles                             : N/A
        Enforced Profiles                              : N/A
    EDPp Multiplier                                    : N/A
    Clocks
        Graphics                                       : 180 MHz
        SM                                             : 180 MHz
        Memory                                         : 405 MHz
        Video                                          : 600 MHz
    Applications Clocks
        Graphics                                       : Requested functionality has been deprecated
        Memory                                         : Requested functionality has been deprecated
    Default Applications Clocks
        Graphics                                       : Requested functionality has been deprecated
        Memory                                         : Requested functionality has been deprecated
    Deferred Clocks
        Memory                                         : N/A
    Max Clocks
        Graphics                                       : 2415 MHz
        SM                                             : 2415 MHz
        Memory                                         : 12501 MHz
        Video                                          : 2100 MHz
    Max Customer Boost Clocks
        Graphics                                       : 2415 MHz
    Clock Policy
        Auto Boost                                     : N/A
        Auto Boost Default                             : N/A
    Fabric
        State                                          : N/A
        Status                                         : N/A
        CliqueId                                       : N/A
        ClusterUUID                                    : N/A
        Health
            Summary                                    : N/A
            Bandwidth                                  : N/A
            Route Recovery in progress                 : N/A
            Route Unhealthy                            : N/A
            Access Timeout Recovery                    : N/A
            Incorrect Configuration                    : N/A
    Processes                                          : None
    Capabilities
        EGM                                            : disabled

root@ollama-0:/#

@g21chen commented on GitHub (Mar 31, 2026):

FYI, it works with vLLM instead of ollama on the same server with the same MIG configuration.

  1. pod spec with vLLM

         [core@xxxxx]$ cat vll-model-deployment.yaml
         apiVersion: apps/v1
         kind: Deployment
         metadata:
           name: vllm-verify
           namespace: open-webui
         spec:
           replicas: 1
           selector:
             matchLabels:
               app: vllm-verify
           template:
             metadata:
               labels:
                 app: vllm-verify
             spec:
               containers:
               - name: vllm-container
                 image: vllm/vllm-openai:latest
                 env:
                 - name: NVIDIA_VISIBLE_DEVICES
                   # Use the specific UUID for Device 1
                   value: "MIG-12ea3b46-39a7-5d8c-9dd6-7d8482d5e3f8"
                 - name: NVIDIA_DRIVER_CAPABILITIES
                   value: "all"
                 args: [
                   "--model", "unsloth/Llama-3.2-3B-Instruct",
                   "--max-model-len", "4096",
                   "--gpu-memory-utilization", "0.9"
                 ]
                 resources:
                   limits:
                     nvidia.com/gpu: "1"
                   requests:
                     nvidia.com/gpu: "1"

  2. query results

[core@xxxxxxxxxx]$ curl http://10.128.1.92:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "unsloth/Llama-3.2-3B-Instruct",
"messages": [
{"role": "user", "content": "List the top 30 countries by population."}
],
"max_tokens": 1000
}'
{"id":"chatcmpl-88d9bbbe3705f88e","object":"chat.completion","created":1774982640,"model":"unsloth/Llama-3.2-3B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"Here's a list of the top 30 countries by population based on available data up to 2023:\n\n1. China - 1.449 billion\n2. India - 1.393 billion\n3. United States - 334 million\n4. Indonesia - 283 million\n5. Pakistan - 229 million\n6. Brazil - 215 million\n7. Nigeria - 213 million\n8. Bangladesh - 166 million\n9. Russia - 146 million\n10. Japan - 128 million\n11. Mexico - 127 million\n12. Ethiopia - 125 million\n13. Philippines - 115 million\n14. Vietnam - 99 million\n15. Egypt - 98 million\n16. Democratic Republic of the Congo - 92 million\n17. Turkey - 84 million\n18. Iran - 83 million\n19. Thailand - 69 million\n20. Germany - 68 million\n21. South Korea - 68 million\n22. Iran - 67 million\n23. Italy - 64 million\n24. France - 63 million\n25. United Kingdom - 62 million\n26. Myanmar - 58 million\n27. Tanzania - 58 million\n28. Kenya - 56 million\n29. Algeria - 55 million\n30. Colombia - 54 million","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning":null},"logprobs":null,"finish_reason":"stop","stop_reason":null,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":44,"total_tokens":317,"completion_tokens":273,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}[core@quanta-tmo-4500-cran mig]$


@g21chen commented on GitHub (Apr 3, 2026):

May I ask if there is any update? Thanks!

Reference: github-starred/ollama#9716