[GH-ISSUE #13069] 0.12.10 uses only the CPU, while 0.9.6 uses the GPU. #55167

Closed
opened 2026-04-29 08:26:27 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @Nyx-YeSheng on GitHub (Nov 13, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/13069

What is the issue?

Both versions start and run normally, but 0.12.10 uses only the CPU, so inference is very slow.
The environment variables I am using are as follows:

export CUDA_PATH=/usr/local/cuda-12.4
export CUDA_VISIBLE_DEVICES=0,1,2,3
export OLLAMA_GPU_LAYER=cuda
export OLLAMA_NUM_GPU=4
export OLLAMA_SCHED_SPREAD=1
export OLLAMA_MAX_LOADED_MODELS=4
export OLLAMA_NUM_PARALLEL=12
export OLLAMA_FLASH_ATTENTION=1
export OLLAMA_LLM_LIBRARY=gguf
export OLLAMA_DISABLE_METRICS=1
export OLLAMA_BATCH_SIZE=2
export OLLAMA_HOST=0.0.0.0:11434
export OLLAMA_ORIGINS=*
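
Note that several of these variables (OLLAMA_GPU_LAYER, OLLAMA_NUM_GPU, OLLAMA_DISABLE_METRICS, OLLAMA_BATCH_SIZE) do not appear in the "server config" dump in the logs below, so they are either not read by the server or not reaching it. A minimal sketch for comparing the shell environment with what the server actually sees, assuming Ollama may also be running as a systemd service named ollama:

# Environment the systemd unit passes to the server (assumes a unit named "ollama")
sudo systemctl show ollama --property=Environment

# Environment in the shell where the exports above were made
env | grep -E 'CUDA|OLLAMA'

# GPUs visible to the NVIDIA driver on this host
nvidia-smi -L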

The logs below were captured on the same machine.
This is version 0.9.6.

time=2025-11-12T18:06:04.440+08:00 level=INFO source=routes.go:1235 msg="server config" env="map[CUDA_VISIBLE_DEVICES:0,1,2,3 GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:true
 OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY:gguf OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:4 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/user/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW
_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:12 OLLAMA_ORIGINS:[* http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https
://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:true ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-11-12T18:06:04.444+08:00 level=INFO source=images.go:476 msg="total blobs: 33"
time=2025-11-12T18:06:04.445+08:00 level=INFO source=images.go:483 msg="total unused blobs removed: 0"
time=2025-11-12T18:06:04.445+08:00 level=INFO source=routes.go:1288 msg="Listening on [::]:11434 (version 0.9.6)"
time=2025-11-12T18:06:04.445+08:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-11-12T18:06:05.124+08:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-67804d8d-f393-d462-9904-5f5daa82ce31 library=cuda variant=v12 compute=8.9 driver=12.4 name="NVIDIA GeForce RTX 4090" total="23.6 GiB" available="23.3 GiB"
time=2025-11-12T18:06:05.124+08:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-b350807b-2902-55f7-187a-33ae3a250c8e library=cuda variant=v12 compute=8.9 driver=12.4 name="NVIDIA GeForce RTX 4090" total="23.6 GiB" available="23.3 GiB"
time=2025-11-12T18:06:05.124+08:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-9ee37f5a-b84a-33ef-0089-49f8af955415 library=cuda variant=v12 compute=8.9 driver=12.4 name="NVIDIA GeForce RTX 4090" total="23.6 GiB" available="23.3 GiB"
time=2025-11-12T18:06:05.124+08:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-a6cd756c-3c06-ed2a-ad21-e90888ebf26f library=cuda variant=v12 compute=8.9 driver=12.4 name="NVIDIA GeForce RTX 4090" total="23.6 GiB" available="23.3 GiB"
time=2025-11-12T18:06:08.414+08:00 level=INFO source=sched.go:804 msg="new model will fit in available VRAM, loading" model=/home/user/.ollama/models/blobs/sha256-75acd89545d50126692df2116bc3fdf700408b4be1d74d58e8eed13bca028f17 library=cuda parallel=12 required="17.8 GiB"
time=2025-11-12T18:06:08.928+08:00 level=INFO source=server.go:135 msg="system memory" total="251.5 GiB" free="246.4 GiB" free_swap="1.9 GiB"

This is version 0.12.10.

time=2025-11-12T18:06:31.372+08:00 level=INFO source=routes.go:1525 msg="server config" env="map[CUDA_VISIBLE_DEVICES:0,1,2,3 GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OL
LAMA_FLASH_ATTENTION:true OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY:gguf OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:4 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/user/.ollama/models OLLAMA_MULTIUS
ER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:12 OLLAMA_ORIGINS:[* http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0
.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:true ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-11-12T18:06:31.377+08:00 level=INFO source=images.go:522 msg="total blobs: 33"
time=2025-11-12T18:06:31.378+08:00 level=INFO source=images.go:529 msg="total unused blobs removed: 0"
time=2025-11-12T18:06:31.379+08:00 level=INFO source=routes.go:1578 msg="Listening on [::]:11434 (version 0.12.10)"
time=2025-11-12T18:06:31.380+08:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2025-11-12T18:06:31.380+08:00 level=INFO source=types.go:60 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="251.5 GiB" available="246.6 GiB"
time=2025-11-12T18:06:31.380+08:00 level=INFO source=routes.go:1619 msg="entering low vram mode" "total vram"="0 B" threshold="20.0 GiB"
time=2025-11-12T18:06:34.567+08:00 level=INFO source=server.go:215 msg="enabling flash attention"
time=2025-11-12T18:06:34.567+08:00 level=INFO source=server.go:400 msg="starting runner" cmd="/usr/main/ollama-12.10/bin/ollama runner --ollama-engine --model /home/user/.ollama/models/blobs/sha256-75acd89545d50126692df2116bc3fdf700408b4be1d74d58e8eed13bca028f17 --port 35199"
time=2025-11-12T18:06:34.568+08:00 level=INFO source=server.go:653 msg="loading model" "model layers"=25 requested=-1
time=2025-11-12T18:06:34.568+08:00 level=INFO source=server.go:658 msg="system memory" total="251.5 GiB" free="246.5 GiB" free_swap="1.9 GiB"
time=2025-11-12T18:06:34.594+08:00 level=INFO source=runner.go:1349 msg="starting ollama engine"
time=2025-11-12T18:06:34.595+08:00 level=INFO source=runner.go:1384 msg="Server listening on 127.0.0.1:35199"
time=2025-11-12T18:06:34.604+08:00 level=INFO source=runner.go:1222 msg=load request="{Operation:fit LoraPath:[] Parallel:12 BatchSize:512 FlashAttention:true KvSize:49152 KvCacheType: NumThreads:64 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-11-12T18:06:34.686+08:00 level=INFO source=ggml.go:136 msg="" architecture=gpt-oss file_type=unknown name="Gpt Oss 20b" description="" num_tensors=459 num_key_values=36
load_backend: loaded CPU backend from /usr/main/ollama-12.10/lib/ollama/libggml-cpu-icelake.so
time=2025-11-12T18:06:34.734+08:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
time=2025-11-12T18:06:35.053+08:00 level=INFO source=runner.go:1222 msg=load request="{Operation:alloc LoraPath:[] Parallel:12 BatchSize:512 FlashAttention:true KvSize:49152 KvCacheType: NumThreads:64 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-11-12T18:06:36.919+08:00 level=INFO source=runner.go:1222 msg=load request="{Operation:commit LoraPath:[] Parallel:12 BatchSize:512 FlashAttention:true KvSize:49152 KvCacheType: NumThreads:64 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-11-12T18:06:36.919+08:00 level=INFO source=ggml.go:482 msg="offloading 0 repeating layers to GPU"
time=2025-11-12T18:06:36.919+08:00 level=INFO source=ggml.go:486 msg="offloading output layer to CPU"
time=2025-11-12T18:06:36.919+08:00 level=INFO source=ggml.go:494 msg="offloaded 0/25 layers to GPU"
time=2025-11-12T18:06:36.919+08:00 level=INFO source=device.go:217 msg="model weights" device=CPU size="11.3 GiB"
time=2025-11-12T18:06:36.919+08:00 level=INFO source=device.go:228 msg="kv cache" device=CPU size="2.2 GiB"
time=2025-11-12T18:06:36.919+08:00 level=INFO source=device.go:239 msg="compute graph" device=CPU size="198.8 MiB"
time=2025-11-12T18:06:36.919+08:00 level=INFO source=device.go:244 msg="total memory" size="13.7 GiB"
time=2025-11-12T18:06:36.919+08:00 level=INFO source=sched.go:500 msg="loaded runners" count=1
time=2025-11-12T18:06:36.919+08:00 level=INFO source=server.go:1251 msg="waiting for llama runner to start responding"
time=2025-11-12T18:06:36.920+08:00 level=INFO source=server.go:1285 msg="waiting for server to become available" status="llm server loading model"
time=2025-11-12T18:06:38.682+08:00 level=INFO source=server.go:1289 msg="llama runner started in 4.11 seconds"
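
In the 0.12.10 log above, discovery reports only a cpu device and the runner loads just the CPU backend from /usr/main/ollama-12.10/lib/ollama/. A rough sanity check, assuming this tarball install keeps its GGML backend libraries under that same lib/ollama directory, is to confirm a CUDA backend library is actually shipped there and that its shared-library dependencies resolve (the libggml-cuda* filename pattern is an assumption; adjust it to whatever ls shows):

# List the backend libraries bundled with this 0.12.10 install
ls -R /usr/main/ollama-12.10/lib/ollama/

# If a CUDA backend .so exists, check that none of its dependencies are missing
find /usr/main/ollama-12.10/lib/ollama/ -name 'libggml-cuda*.so*' -exec ldd {} \; | grep 'not found'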

Relevant log output


OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.12.10

GiteaMirror added the bug label 2026-04-29 08:26:27 -05:00
Author
Owner

@Nyx-YeSheng commented on GitHub (Nov 13, 2025):

With 0.12.10, if I use ./ollama run gpt-oss, nvidia-smi shows that the GPU is used.

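For reference, the placement of a loaded model can be confirmed against the running server with ollama ps, which reports whether it sits entirely on CPU, entirely on GPU, or is split (assuming the CLI is pointed at the same OLLAMA_HOST):

# Ask the running server how the loaded model is placed
ollama ps

# Cross-check GPU memory usage while the model is loaded
nvidia-smi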
Author
Owner

@Nyx-YeSheng commented on GitHub (Nov 13, 2025):

I used to use ollama server all the time, but now I have changed it to ollama start.

Reference: github-starred/ollama#55167