[GH-ISSUE #14849] Qwen 3.5 or GLM-4.7 bug #35338

Closed
opened 2026-04-22 19:46:31 -05:00 by GiteaMirror · 7 comments

Originally created by @DjceUo on GitHub (Mar 14, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14849

What is the issue?

When the Qwen 3.5 or GLM-4.7 model is launched from the terminal, it runs and works well. However, as soon as it is launched through Open WebUI or Chatbox (via Ollama), it thinks for a while and then everything freezes and stops, while those user interfaces keep waiting for the model's response...
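To isolate whether the hang is in the server or in the UI, the same request those UIs send can be issued directly against the OpenAI-compatible endpoint that appears in the logs below (a minimal sketch; the model tag and prompt are illustrative):

```shell
# Send one chat request straight to the local Ollama server (default port 11434),
# bypassing Open WebUI/Chatbox; if this also stalls, the problem is server-side.
curl http://127.0.0.1:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen3.5:9b-q8_0", "messages": [{"role": "user", "content": "Hello"}]}'
```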

Relevant log output


OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.18

GiteaMirror added the needs more info, bug labels 2026-04-22 19:46:31 -05:00

@rick-github commented on GitHub (Mar 14, 2026):

Server logs (https://docs.ollama.com/troubleshooting) will aid in debugging.

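On Windows, the server log referenced above is typically written under %LOCALAPPDATA%\Ollama for a default install (a hedged pointer; see the linked troubleshooting page for other platforms):

```shell
# Open the directory holding server.log and app.log on a default Windows install
explorer %LOCALAPPDATA%\Ollama
```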

@DjceUo commented on GitHub (Mar 17, 2026):

time=2026-03-17T11:26:52.288+03:00 level=INFO source=routes.go:1727 msg="server config" env="map[CUDA_VISIBLE_DEVICES:0 GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:0 OLLAMA_DEBUG:INFO OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:true OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:12h0m0s OLLAMA_KV_CACHE_TYPE:q8_0 OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:e:\\.ollama\\models\\ OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NO_CLOUD:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[* http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES:]"
time=2026-03-17T11:26:52.310+03:00 level=INFO source=routes.go:1729 msg="Ollama cloud disabled: false"
time=2026-03-17T11:26:52.323+03:00 level=INFO source=images.go:477 msg="total blobs: 190"
time=2026-03-17T11:26:52.330+03:00 level=INFO source=images.go:484 msg="total unused blobs removed: 0"
time=2026-03-17T11:26:52.335+03:00 level=INFO source=routes.go:1782 msg="Listening on 127.0.0.1:11434 (version 0.18.0)"
time=2026-03-17T11:26:52.336+03:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-03-17T11:26:52.353+03:00 level=WARN source=runner.go:485 msg="user overrode visible devices" CUDA_VISIBLE_DEVICES=0
time=2026-03-17T11:26:52.353+03:00 level=WARN source=runner.go:489 msg="if GPUs are not correctly discovered, unset and try again"
time=2026-03-17T11:26:52.362+03:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\Ollama\\ollama.exe runner --ollama-engine --port 60488"
time=2026-03-17T11:26:52.542+03:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\Ollama\\ollama.exe runner --ollama-engine --port 60496"
time=2026-03-17T11:26:52.666+03:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\Ollama\\ollama.exe runner --ollama-engine --port 60503"
time=2026-03-17T11:26:52.770+03:00 level=INFO source=runner.go:106 msg="experimental Vulkan support disabled.  To enable, set OLLAMA_VULKAN=1"
time=2026-03-17T11:26:52.771+03:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\Ollama\\ollama.exe runner --ollama-engine --port 60511"
time=2026-03-17T11:26:52.946+03:00 level=INFO source=types.go:42 msg="inference compute" id=GPU-43d944cd-e7c6-c3e9-4414-6ab0ef6870b2 filter_id="" library=CUDA compute=8.6 name=CUDA0 description="NVIDIA GeForce RTX 3090" libdirs=ollama,cuda_v12 driver=12.7 pci_id=0000:01:00.0 type=discrete total="24.0 GiB" available="1.5 GiB"
time=2026-03-17T11:26:52.946+03:00 level=INFO source=routes.go:1832 msg="vram-based default context" total_vram="24.0 GiB" default_num_ctx=32768
[GIN] 2026/03/17 - 11:26:52 | 200 |            0s |       127.0.0.1 | GET      "/api/version"
[GIN] 2026/03/17 - 11:26:52 | 200 |            0s |       127.0.0.1 | GET      "/api/version"
[GIN] 2026/03/17 - 11:26:52 | 200 |            0s |       127.0.0.1 | GET      "/api/version"
[GIN] 2026/03/17 - 11:26:52 | 200 |     12.9981ms |       127.0.0.1 | GET      "/api/tags"
[GIN] 2026/03/17 - 11:26:53 | 401 |    217.0908ms |       127.0.0.1 | POST     "/api/me"
[GIN] 2026/03/17 - 11:26:53 | 401 |    219.7291ms |       127.0.0.1 | POST     "/api/me"
[GIN] 2026/03/17 - 11:26:53 | 404 |      6.9689ms |       127.0.0.1 | POST     "/api/show"
time=2026-03-17T11:27:43.216+03:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\Ollama\\ollama.exe runner --ollama-engine --port 60638"
time=2026-03-17T11:27:43.400+03:00 level=INFO source=cpu_windows.go:148 msg=packages count=1
time=2026-03-17T11:27:43.400+03:00 level=INFO source=cpu_windows.go:195 msg="" package=0 cores=8 efficiency=0 threads=16
time=2026-03-17T11:27:43.518+03:00 level=INFO source=server.go:246 msg="enabling flash attention"
time=2026-03-17T11:27:43.519+03:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\Ollama\\ollama.exe runner --ollama-engine --model e:\\.ollama\\models\\blobs\\sha256-73b25b6053372913a20bde4fd1c81db57fb573b2b8fc063ffdcf25c5ef1f1d68 --port 60646"
time=2026-03-17T11:27:43.521+03:00 level=INFO source=sched.go:489 msg="system memory" total="63.9 GiB" free="50.8 GiB" free_swap="29.5 GiB"
time=2026-03-17T11:27:43.521+03:00 level=INFO source=sched.go:496 msg="gpu memory" id=GPU-43d944cd-e7c6-c3e9-4414-6ab0ef6870b2 library=CUDA available="1.0 GiB" free="1.5 GiB" minimum="457.0 MiB" overhead="0 B"
time=2026-03-17T11:27:43.521+03:00 level=INFO source=server.go:757 msg="loading model" "model layers"=33 requested=-1
time=2026-03-17T11:27:43.560+03:00 level=INFO source=runner.go:1411 msg="starting ollama engine"
time=2026-03-17T11:27:43.583+03:00 level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:60646"
time=2026-03-17T11:27:43.586+03:00 level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:32768 KvCacheType:q8_0 NumThreads:8 GPULayers:33[ID:GPU-43d944cd-e7c6-c3e9-4414-6ab0ef6870b2 Layers:33(0..32)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-03-17T11:27:43.630+03:00 level=INFO source=ggml.go:136 msg="" architecture=qwen35 file_type=Q8_0 name="" description="" num_tensors=883 num_key_values=52
load_backend: loaded CPU backend from C:\Ollama\lib\ollama\ggml-cpu-haswell.dll
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, ID: GPU-43d944cd-e7c6-c3e9-4414-6ab0ef6870b2
load_backend: loaded CUDA backend from C:\Ollama\lib\ollama\cuda_v12\ggml-cuda.dll
time=2026-03-17T11:27:43.722+03:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,520,600,610,700,750,800,860,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang)
time=2026-03-17T11:27:44.742+03:00 level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:32768 KvCacheType:q8_0 NumThreads:8 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-03-17T11:27:45.228+03:00 level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:32768 KvCacheType:q8_0 NumThreads:8 GPULayers:2[ID:GPU-43d944cd-e7c6-c3e9-4414-6ab0ef6870b2 Layers:2(30..31)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-03-17T11:27:45.749+03:00 level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:32768 KvCacheType:q8_0 NumThreads:8 GPULayers:1[ID:GPU-43d944cd-e7c6-c3e9-4414-6ab0ef6870b2 Layers:1(31..31)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
[GIN] 2026/03/17 - 11:27:46 | 401 |    151.5107ms |       127.0.0.1 | POST     "/api/me"
[GIN] 2026/03/17 - 11:27:46 | 401 |    159.9471ms |       127.0.0.1 | POST     "/api/me"
time=2026-03-17T11:27:46.293+03:00 level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:32768 KvCacheType:q8_0 NumThreads:8 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-03-17T11:27:46.740+03:00 level=INFO source=runner.go:1284 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:32768 KvCacheType:q8_0 NumThreads:8 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-03-17T11:27:47.467+03:00 level=INFO source=runner.go:1284 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:32768 KvCacheType:q8_0 NumThreads:8 GPULayers:2[ID:GPU-43d944cd-e7c6-c3e9-4414-6ab0ef6870b2 Layers:2(30..31)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-03-17T11:27:48.394+03:00 level=INFO source=runner.go:1284 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:32768 KvCacheType:q8_0 NumThreads:8 GPULayers:1[ID:GPU-43d944cd-e7c6-c3e9-4414-6ab0ef6870b2 Layers:1(31..31)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-03-17T11:27:49.403+03:00 level=INFO source=runner.go:1284 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:32768 KvCacheType:q8_0 NumThreads:8 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-03-17T11:27:50.293+03:00 level=INFO source=runner.go:1284 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:32768 KvCacheType:q8_0 NumThreads:8 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-03-17T11:27:50.293+03:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="10.0 GiB"
time=2026-03-17T11:27:50.293+03:00 level=INFO source=ggml.go:482 msg="offloading 0 repeating layers to GPU"
time=2026-03-17T11:27:50.293+03:00 level=INFO source=ggml.go:486 msg="offloading output layer to CPU"
time=2026-03-17T11:27:50.293+03:00 level=INFO source=ggml.go:494 msg="offloaded 0/33 layers to GPU"
time=2026-03-17T11:27:50.293+03:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="1.8 GiB"
time=2026-03-17T11:27:50.293+03:00 level=INFO source=device.go:267 msg="compute graph" device=CPU size="622.1 MiB"
time=2026-03-17T11:27:50.293+03:00 level=INFO source=device.go:272 msg="total memory" size="12.3 GiB"
time=2026-03-17T11:27:50.293+03:00 level=INFO source=sched.go:565 msg="loaded runners" count=1
time=2026-03-17T11:27:50.293+03:00 level=INFO source=server.go:1350 msg="waiting for llama runner to start responding"
time=2026-03-17T11:27:50.294+03:00 level=INFO source=server.go:1384 msg="waiting for server to become available" status="llm server loading model"
time=2026-03-17T11:27:54.049+03:00 level=INFO source=server.go:1388 msg="llama runner started in 10.53 seconds"
[GIN] 2026/03/17 - 11:30:13 | 500 |         2m30s |       127.0.0.1 | POST     "/v1/chat/completions"
time=2026-03-17T11:30:13.454+03:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\Ollama\\ollama.exe runner --ollama-engine --port 60959"
time=2026-03-17T11:30:13.640+03:00 level=INFO source=cpu_windows.go:148 msg=packages count=1
time=2026-03-17T11:30:13.640+03:00 level=INFO source=cpu_windows.go:195 msg="" package=0 cores=8 efficiency=0 threads=16
time=2026-03-17T11:30:13.795+03:00 level=INFO source=server.go:246 msg="enabling flash attention"
time=2026-03-17T11:30:13.796+03:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\Ollama\\ollama.exe runner --ollama-engine --model e:\\.ollama\\models\\blobs\\sha256-7cd4618c1faf8b7233c6c906dac1694b6a47684b37b8895d470ac688520b9c01 --port 60967"
time=2026-03-17T11:30:13.798+03:00 level=INFO source=sched.go:489 msg="system memory" total="63.9 GiB" free="51.4 GiB" free_swap="29.6 GiB"
time=2026-03-17T11:30:13.798+03:00 level=INFO source=sched.go:496 msg="gpu memory" id=GPU-43d944cd-e7c6-c3e9-4414-6ab0ef6870b2 library=CUDA available="1.2 GiB" free="1.6 GiB" minimum="457.0 MiB" overhead="0 B"
time=2026-03-17T11:30:13.798+03:00 level=INFO source=server.go:757 msg="loading model" "model layers"=27 requested=-1
time=2026-03-17T11:30:13.837+03:00 level=INFO source=runner.go:1411 msg="starting ollama engine"
time=2026-03-17T11:30:13.860+03:00 level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:60967"
time=2026-03-17T11:30:13.862+03:00 level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:32768 KvCacheType:q8_0 NumThreads:8 GPULayers:27[ID:GPU-43d944cd-e7c6-c3e9-4414-6ab0ef6870b2 Layers:27(0..26)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-03-17T11:30:13.920+03:00 level=INFO source=ggml.go:136 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=340 num_key_values=32
load_backend: loaded CPU backend from C:\Ollama\lib\ollama\ggml-cpu-haswell.dll
time=2026-03-17T11:30:13.948+03:00 level=ERROR source=server.go:1205 msg="do load request" error="Post \"http://127.0.0.1:60967/load\": context canceled"
time=2026-03-17T11:30:13.948+03:00 level=ERROR source=server.go:1205 msg="do load request" error="Post \"http://127.0.0.1:60967/load\": context canceled"
time=2026-03-17T11:30:13.948+03:00 level=INFO source=sched.go:516 msg="Load failed" model=e:\.ollama\models\blobs\sha256-7cd4618c1faf8b7233c6c906dac1694b6a47684b37b8895d470ac688520b9c01 error="model failed to load, this may be due to resource limitations or an internal error, check ollama server logs for details"
[GIN] 2026/03/17 - 11:30:13 | 500 |         2m30s |       127.0.0.1 | POST     "/v1/chat/completions"

@DjceUo commented on GitHub (Mar 17, 2026):

time=2026-03-17T11:35:37.662+03:00 level=INFO source=routes.go:1727 msg="server config" env="map[CUDA_VISIBLE_DEVICES:0 GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:0 OLLAMA_DEBUG:INFO OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:true OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:12h0m0s OLLAMA_KV_CACHE_TYPE:q8_0 OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:e:\\.ollama\\models\\ OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NO_CLOUD:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[* http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES:]"
time=2026-03-17T11:35:37.685+03:00 level=INFO source=routes.go:1729 msg="Ollama cloud disabled: false"
time=2026-03-17T11:35:37.698+03:00 level=INFO source=images.go:477 msg="total blobs: 190"
time=2026-03-17T11:35:37.704+03:00 level=INFO source=images.go:484 msg="total unused blobs removed: 0"
time=2026-03-17T11:35:37.709+03:00 level=INFO source=routes.go:1782 msg="Listening on 127.0.0.1:11434 (version 0.18.0)"
time=2026-03-17T11:35:37.710+03:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-03-17T11:35:37.728+03:00 level=WARN source=runner.go:485 msg="user overrode visible devices" CUDA_VISIBLE_DEVICES=0
time=2026-03-17T11:35:37.728+03:00 level=WARN source=runner.go:489 msg="if GPUs are not correctly discovered, unset and try again"
time=2026-03-17T11:35:37.736+03:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\Ollama\\ollama.exe runner --ollama-engine --port 61800"
time=2026-03-17T11:35:37.905+03:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\Ollama\\ollama.exe runner --ollama-engine --port 61808"
time=2026-03-17T11:35:38.033+03:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\Ollama\\ollama.exe runner --ollama-engine --port 61815"
time=2026-03-17T11:35:38.138+03:00 level=INFO source=runner.go:106 msg="experimental Vulkan support disabled.  To enable, set OLLAMA_VULKAN=1"
time=2026-03-17T11:35:38.139+03:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\Ollama\\ollama.exe runner --ollama-engine --port 61823"
time=2026-03-17T11:35:38.322+03:00 level=INFO source=types.go:42 msg="inference compute" id=GPU-43d944cd-e7c6-c3e9-4414-6ab0ef6870b2 filter_id="" library=CUDA compute=8.6 name=CUDA0 description="NVIDIA GeForce RTX 3090" libdirs=ollama,cuda_v12 driver=12.7 pci_id=0000:01:00.0 type=discrete total="24.0 GiB" available="23.1 GiB"
time=2026-03-17T11:35:38.322+03:00 level=INFO source=routes.go:1832 msg="vram-based default context" total_vram="24.0 GiB" default_num_ctx=32768
[GIN] 2026/03/17 - 11:35:38 | 200 |            0s |       127.0.0.1 | GET      "/api/version"
[GIN] 2026/03/17 - 11:35:38 | 200 |            0s |       127.0.0.1 | GET      "/api/version"
[GIN] 2026/03/17 - 11:35:38 | 200 |            0s |       127.0.0.1 | GET      "/api/version"
[GIN] 2026/03/17 - 11:35:38 | 404 |      8.5513ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2026/03/17 - 11:35:38 | 200 |     13.5306ms |       127.0.0.1 | GET      "/api/tags"
[GIN] 2026/03/17 - 11:35:38 | 401 |     232.402ms |       127.0.0.1 | POST     "/api/me"
[GIN] 2026/03/17 - 11:35:38 | 401 |    248.2904ms |       127.0.0.1 | POST     "/api/me"

@DjceUo commented on GitHub (Mar 17, 2026):

time=2026-03-17T11:26:51.190+03:00 level=INFO source=app_windows.go:282 msg="starting Ollama" app=C:\Ollama version=0.18.0 OS=Windows/10.0.19044
time=2026-03-17T11:26:51.214+03:00 level=INFO source=app.go:239 msg="initialized tools registry" tool_count=0
time=2026-03-17T11:26:51.229+03:00 level=INFO source=app.go:254 msg="starting ollama server"
time=2026-03-17T11:26:51.229+03:00 level=INFO source=app.go:285 msg="starting ui server" port=60322
time=2026-03-17T11:26:51.762+03:00 level=INFO source=ui.go:241 msg=site.serveHTTP http.method=GET http.path=/api/v1/settings http.pattern="GET /api/v1/settings" http.status=200 http.d=999.3µs request_id=1773736011761751200 version=0.18.0
time=2026-03-17T11:26:51.762+03:00 level=INFO source=ui.go:241 msg=site.serveHTTP http.method=GET http.path=/api/v1/cloud http.pattern="GET /api/v1/cloud" http.status=200 http.d=0s request_id=1773736011762750500 version=0.18.0
time=2026-03-17T11:26:51.779+03:00 level=INFO source=ui.go:241 msg=site.serveHTTP http.method=GET http.path=/api/v1/chats http.pattern="GET /api/v1/chats" http.status=200 http.d=0s request_id=1773736011779399900 version=0.18.0
time=2026-03-17T11:26:52.270+03:00 level=ERROR source=ui.go:1524 msg="failed to get inference info" error="timeout scanning server log for inference compute details"
time=2026-03-17T11:26:52.270+03:00 level=ERROR source=ui.go:241 msg=site.serveHTTP error="failed to get inference info: timeout scanning server log for inference compute details" http.method=GET http.path=/api/v1/inference-compute http.pattern="GET /api/v1/inference-compute" http.status=500 http.d=501.8947ms request_id=1773736011768233900 version=0.18.0
time=2026-03-17T11:26:52.946+03:00 level=INFO source=ui.go:159 msg="configuring ollama proxy" target=http://127.0.0.1:11434
time=2026-03-17T11:26:53.291+03:00 level=INFO source=server.go:362 msg=Matched "inference compute"="{Library:CUDA Variant: Compute:8.6 Driver:12.7 Name:CUDA0 VRAM:24.0 GiB}"
time=2026-03-17T11:26:53.291+03:00 level=INFO source=server.go:373 msg="Matched default context length" default_num_ctx=32768
time=2026-03-17T11:26:53.291+03:00 level=INFO source=ui.go:241 msg=site.serveHTTP http.method=GET http.path=/api/v1/inference-compute http.pattern="GET /api/v1/inference-compute" http.status=200 http.d=0s request_id=1773736013291313200 version=0.18.0
time=2026-03-17T11:26:53.303+03:00 level=INFO source=ui.go:241 msg=site.serveHTTP http.method=POST http.path=/api/v1/settings http.pattern="POST /api/v1/settings" http.status=200 http.d=0s request_id=1773736013303785500 version=0.18.0
time=2026-03-17T11:26:53.304+03:00 level=INFO source=ui.go:241 msg=site.serveHTTP http.method=POST http.path=/api/v1/settings http.pattern="POST /api/v1/settings" http.status=200 http.d=0s request_id=1773736013304322000 version=0.18.0
time=2026-03-17T11:26:53.304+03:00 level=INFO source=ui.go:241 msg=site.serveHTTP http.method=POST http.path=/api/v1/settings http.pattern="POST /api/v1/settings" http.status=200 http.d=546µs request_id=1773736013304322000 version=0.18.0
time=2026-03-17T11:26:53.304+03:00 level=INFO source=ui.go:241 msg=site.serveHTTP http.method=POST http.path=/api/v1/settings http.pattern="POST /api/v1/settings" http.status=200 http.d=546µs request_id=1773736013304322000 version=0.18.0
time=2026-03-17T11:26:53.305+03:00 level=INFO source=ui.go:241 msg=site.serveHTTP http.method=POST http.path=/api/v1/settings http.pattern="POST /api/v1/settings" http.status=200 http.d=0s request_id=1773736013305417900 version=0.18.0
time=2026-03-17T11:26:53.313+03:00 level=INFO source=ui.go:241 msg=site.serveHTTP http.method=GET http.path=/api/v1/settings http.pattern="GET /api/v1/settings" http.status=200 http.d=0s request_id=1773736013313664500 version=0.18.0
time=2026-03-17T11:26:53.319+03:00 level=INFO source=ui.go:241 msg=site.serveHTTP http.method=GET http.path=/api/v1/settings http.pattern="GET /api/v1/settings" http.status=200 http.d=0s request_id=1773736013319095600 version=0.18.0
time=2026-03-17T11:26:53.320+03:00 level=INFO source=ui.go:241 msg=site.serveHTTP http.method=POST http.path=/api/v1/settings http.pattern="POST /api/v1/settings" http.status=200 http.d=15.2889ms request_id=1773736013304868000 version=0.18.0
time=2026-03-17T11:26:53.322+03:00 level=INFO source=ui.go:241 msg=site.serveHTTP http.method=GET http.path=/api/v1/settings http.pattern="GET /api/v1/settings" http.status=200 http.d=0s request_id=1773736013322274200 version=0.18.0
time=2026-03-17T11:26:53.323+03:00 level=INFO source=ui.go:241 msg=site.serveHTTP http.method=GET http.path=/api/v1/settings http.pattern="GET /api/v1/settings" http.status=200 http.d=0s request_id=1773736013323875000 version=0.18.0
time=2026-03-17T11:26:53.325+03:00 level=INFO source=ui.go:241 msg=site.serveHTTP http.method=GET http.path=/api/v1/settings http.pattern="GET /api/v1/settings" http.status=200 http.d=0s request_id=1773736013325516500 version=0.18.0
time=2026-03-17T11:26:53.328+03:00 level=INFO source=ui.go:241 msg=site.serveHTTP http.method=GET http.path=/api/v1/settings http.pattern="GET /api/v1/settings" http.status=200 http.d=0s request_id=1773736013328772600 version=0.18.0
time=2026-03-17T11:26:54.230+03:00 level=INFO source=updater.go:296 msg="beginning update checker" interval=1h0m0s
time=2026-03-17T11:27:45.909+03:00 level=INFO source=ui.go:241 msg=site.serveHTTP http.method=GET http.path=/api/v1/cloud http.pattern="GET /api/v1/cloud" http.status=200 http.d=0s request_id=1773736065909309900 version=0.18.0
time=2026-03-17T11:27:45.909+03:00 level=INFO source=ui.go:241 msg=site.serveHTTP http.method=GET http.path=/api/v1/settings http.pattern="GET /api/v1/settings" http.status=200 http.d=0s request_id=1773736065909886500 version=0.18.0
time=2026-03-17T11:27:45.913+03:00 level=INFO source=server.go:362 msg=Matched "inference compute"="{Library:CUDA Variant: Compute:8.6 Driver:12.7 Name:CUDA0 VRAM:24.0 GiB}"
time=2026-03-17T11:27:45.913+03:00 level=INFO source=server.go:373 msg="Matched default context length" default_num_ctx=32768
time=2026-03-17T11:27:45.913+03:00 level=INFO source=ui.go:241 msg=site.serveHTTP http.method=GET http.path=/api/v1/inference-compute http.pattern="GET /api/v1/inference-compute" http.status=200 http.d=0s request_id=1773736065913195900 version=0.18.0
time=2026-03-17T11:35:24.237+03:00 level=INFO source=app.go:352 msg="shutting down desktop server"
time=2026-03-17T11:35:24.237+03:00 level=INFO source=app.go:357 msg="shutting down ollama server"

@DjceUo commented on GitHub (Mar 17, 2026):

time=2026-03-17T11:35:36.549+03:00 level=INFO source=app_windows.go:282 msg="starting Ollama" app=C:\Ollama version=0.18.0 OS=Windows/10.0.19044
time=2026-03-17T11:35:36.573+03:00 level=INFO source=app.go:239 msg="initialized tools registry" tool_count=0
time=2026-03-17T11:35:36.588+03:00 level=INFO source=app.go:254 msg="starting ollama server"
time=2026-03-17T11:35:36.588+03:00 level=INFO source=app.go:285 msg="starting ui server" port=61628
time=2026-03-17T11:35:37.143+03:00 level=INFO source=ui.go:241 msg=site.serveHTTP http.method=GET http.path=/api/v1/settings http.pattern="GET /api/v1/settings" http.status=200 http.d=0s request_id=1773736537143909900 version=0.18.0
time=2026-03-17T11:35:37.144+03:00 level=INFO source=ui.go:241 msg=site.serveHTTP http.method=GET http.path=/api/v1/cloud http.pattern="GET /api/v1/cloud" http.status=200 http.d=0s request_id=1773736537144948800 version=0.18.0
time=2026-03-17T11:35:37.162+03:00 level=INFO source=ui.go:241 msg=site.serveHTTP http.method=GET http.path=/api/v1/chats http.pattern="GET /api/v1/chats" http.status=200 http.d=0s request_id=1773736537162441800 version=0.18.0
time=2026-03-17T11:35:37.653+03:00 level=ERROR source=ui.go:1524 msg="failed to get inference info" error="timeout scanning server log for inference compute details"
time=2026-03-17T11:35:37.653+03:00 level=ERROR source=ui.go:241 msg=site.serveHTTP error="failed to get inference info: timeout scanning server log for inference compute details" http.method=GET http.path=/api/v1/inference-compute http.pattern="GET /api/v1/inference-compute" http.status=500 http.d=503.2361ms request_id=1773736537150759000 version=0.18.0
time=2026-03-17T11:35:38.323+03:00 level=INFO source=ui.go:159 msg="configuring ollama proxy" target=http://127.0.0.1:11434
time=2026-03-17T11:35:38.669+03:00 level=INFO source=server.go:362 msg=Matched "inference compute"="{Library:CUDA Variant: Compute:8.6 Driver:12.7 Name:CUDA0 VRAM:24.0 GiB}"
time=2026-03-17T11:35:38.669+03:00 level=INFO source=server.go:373 msg="Matched default context length" default_num_ctx=32768
time=2026-03-17T11:35:38.669+03:00 level=INFO source=ui.go:241 msg=site.serveHTTP http.method=GET http.path=/api/v1/inference-compute http.pattern="GET /api/v1/inference-compute" http.status=200 http.d=0s request_id=1773736538669330600 version=0.18.0
time=2026-03-17T11:35:39.590+03:00 level=INFO source=updater.go:296 msg="beginning update checker" interval=1h0m0s
time=2026-03-17T11:35:46.212+03:00 level=INFO source=app.go:352 msg="shutting down desktop server"
time=2026-03-17T11:35:46.212+03:00 level=INFO source=app.go:357 msg="shutting down ollama server"

@rick-github commented on GitHub (Mar 17, 2026):

time=2026-03-17T11:26:52.946+03:00 level=INFO source=types.go:42 msg="inference compute" id=GPU-43d944cd-e7c6-c3e9-4414-6ab0ef6870b2 filter_id="" library=CUDA compute=8.6 name=CUDA0 description="NVIDIA GeForce RTX 3090" libdirs=ollama,cuda_v12 driver=12.7 pci_id=0000:01:00.0 type=discrete total="24.0 GiB" available="1.5 GiB"
time=2026-03-17T11:35:38.322+03:00 level=INFO source=routes.go:1832 msg="vram-based default context" total_vram="24.0 GiB" default_num_ctx=32768
time=2026-03-17T11:27:50.293+03:00 level=INFO source=ggml.go:494 msg="offloaded 0/33 layers to GPU"

The 3090 has 24G of VRAM but only 1.5G is free. The default context is set to 32k because of the size of the VRAM. Due to the limited free VRAM, no layers are loaded into the GPU and the model runs on the CPU, which is much slower. If VRAM is freed up, performance will improve. Use `nvidia-smi` to examine which processes are using VRAM and close any that are not needed.
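A minimal sketch of that check, using standard nvidia-smi query flags (output columns vary slightly by driver version):

```shell
# List every compute process currently holding VRAM on the GPU
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
# The default summary view also shows a per-process memory table at the bottom
nvidia-smi
```

If another process is holding most of the 24 GiB, closing it and re-sending the request should let Ollama offload layers to the GPU again.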


@DjceUo commented on GitHub (Mar 17, 2026):

qwen3.5:9b-q8_0
