[GH-ISSUE #13939] Timeout Error when running ollama with claude code using qwen3-coder:30b #9121

Open
opened 2026-04-12 21:58:34 -05:00 by GiteaMirror · 11 comments

Originally created by @omer1abay on GitHub (Jan 27, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/13939

What is the issue?

(Screenshot: https://github.com/user-attachments/assets/91af2b46-3f28-47ea-b454-0a825c5f490c)

Even though my system meets the requirements comfortably, I get a timeout when I run the qwen3-coder:30b model with Claude Code. When I checked the logs in the debug folder, I saw that the error `"message":"model 'claude-haiku-4-5-20251001' not found"` was logged multiple times (see the relevant excerpt below; I can send the full log file if needed). Is there a solution for this?
I set the context length to 64k,
my computer has 64 GB of RAM,
Ollama version 0.15.2,
Claude Code version 2.1.20.
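For readers hitting the same 404: the error names Claude Code's default background model, `claude-haiku-4-5-20251001`, which the local Ollama server has not pulled. A minimal workaround sketch, assuming Claude Code honors the `ANTHROPIC_BASE_URL`, `ANTHROPIC_MODEL`, and `ANTHROPIC_SMALL_FAST_MODEL` environment variables (an assumption, not confirmed in this thread):

```shell
# Sketch: route both the main and the background ("haiku") model to Ollama,
# so no request names a Claude model the local server cannot serve.
# The variable names below are assumptions about Claude Code's configuration.
export ANTHROPIC_BASE_URL=http://localhost:11434   # Ollama's default endpoint
export ANTHROPIC_MODEL=qwen3-coder:30b             # main model
export ANTHROPIC_SMALL_FAST_MODEL=qwen3-coder:30b  # summaries / background tasks
claude
```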

Relevant log output

2026-01-27T21:27:35.524Z [DEBUG] Generated summary for session 096c8d6f-f9c6-4059-9631-92e43b6f33c5: "API Error: 404 {"type":"error","error":{"type":"not_found_error","message":"model 'claude-haiku-4-5-20251001' not found"},"request_id":"req_6942222b2659cd315d234f9c"}"
2026-01-27T21:27:35.532Z [DEBUG] Session index: added 1, updated 0, removed 0, summaries generated 1 (total: 6)
2026-01-27T21:27:35.536Z [ERROR] Error in non-streaming fallback: 404 {"type":"error","error":{"type":"not_found_error","message":"model 'claude-haiku-4-5-20251001' not found"},"request_id":"req_d41493224a01d0a0ae3fb954"}
2026-01-27T21:27:35.536Z [ERROR] Error: Error: 404 {"type":"error","error":{"type":"not_found_error","message":"model 'claude-haiku-4-5-20251001' not found"},"request_id":"req_d41493224a01d0a0ae3fb954"}
    at t7.generate (file:///C:/Users/%C3%96mer%20Abay/AppData/Roaming/npm/node_modules/@anthropic-ai/claude-code/cli.js:152:37688)
    at RR.makeStatusError (file:///C:/Users/%C3%96mer%20Abay/AppData/Roaming/npm/node_modules/@anthropic-ai/claude-code/cli.js:169:2195)
    at RR.makeRequest (file:///C:/Users/%C3%96mer%20Abay/AppData/Roaming/npm/node_modules/@anthropic-ai/claude-code/cli.js:169:5420)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
2026-01-27T21:27:35.545Z [ERROR] SyntaxError: SyntaxError: Unexpected token 'A', "API Error:"... is not valid JSON
    at JSON.parse (<anonymous>)
    at file:///C:/Users/%C3%96mer%20Abay/AppData/Roaming/npm/node_modules/@anthropic-ai/claude-code/cli.js:67:812
    at q (file:///C:/Users/%C3%96mer%20Abay/AppData/Roaming/npm/node_modules/@anthropic-ai/claude-code/cli.js:8:6814)
    at qZ7 (file:///C:/Users/%C3%96mer%20Abay/AppData/Roaming/npm/node_modules/@anthropic-ai/claude-code/cli.js:1609:40380)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)

OS

Windows

GPU

Intel

CPU

Intel

Ollama version

0.15.2

GiteaMirror added the bug label 2026-04-12 21:58:34 -05:00

@rick-github commented on GitHub (Jan 27, 2026):

Don't see an Ollama log here.
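For reference, the server-side log being asked for can usually be retrieved as follows (locations per Ollama's troubleshooting docs; adjust for your install):

```shell
# Linux with systemd (e.g. the NixOS setup in the next comment):
journalctl -u ollama --no-pager | tail -n 200

# Windows (PowerShell): the server writes under %LOCALAPPDATA%\Ollama
# Get-Content "$env:LOCALAPPDATA\Ollama\server.log" -Tail 200
```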


@KyleJFischer commented on GitHub (Jan 30, 2026):

I am getting the same error on 0.15.1.

The error seems to be related to truncation of the original prompt: my prompt was just "test", and that seems to be where it stalled and then re-prompted itself.

(nixy is the name of my machine.)

Jan 30 13:13:05 nixy ollama[13814]: time=2026-01-30T13:13:05.228-05:00 level=WARN source=runner.go:186 msg="truncating input prompt" limit=4096 prompt=13872 keep=4 new=4096
Jan 30 13:13:10 nixy ollama[13814]: [GIN] 2026/01/30 - 13:13:10 | 404 |      28.344µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:13:25 nixy ollama[13814]: [GIN] 2026/01/30 - 13:13:25 | 404 |      18.425µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:13:55 nixy ollama[13814]: [GIN] 2026/01/30 - 13:13:55 | 404 |      27.914µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:14:25 nixy ollama[13814]: [GIN] 2026/01/30 - 13:14:25 | 404 |      27.723µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:14:55 nixy ollama[13814]: [GIN] 2026/01/30 - 13:14:55 | 404 |      18.826µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:15:04 nixy ollama[13814]: [GIN] 2026/01/30 - 13:15:04 | 500 |         1m59s |             ::1 | POST     "/v1/messages?beta=true"
Jan 30 13:15:04 nixy ollama[13814]: time=2026-01-30T13:15:04.227-05:00 level=INFO source=runner.go:916 msg="aborting completion request due to client closing the connection"
Jan 30 13:15:06 nixy ollama[13814]: [GIN] 2026/01/30 - 13:15:06 | 404 |      35.238µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:15:06 nixy ollama[13814]: [GIN] 2026/01/30 - 13:15:06 | 404 |        5.33µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:15:15 nixy ollama[13814]: [GIN] 2026/01/30 - 13:15:15 | 200 |      24.266µs |       127.0.0.1 | HEAD     "/"
Jan 30 13:15:15 nixy ollama[13814]: [GIN] 2026/01/30 - 13:15:15 | 200 |      690.91µs |       127.0.0.1 | GET      "/api/tags"
Jan 30 13:15:18 nixy ollama[13814]: [GIN] 2026/01/30 - 13:15:18 | 404 |      37.472µs |             ::1 | POST     "/v1/messages/count_tokens?beta=true"
Jan 30 13:15:18 nixy ollama[13814]: [GIN] 2026/01/30 - 13:15:18 | 404 |    1.838782ms |             ::1 | POST     "/v1/messages?beta=true"
Jan 30 13:15:21 nixy ollama[13814]: [GIN] 2026/01/30 - 13:15:21 | 404 |     519.172µs |             ::1 | POST     "/v1/messages?beta=true"
Jan 30 13:15:21 nixy ollama[13814]: [GIN] 2026/01/30 - 13:15:21 | 404 |     253.855µs |             ::1 | POST     "/v1/messages?beta=true"
Jan 30 13:15:21 nixy ollama[13814]: time=2026-01-30T13:15:21.358-05:00 level=WARN source=routes.go:2094 msg="model does not support thinking, relaxing thinking to nil" model=qwen3-coder:30b
Jan 30 13:15:21 nixy ollama[13814]: time=2026-01-30T13:15:21.417-05:00 level=WARN source=runner.go:186 msg="truncating input prompt" limit=4096 prompt=13872 keep=4 new=4096
Jan 30 13:15:23 nixy ollama[13814]: [GIN] 2026/01/30 - 13:15:23 | 404 |      29.897µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:15:23 nixy ollama[13814]: [GIN] 2026/01/30 - 13:15:23 | 404 |       7.264µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:15:25 nixy ollama[13814]: [GIN] 2026/01/30 - 13:15:25 | 404 |       7.354µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:15:30 nixy ollama[13814]: [GIN] 2026/01/30 - 13:15:30 | 404 |       9.238µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:15:38 nixy ollama[13814]: [GIN] 2026/01/30 - 13:15:38 | 404 |      31.991µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:15:50 nixy ollama[13814]: [GIN] 2026/01/30 - 13:15:50 | 404 |      18.595µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:16:08 nixy ollama[13814]: [GIN] 2026/01/30 - 13:16:08 | 404 |      27.202µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:16:14 nixy ollama[13814]: time=2026-01-30T13:16:14.059-05:00 level=INFO source=runner.go:916 msg="aborting completion request due to client closing the connection"
Jan 30 13:16:14 nixy ollama[13814]: [GIN] 2026/01/30 - 13:16:14 | 500 |  52.71675636s |             ::1 | POST     "/v1/messages?beta=true"
Jan 30 13:16:14 nixy ollama[13814]: [GIN] 2026/01/30 - 13:16:14 | 404 |      34.857µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:16:14 nixy ollama[13814]: [GIN] 2026/01/30 - 13:16:14 | 404 |       1.923µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:16:19 nixy ollama[13814]: [GIN] 2026/01/30 - 13:16:19 | 200 |      22.793µs |       127.0.0.1 | HEAD     "/"
Jan 30 13:16:19 nixy ollama[13814]: [GIN] 2026/01/30 - 13:16:19 | 200 |    1.282722ms |       127.0.0.1 | GET      "/api/tags"
Jan 30 13:16:25 nixy ollama[13814]: [GIN] 2026/01/30 - 13:16:25 | 404 |      27.442µs |             ::1 | POST     "/v1/messages/count_tokens?beta=true"
Jan 30 13:16:25 nixy ollama[13814]: [GIN] 2026/01/30 - 13:16:25 | 404 |    2.729146ms |             ::1 | POST     "/v1/messages?beta=true"
Jan 30 13:16:27 nixy ollama[13814]: [GIN] 2026/01/30 - 13:16:27 | 404 |     843.192µs |             ::1 | POST     "/v1/messages?beta=true"
Jan 30 13:16:27 nixy ollama[13814]: [GIN] 2026/01/30 - 13:16:27 | 404 |     483.935µs |             ::1 | POST     "/v1/messages?beta=true"
Jan 30 13:16:28 nixy ollama[13814]: ggml_backend_vk_get_device_memory called: uuid 00000000-c200-0000-0000-000000000000
Jan 30 13:16:28 nixy ollama[13814]: ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
Jan 30 13:16:28 nixy ollama[13814]: ggml_backend_vk_get_device_memory called: uuid 00000000-c200-0000-0000-000000000000
Jan 30 13:16:28 nixy ollama[13814]: ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
Jan 30 13:16:28 nixy ollama[13814]: time=2026-01-30T13:16:28.092-05:00 level=INFO source=sched.go:635 msg="updated VRAM based on existing loaded models" gpu=00000000-c200-0000-0000-000000000000 library=Vulkan total="63.0 GiB" available="44.9 GiB"
Jan 30 13:16:28 nixy ollama[13814]: time=2026-01-30T13:16:28.139-05:00 level=INFO source=server.go:245 msg="enabling flash attention"
Jan 30 13:16:28 nixy ollama[13814]: time=2026-01-30T13:16:28.140-05:00 level=INFO source=server.go:429 msg="starting runner" cmd="/nix/store/hb2mgmb71phjj10i4214pxjwwdgg3sbg-ollama-0.15.1/bin/ollama runner --ollama-engine --model /var/lib/ollama/models/blobs/sha256-9eba2761cf0b88b8bc11a065a7b5b47f1b13ce820e8e492cb1010b450f9ec950 --port 39223"
Jan 30 13:16:28 nixy ollama[13814]: time=2026-01-30T13:16:28.140-05:00 level=INFO source=sched.go:452 msg="system memory" total="125.1 GiB" free="101.5 GiB" free_swap="0 B"
Jan 30 13:16:28 nixy ollama[13814]: time=2026-01-30T13:16:28.140-05:00 level=INFO source=sched.go:459 msg="gpu memory" id=00000000-c200-0000-0000-000000000000 library=Vulkan available="44.5 GiB" free="44.9 GiB" minimum="457.0 MiB" overhead="0 B"
Jan 30 13:16:28 nixy ollama[13814]: time=2026-01-30T13:16:28.140-05:00 level=INFO source=server.go:755 msg="loading model" "model layers"=48 requested=-1
Jan 30 13:16:28 nixy ollama[13814]: time=2026-01-30T13:16:28.148-05:00 level=INFO source=runner.go:1405 msg="starting ollama engine"
Jan 30 13:16:28 nixy ollama[13814]: time=2026-01-30T13:16:28.149-05:00 level=INFO source=runner.go:1440 msg="Server listening on 127.0.0.1:39223"
Jan 30 13:16:28 nixy ollama[13814]: time=2026-01-30T13:16:28.152-05:00 level=INFO source=runner.go:1278 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:16 GPULayers:48[ID:00000000-c200-0000-0000-000000000000 Layers:48(0..47)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
Jan 30 13:16:28 nixy ollama[13814]: time=2026-01-30T13:16:28.185-05:00 level=INFO source=ggml.go:136 msg="" architecture=glm4moelite file_type=Q4_K_M name="" description="" num_tensors=844 num_key_values=39
Jan 30 13:16:28 nixy ollama[13814]: ggml_vulkan: Found 1 Vulkan devices:
Jan 30 13:16:28 nixy ollama[13814]: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
Jan 30 13:16:28 nixy ollama[13814]: load_backend: loaded Vulkan backend from /nix/store/hb2mgmb71phjj10i4214pxjwwdgg3sbg-ollama-0.15.1/lib/ollama/libggml-vulkan.so
Jan 30 13:16:28 nixy ollama[13814]: load_backend: loaded CPU backend from /nix/store/hb2mgmb71phjj10i4214pxjwwdgg3sbg-ollama-0.15.1/lib/ollama/libggml-cpu-icelake.so
Jan 30 13:16:28 nixy ollama[13814]: time=2026-01-30T13:16:28.218-05:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jan 30 13:16:28 nixy ollama[13814]: ggml_backend_vk_get_device_memory called: uuid 00000000-c200-0000-0000-000000000000
Jan 30 13:16:28 nixy ollama[13814]: ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
Jan 30 13:16:28 nixy ollama[13814]: time=2026-01-30T13:16:28.236-05:00 level=INFO source=runner.go:1278 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:16 GPULayers:48[ID:00000000-c200-0000-0000-000000000000 Layers:48(0..47)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
Jan 30 13:16:28 nixy ollama[13814]: ggml_backend_vk_get_device_memory called: uuid 00000000-c200-0000-0000-000000000000
Jan 30 13:16:28 nixy ollama[13814]: ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
Jan 30 13:16:29 nixy ollama[13814]: time=2026-01-30T13:16:29.099-05:00 level=INFO source=runner.go:1278 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:16 GPULayers:48[ID:00000000-c200-0000-0000-000000000000 Layers:48(0..47)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
Jan 30 13:16:29 nixy ollama[13814]: time=2026-01-30T13:16:29.100-05:00 level=INFO source=device.go:240 msg="model weights" device=Vulkan0 size="17.5 GiB"
Jan 30 13:16:29 nixy ollama[13814]: time=2026-01-30T13:16:29.100-05:00 level=INFO source=ggml.go:482 msg="offloading 47 repeating layers to GPU"
Jan 30 13:16:29 nixy ollama[13814]: time=2026-01-30T13:16:29.100-05:00 level=INFO source=ggml.go:489 msg="offloading output layer to GPU"
Jan 30 13:16:29 nixy ollama[13814]: time=2026-01-30T13:16:29.100-05:00 level=INFO source=ggml.go:494 msg="offloaded 48/48 layers to GPU"
Jan 30 13:16:29 nixy ollama[13814]: time=2026-01-30T13:16:29.100-05:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="170.2 MiB"
Jan 30 13:16:29 nixy ollama[13814]: time=2026-01-30T13:16:29.100-05:00 level=INFO source=device.go:251 msg="kv cache" device=Vulkan0 size="399.5 MiB"
Jan 30 13:16:29 nixy ollama[13814]: time=2026-01-30T13:16:29.100-05:00 level=INFO source=device.go:262 msg="compute graph" device=Vulkan0 size="76.0 MiB"
Jan 30 13:16:29 nixy ollama[13814]: time=2026-01-30T13:16:29.100-05:00 level=INFO source=device.go:267 msg="compute graph" device=CPU size="4.0 MiB"
Jan 30 13:16:29 nixy ollama[13814]: time=2026-01-30T13:16:29.100-05:00 level=INFO source=device.go:272 msg="total memory" size="18.2 GiB"
Jan 30 13:16:29 nixy ollama[13814]: time=2026-01-30T13:16:29.100-05:00 level=INFO source=sched.go:526 msg="loaded runners" count=2
Jan 30 13:16:29 nixy ollama[13814]: time=2026-01-30T13:16:29.100-05:00 level=INFO source=server.go:1347 msg="waiting for llama runner to start responding"
Jan 30 13:16:29 nixy ollama[13814]: time=2026-01-30T13:16:29.100-05:00 level=INFO source=server.go:1381 msg="waiting for server to become available" status="llm server loading model"
Jan 30 13:16:30 nixy ollama[13814]: [GIN] 2026/01/30 - 13:16:30 | 404 |      19.046µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:16:30 nixy ollama[13814]: [GIN] 2026/01/30 - 13:16:30 | 404 |       6.252µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:16:32 nixy ollama[13814]: [GIN] 2026/01/30 - 13:16:32 | 404 |       6.702µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:16:35 nixy ollama[13814]: time=2026-01-30T13:16:35.367-05:00 level=INFO source=server.go:1385 msg="llama runner started in 7.23 seconds"
Jan 30 13:16:35 nixy ollama[13814]: time=2026-01-30T13:16:35.427-05:00 level=WARN source=runner.go:186 msg="truncating input prompt" limit=4096 prompt=13769 keep=4 new=4096
Jan 30 13:16:37 nixy ollama[13814]: [GIN] 2026/01/30 - 13:16:37 | 404 |       9.458µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:16:45 nixy ollama[13814]: [GIN] 2026/01/30 - 13:16:45 | 404 |      30.919µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:16:57 nixy ollama[13814]: [GIN] 2026/01/30 - 13:16:57 | 404 |      28.344µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:17:15 nixy ollama[13814]: [GIN] 2026/01/30 - 13:17:15 | 404 |       26.41µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:17:40 nixy ollama[13814]: [GIN] 2026/01/30 - 13:17:40 | 404 |      28.184µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:18:10 nixy ollama[13814]: [GIN] 2026/01/30 - 13:18:10 | 404 |      27.823µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:18:40 nixy ollama[13814]: [GIN] 2026/01/30 - 13:18:40 | 404 |      13.756µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:19:10 nixy ollama[13814]: [GIN] 2026/01/30 - 13:19:10 | 404 |      29.497µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:19:40 nixy ollama[13814]: [GIN] 2026/01/30 - 13:19:40 | 404 |      27.863µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:20:10 nixy ollama[13814]: [GIN] 2026/01/30 - 13:20:10 | 404 |      27.072µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:20:40 nixy ollama[13814]: [GIN] 2026/01/30 - 13:20:40 | 404 |      13.846µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:21:10 nixy ollama[13814]: [GIN] 2026/01/30 - 13:21:10 | 404 |      20.679µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:21:28 nixy ollama[13814]: [GIN] 2026/01/30 - 13:21:28 | 500 |          5m0s |             ::1 | POST     "/v1/messages?beta=true"
Jan 30 13:21:29 nixy ollama[13814]: time=2026-01-30T13:21:29.061-05:00 level=WARN source=runner.go:186 msg="truncating input prompt" limit=4096 prompt=13769 keep=4 new=4096
Jan 30 13:21:33 nixy ollama[13814]: [GIN] 2026/01/30 - 13:21:33 | 404 |      27.904µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:21:40 nixy ollama[13814]: [GIN] 2026/01/30 - 13:21:40 | 404 |      28.395µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:22:10 nixy ollama[13814]: [GIN] 2026/01/30 - 13:22:10 | 404 |      25.909µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:22:40 nixy ollama[13814]: [GIN] 2026/01/30 - 13:22:40 | 404 |      20.459µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:23:10 nixy ollama[13814]: [GIN] 2026/01/30 - 13:23:10 | 404 |      21.512µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:23:40 nixy ollama[13814]: [GIN] 2026/01/30 - 13:23:40 | 404 |      14.458µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:24:10 nixy ollama[13814]: [GIN] 2026/01/30 - 13:24:10 | 404 |      18.496µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:24:40 nixy ollama[13814]: [GIN] 2026/01/30 - 13:24:40 | 404 |      28.114µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:25:10 nixy ollama[13814]: [GIN] 2026/01/30 - 13:25:10 | 404 |      36.881µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:25:40 nixy ollama[13814]: [GIN] 2026/01/30 - 13:25:40 | 404 |      21.431µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:26:10 nixy ollama[13814]: [GIN] 2026/01/30 - 13:26:10 | 404 |      27.993µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:26:29 nixy ollama[13814]: time=2026-01-30T13:26:29.398-05:00 level=INFO source=runner.go:916 msg="aborting completion request due to client closing the connection"
Jan 30 13:26:29 nixy ollama[13814]: [GIN] 2026/01/30 - 13:26:29 | 500 |          5m0s |             ::1 | POST     "/v1/messages?beta=true"
Jan 30 13:26:30 nixy ollama[13814]: time=2026-01-30T13:26:30.558-05:00 level=WARN source=runner.go:186 msg="truncating input prompt" limit=4096 prompt=13769 keep=4 new=4096
Jan 30 13:26:34 nixy ollama[13814]: [GIN] 2026/01/30 - 13:26:34 | 404 |      27.763µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:26:40 nixy ollama[13814]: [GIN] 2026/01/30 - 13:26:40 | 404 |      29.276µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:27:10 nixy ollama[13814]: [GIN] 2026/01/30 - 13:27:10 | 404 |      28.114µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:27:40 nixy ollama[13814]: [GIN] 2026/01/30 - 13:27:40 | 404 |      22.613µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:28:10 nixy ollama[13814]: [GIN] 2026/01/30 - 13:28:10 | 404 |      17.864µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:28:40 nixy ollama[13814]: [GIN] 2026/01/30 - 13:28:40 | 404 |      26.922µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:29:10 nixy ollama[13814]: [GIN] 2026/01/30 - 13:29:10 | 404 |      28.225µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:29:40 nixy ollama[13814]: [GIN] 2026/01/30 - 13:29:40 | 404 |      19.818µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:30:10 nixy ollama[13814]: [GIN] 2026/01/30 - 13:30:10 | 404 |      18.946µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:30:40 nixy ollama[13814]: [GIN] 2026/01/30 - 13:30:40 | 404 |      12.675µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:31:10 nixy ollama[13814]: [GIN] 2026/01/30 - 13:31:10 | 404 |      42.221µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:31:30 nixy ollama[13814]: time=2026-01-30T13:31:30.887-05:00 level=INFO source=runner.go:916 msg="aborting completion request due to client closing the connection"
Jan 30 13:31:30 nixy ollama[13814]: [GIN] 2026/01/30 - 13:31:30 | 500 |          5m0s |             ::1 | POST     "/v1/messages?beta=true"
Jan 30 13:31:33 nixy ollama[13814]: time=2026-01-30T13:31:33.396-05:00 level=WARN source=runner.go:186 msg="truncating input prompt" limit=4096 prompt=13769 keep=4 new=4096
Jan 30 13:31:35 nixy ollama[13814]: [GIN] 2026/01/30 - 13:31:35 | 404 |      30.999µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:31:40 nixy ollama[13814]: [GIN] 2026/01/30 - 13:31:40 | 404 |       10.03µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:32:10 nixy ollama[13814]: [GIN] 2026/01/30 - 13:32:10 | 404 |      26.962µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:32:40 nixy ollama[13814]: [GIN] 2026/01/30 - 13:32:40 | 404 |      27.102µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:33:10 nixy ollama[13814]: [GIN] 2026/01/30 - 13:33:10 | 404 |      25.619µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:33:40 nixy ollama[13814]: [GIN] 2026/01/30 - 13:33:40 | 404 |      24.898µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:34:10 nixy ollama[13814]: [GIN] 2026/01/30 - 13:34:10 | 404 |      18.264µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:34:40 nixy ollama[13814]: [GIN] 2026/01/30 - 13:34:40 | 404 |      27.822µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:35:10 nixy ollama[13814]: [GIN] 2026/01/30 - 13:35:10 | 404 |      30.127µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:35:40 nixy ollama[13814]: [GIN] 2026/01/30 - 13:35:40 | 404 |      27.442µs |             ::1 | POST     "/api/event_logging/batch"

OS:
NixOS

GPU:
AMD

CPU:
AMD

Ollama version:
0.15.1
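The `truncating input prompt` warnings above (limit=4096 against a ~13.9k-token prompt) show the model was loaded with a 4096-token context, far smaller than Claude Code's system prompt, which fits the stall-and-retry pattern in this log. A sketch of raising the context using Ollama's documented knobs; the 32768 value is an assumption sized to the prompt, and the `qwen3-coder-32k` tag is a hypothetical name:

```shell
# Option 1: raise the server-wide default context length, then restart Ollama.
OLLAMA_CONTEXT_LENGTH=32768 ollama serve

# Option 2: bake a larger context into a model variant via a Modelfile.
cat > Modelfile <<'EOF'
FROM qwen3-coder:30b
PARAMETER num_ctx 32768
EOF
ollama create qwen3-coder-32k -f Modelfile   # hypothetical tag name
```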


@KyleJFischer commented on GitHub (Jan 30, 2026):

Note this is using glm-4.7; I was also getting this issue with qwen3-coder and, really, any model.


@omer1abay commented on GitHub (Jan 30, 2026):

Sorry for the late response; here are the logs:

[GIN] 2026/01/30 - 22:18:58 | 200 | 507.4µs | 127.0.0.1 | GET "/api/version"
[GIN] 2026/01/30 - 22:18:58 | 200 | 324.5411ms | 127.0.0.1 | GET "/api/tags"
[GIN] 2026/01/30 - 22:18:59 | 200 | 237.6ms | 127.0.0.1 | POST "/api/show"
[GIN] 2026/01/30 - 22:18:59 | 200 | 288.086ms | 127.0.0.1 | POST "/api/show"
[GIN] 2026/01/30 - 22:18:59 | 200 | 233.7476ms | 127.0.0.1 | POST "/api/show"
[GIN] 2026/01/30 - 22:18:59 | 200 | 112.5679ms | 127.0.0.1 | POST "/api/show"
[GIN] 2026/01/30 - 22:19:00 | 200 | 243.4562ms | 127.0.0.1 | POST "/api/show"
[GIN] 2026/01/30 - 22:19:00 | 200 | 326.5986ms | 127.0.0.1 | POST "/api/show"
[GIN] 2026/01/30 - 22:19:00 | 200 | 341.0319ms | 127.0.0.1 | POST "/api/show"
[GIN] 2026/01/30 - 22:19:01 | 200 | 251.4141ms | 127.0.0.1 | POST "/api/show"
[GIN] 2026/01/30 - 22:19:01 | 200 | 237.9697ms | 127.0.0.1 | POST "/api/show"
[GIN] 2026/01/30 - 22:19:01 | 200 | 313.0391ms | 127.0.0.1 | POST "/api/show"
[GIN] 2026/01/30 - 22:19:02 | 200 | 580.0144ms | 127.0.0.1 | POST "/api/show"
[GIN] 2026/01/30 - 22:19:08 | 200 | 0s | 127.0.0.1 | HEAD "/"
[GIN] 2026/01/30 - 22:19:13 | 404 | 0s | 127.0.0.1 | POST "/v1/messages/count_tokens?beta=true"
[GIN] 2026/01/30 - 22:19:13 | 404 | 11.9269ms | 127.0.0.1 | POST "/v1/messages?beta=true"
[GIN] 2026/01/30 - 22:19:18 | 404 | 0s | 127.0.0.1 | POST "/api/event_logging/batch"
[GIN] 2026/01/30 - 22:19:19 | 404 | 0s | 127.0.0.1 | POST "/api/event_logging/batch"
[GIN] 2026/01/30 - 22:19:21 | 404 | 0s | 127.0.0.1 | POST "/api/event_logging/batch"
[GIN] 2026/01/30 - 22:19:22 | 404 | 8.61ms | 127.0.0.1 | POST "/v1/messages?beta=true"
[GIN] 2026/01/30 - 22:19:22 | 404 | 7.5979ms | 127.0.0.1 | POST "/v1/messages?beta=true"
time=2026-01-30T22:19:22.494+03:00 level=WARN source=routes.go:2094 msg="model does not support thinking, relaxing thinking to nil" model=qwen3-coder:30b
time=2026-01-30T22:19:22.596+03:00 level=INFO source=cpu_windows.go:148 msg=packages count=1
time=2026-01-30T22:19:22.596+03:00 level=INFO source=cpu_windows.go:164 msg="efficiency cores detected" maxEfficiencyClass=1
time=2026-01-30T22:19:22.596+03:00 level=INFO source=cpu_windows.go:195 msg="" package=0 cores=16 efficiency=10 threads=16
time=2026-01-30T22:19:22.731+03:00 level=INFO source=server.go:245 msg="enabling flash attention"
time=2026-01-30T22:19:22.739+03:00 level=INFO source=server.go:429 msg="starting runner" cmd="C:\Users\Ömer Abay\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --model C:\Users\Ömer Abay\.ollama\models\blobs\sha256-1194192cf2a187eb02722edcc3f77b11d21f537048ce04b67ccf8ba78863006a --port 53740"
time=2026-01-30T22:19:22.748+03:00 level=INFO source=sched.go:452 msg="system memory" total="62.9 GiB" free="28.1 GiB" free_swap="8.5 GiB"
time=2026-01-30T22:19:22.748+03:00 level=INFO source=server.go:755 msg="loading model" "model layers"=49 requested=-1
time=2026-01-30T22:19:22.834+03:00 level=INFO source=runner.go:1405 msg="starting ollama engine"
time=2026-01-30T22:19:23.014+03:00 level=INFO source=runner.go:1440 msg="Server listening on 127.0.0.1:53740"
time=2026-01-30T22:19:23.026+03:00 level=INFO source=runner.go:1278 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:65536 KvCacheType: NumThreads:6 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-01-30T22:19:23.076+03:00 level=INFO source=ggml.go:136 msg="" architecture=qwen3moe file_type=Q4_K_M name="Qwen3 Coder 30B A3B Instruct" description="" num_tensors=579 num_key_values=35
load_backend: loaded CPU backend from C:\Users\Ömer Abay\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-alderlake.dll
time=2026-01-30T22:19:23.440+03:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(clang)
time=2026-01-30T22:19:23.676+03:00 level=INFO source=runner.go:1278 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:65536 KvCacheType: NumThreads:6 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
[GIN] 2026/01/30 - 22:19:25 | 404 | 0s | 127.0.0.1 | POST "/api/event_logging/batch"
[GIN] 2026/01/30 - 22:19:25 | 404 | 0s | 127.0.0.1 | POST "/api/event_logging/batch"
time=2026-01-30T22:19:26.467+03:00 level=INFO source=runner.go:1278 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:65536 KvCacheType: NumThreads:6 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-01-30T22:19:26.467+03:00 level=INFO source=ggml.go:482 msg="offloading 0 repeating layers to GPU"
time=2026-01-30T22:19:26.467+03:00 level=INFO source=ggml.go:486 msg="offloading output layer to CPU"
time=2026-01-30T22:19:26.467+03:00 level=INFO source=ggml.go:494 msg="offloaded 0/49 layers to GPU"
time=2026-01-30T22:19:26.467+03:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="17.3 GiB"
time=2026-01-30T22:19:26.467+03:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="6.0 GiB"
time=2026-01-30T22:19:26.467+03:00 level=INFO source=device.go:267 msg="compute graph" device=CPU size="144.0 MiB"
time=2026-01-30T22:19:26.467+03:00 level=INFO source=device.go:272 msg="total memory" size="23.4 GiB"
time=2026-01-30T22:19:26.467+03:00 level=INFO source=sched.go:526 msg="loaded runners" count=1
time=2026-01-30T22:19:26.467+03:00 level=INFO source=server.go:1347 msg="waiting for llama runner to start responding"
time=2026-01-30T22:19:26.468+03:00 level=INFO source=server.go:1381 msg="waiting for server to become available" status="llm server loading model"
[GIN] 2026/01/30 - 22:19:33 | 404 | 545.8µs | 127.0.0.1 | POST "/api/event_logging/batch"
time=2026-01-30T22:19:36.636+03:00 level=INFO source=server.go:1385 msg="llama runner started in 13.89 seconds"
[GIN] 2026/01/30 - 22:19:46 | 404 | 0s | 127.0.0.1 | POST "/api/event_logging/batch"
[GIN] 2026/01/30 - 22:20:04 | 404 | 0s | 127.0.0.1 | POST "/api/event_logging/batch"
[GIN] 2026/01/30 - 22:20:29 | 404 | 0s | 127.0.0.1 | POST "/api/event_logging/batch"
[GIN] 2026/01/30 - 22:20:59 | 404 | 0s | 127.0.0.1 | POST "/api/event_logging/batch"
[GIN] 2026/01/30 - 22:21:29 | 404 | 0s | 127.0.0.1 | POST "/api/event_logging/batch"
[GIN] 2026/01/30 - 22:21:59 | 404 | 0s | 127.0.0.1 | POST "/api/event_logging/batch"
[GIN] 2026/01/30 - 22:22:29 | 404 | 0s | 127.0.0.1 | POST "/api/event_logging/batch"
[GIN] 2026/01/30 - 22:22:59 | 404 | 0s | 127.0.0.1 | POST "/api/event_logging/batch"
[GIN] 2026/01/30 - 22:23:29 | 404 | 0s | 127.0.0.1 | POST "/api/event_logging/batch"
[GIN] 2026/01/30 - 22:23:59 | 404 | 0s | 127.0.0.1 | POST "/api/event_logging/batch"
[GIN] 2026/01/30 - 22:24:26 | 500 | 5m4s | 127.0.0.1 | POST "/v1/messages?beta=true"
time=2026-01-30T22:24:27.447+03:00 level=WARN source=routes.go:2094 msg="model does not support thinking, relaxing thinking to nil" model=qwen3-coder:30b
[GIN] 2026/01/30 - 22:24:29 | 404 | 0s | 127.0.0.1 | POST "/api/event_logging/batch"
[GIN] 2026/01/30 - 22:24:31 | 404 | 0s | 127.0.0.1 | POST "/api/event_logging/batch"
[GIN] 2026/01/30 - 22:24:59 | 404 | 0s | 127.0.0.1 | POST "/api/event_logging/batch"
[GIN] 2026/01/30 - 22:25:28 | 500 | 1m1s | 127.0.0.1 | POST "/v1/messages?beta=true"
time=2026-01-30T22:25:28.827+03:00 level=INFO source=runner.go:916 msg="aborting completion request due to client closing the connection"
[GIN] 2026/01/30 - 22:25:29 | 404 | 0s | 127.0.0.1 | POST "/api/event_logging/batch"
[GIN] 2026/01/30 - 22:25:29 | 404 | 0s | 127.0.0.1 | POST "/api/event_logging/batch"
[GIN] 2026/01/30 - 22:25:29 | 404 | 0s | 127.0.0.1 | POST "/api/event_logging/batch"
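Note the `offloaded 0/49 layers to GPU` line: the model is running entirely on CPU with a 64k KV cache, so a single request can plausibly take longer than the five minutes after which these requests return 500. Two quick checks, hedged (`API_TIMEOUT_MS` is an assumption about Claude Code's client-side timeout, not confirmed in this thread):

```shell
# Confirm placement: "100% CPU" in this output matches the log above and
# would explain multi-minute responses on a 30B model.
ollama ps

# Assumption: Claude Code honors API_TIMEOUT_MS to allow slow local backends.
export API_TIMEOUT_MS=600000   # 10 minutes, in milliseconds
claude
```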

@rick-github commented on GitHub (Jan 31, 2026):

@KyleJFischer Increase the size of the model context as described [here](https://docs.ollama.com/integrations/claude-code#manual-setup:~:text=Note%3A%20Claude%20Code%20requires%20a%20large%20context%20window.%20We%20recommend%20at%20least%2064k%20tokens.%20See%20the%20context%20length%20documentation%20for%20how%20to%20adjust%20context%20length%20in%20Ollama.), then re-test.

@omer1abay

```
[GIN] 2026/01/30 - 22:25:28 | 500 | 1m1s | 127.0.0.1 | POST "/v1/messages?beta=true"
time=2026-01-30T22:25:28.827+03:00 level=INFO source=runner.go:916 msg="aborting completion request due to client closing the connection"
```

The client has a 60 second timeout and disconnected before the model could respond. I'm guessing that the prompt is large and/or complicated and since you are running on CPU, it's just taking a long time to process. Increase the timeout, simplify the prompt, or get a GPU.

Note that the 404s in the log can be prevented by disabling Claude Code telemetry by setting these variables in the environment that you run CC in:

```
DISABLE_TELEMETRY=1
DISABLE_ERROR_REPORTING=1
CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
```
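
Putting that advice together, a minimal launch script might look like the sketch below. This is a sketch under assumptions, not an official recipe: `OLLAMA_CONTEXT_LENGTH` and `ANTHROPIC_BASE_URL` are borrowed from the settings shared later in this thread, 65536 follows the 64k-token recommendation in the linked docs, and `claude` is assumed to be on PATH.

```
#!/usr/bin/env bash
# Raise Ollama's default context window (64k per the linked docs),
# then start the server in the background.
export OLLAMA_CONTEXT_LENGTH=65536
ollama serve &

# Point Claude Code at the local Ollama endpoint and silence the
# telemetry requests that produce the 404s above.
export ANTHROPIC_BASE_URL=http://localhost:11434
export DISABLE_TELEMETRY=1
export DISABLE_ERROR_REPORTING=1
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
claude
```
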
@stratmm commented on GitHub (Feb 6, 2026):

> @KyleJFischer Increase the size of the model context as described [here](https://docs.ollama.com/integrations/claude-code#manual-setup:~:text=Note%3A%20Claude%20Code%20requires%20a%20large%20context%20window.%20We%20recommend%20at%20least%2064k%20tokens.%20See%20the%20context%20length%20documentation%20for%20how%20to%20adjust%20context%20length%20in%20Ollama.), then re-test.
>
> @omer1abay
>
> ```
> [GIN] 2026/01/30 - 22:25:28 | 500 | 1m1s | 127.0.0.1 | POST "/v1/messages?beta=true"
> time=2026-01-30T22:25:28.827+03:00 level=INFO source=runner.go:916 msg="aborting completion request due to client closing the connection"
> ```
>
> The client has a 60 second timeout and disconnected before the model could respond. I'm guessing that the prompt is large and/or complicated and since you are running on CPU, it's just taking a long time to process. Increase the timeout, simplify the prompt, or get a GPU.
>
> Note that the 404s in the log can be prevented by disabling Claude Code telemetry by setting these variables in the environment that you run CC in:
>
> ```
> DISABLE_TELEMETRY=1
> DISABLE_ERROR_REPORTING=1
> CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
> ```

@rick-github When you say "Increase the timeout", which timeout do you mean? I can't find any Claude Code documentation on timeouts related to model responses.

@rick-github commented on GitHub (Feb 6, 2026):

Unfortunately I'm not a Claude Code user so I don't know what configuration options are available.
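
For what it's worth, the environment settings pasted later in this thread include an `API_TIMEOUT_MS` variable (milliseconds), which looks like the per-request timeout knob. A minimal sketch, assuming Claude Code honors it; the logs accompanying those settings suggest it may not be the whole story:

```
# Assumption: Claude Code reads API_TIMEOUT_MS (in milliseconds) as its
# per-request timeout. 600000 ms = 10 minutes of headroom for CPU inference.
export API_TIMEOUT_MS=600000
claude
```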

@stratmm commented on GitHub (Feb 9, 2026):

> @KyleJFischer Increase the size of the model context as described [here](https://docs.ollama.com/integrations/claude-code#manual-setup:~:text=Note%3A%20Claude%20Code%20requires%20a%20large%20context%20window.%20We%20recommend%20at%20least%2064k%20tokens.%20See%20the%20context%20length%20documentation%20for%20how%20to%20adjust%20context%20length%20in%20Ollama.), then re-test.
>
> @omer1abay
>
> ```
> [GIN] 2026/01/30 - 22:25:28 | 500 | 1m1s | 127.0.0.1 | POST "/v1/messages?beta=true"
> time=2026-01-30T22:25:28.827+03:00 level=INFO source=runner.go:916 msg="aborting completion request due to client closing the connection"
> ```
>
> The client has a 60 second timeout and disconnected before the model could respond. I'm guessing that the prompt is large and/or complicated and since you are running on CPU, it's just taking a long time to process. Increase the timeout, simplify the prompt, or get a GPU.
>
> Note that the 404s in the log can be prevented by disabling Claude Code telemetry by setting these variables in the environment that you run CC in:
>
> ```
> DISABLE_TELEMETRY=1
> DISABLE_ERROR_REPORTING=1
> CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
> ```

@KyleJFischer, I have found a solution that worked for me. I was using Ollama to run my models because I incorrectly thought that Claude Code only supports Ollama.

I switched to running the exact same models on llama.cpp, and timeouts are no longer an issue.

There are a number of key differences between my Ollama and llama.cpp setups:

  1. Ollama is running the Vulkan drivers and is therefore slower.
  2. llama.cpp is running the AMD ROCm nightly drivers and is therefore at least 30% faster.
  3. The models I am now running are the unsloth versions, in this case Qwen3-Coder-Next.

I just don't know whether the improvement is due to llama.cpp's speed, differences in the models, or differences in the llama.cpp API compared to Ollama's.

If it helps, I have pasted the llama.cpp Dockerfile and the docker-compose service that I am running.

```
# build
FROM registry.fedoraproject.org/fedora:43 AS builder

# note: the Fedora package is aria2 (the original said aria2c, which is the
# binary name, not a package dnf can install)
RUN dnf -y --nodocs --setopt=install_weak_deps=False install \
  make gcc cmake lld clang clang-devel compiler-rt libcurl-devel \
  radeontop git vim patch curl ninja-build tar xz aria2 \
  && dnf clean all && rm -rf /var/cache/dnf/*

# find & fetch the latest Linux 7.x.x tarball (gfx1151)
WORKDIR /tmp
ARG ROCM_MAJOR_VER=7
ARG GFX=gfx1151
RUN set -euo pipefail; \
  BASE="https://therock-nightly-tarball.s3.amazonaws.com"; \
  PREFIX="therock-dist-linux-${GFX}-${ROCM_MAJOR_VER}"; \
  KEY="$(curl -s "${BASE}?list-type=2&prefix=${PREFIX}" \
  | tr '<' '\n' \
  | grep -o "therock-dist-linux-${GFX}-${ROCM_MAJOR_VER}\..*\.tar\.gz" \
  | sort -V | tail -n1)"; \
  echo "Latest tarball: ${KEY}"; \
  aria2c -x 16 -s 16 -j 16 --file-allocation=none "${BASE}/${KEY}" -o therock.tar.gz
RUN mkdir -p /opt/rocm-7.0 \
  && tar xzf therock.tar.gz -C /opt/rocm-7.0 --strip-components=1

ENV ROCM_PATH=/opt/rocm-7.0 \
  HIP_PLATFORM=amd \
  HIP_PATH=/opt/rocm-7.0 \
  HIP_CLANG_PATH=/opt/rocm-7.0/llvm/bin \
  HIP_INCLUDE_PATH=/opt/rocm-7.0/include \
  HIP_LIB_PATH=/opt/rocm-7.0/lib \
  HIP_DEVICE_LIB_PATH=/opt/rocm-7.0/lib/llvm/amdgcn/bitcode \
  PATH=/opt/rocm-7.0/bin:/opt/rocm-7.0/llvm/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
  LD_LIBRARY_PATH=/opt/rocm-7.0/lib:/opt/rocm-7.0/lib64:/opt/rocm-7.0/llvm/lib \
  LIBRARY_PATH=/opt/rocm-7.0/lib:/opt/rocm-7.0/lib64 \
  CPATH=/opt/rocm-7.0/include \
  PKG_CONFIG_PATH=/opt/rocm-7.0/lib/pkgconfig

RUN printf '%s\n' \
  'export ROCM_PATH=/opt/rocm-7.0' \
  'export HIP_PLATFORM=amd' \
  'export HIP_PATH=/opt/rocm-7.0' \
  'export HIP_CLANG_PATH=/opt/rocm-7.0/llvm/bin' \
  'export HIP_INCLUDE_PATH=/opt/rocm-7.0/include' \
  'export HIP_LIB_PATH=/opt/rocm-7.0/lib' \
  'export HIP_DEVICE_LIB_PATH=/opt/rocm-7.0/lib/llvm/amdgcn/bitcode' \
  'export PATH="$ROCM_PATH/bin:$HIP_CLANG_PATH:$PATH"' \
  'export LD_LIBRARY_PATH="$HIP_LIB_PATH:$ROCM_PATH/lib:$ROCM_PATH/lib64:$ROCM_PATH/llvm/lib"' \
  'export LIBRARY_PATH="$HIP_LIB_PATH:$ROCM_PATH/lib:$ROCM_PATH/lib64"' \
  'export CPATH="$HIP_INCLUDE_PATH"' \
  'export PKG_CONFIG_PATH="$ROCM_PATH/lib/pkgconfig"' \
  > /etc/profile.d/rocm.sh \
  && chmod +x /etc/profile.d/rocm.sh \
  && echo 'source /etc/profile.d/rocm.sh' >> /etc/bashrc

WORKDIR /opt/llama.cpp
RUN git clone --recursive https://github.com/ggerganov/llama.cpp.git . \
  && git clean -xdf \
  && git submodule update --recursive

RUN cmake -S . -B build \
  -DGGML_HIP=ON \
  -DAMDGPU_TARGETS=gfx1151 \
  -DCMAKE_BUILD_TYPE=Release \
  -DGGML_RPC=ON \
  -DLLAMA_HIP_UMA=ON \
  && cmake --build build --config Release -- -j$(nproc) \
  && cmake --install build --config Release

# keep bin; drop headers/docs/static libs (retain llama.cpp for rpc binaries)
RUN find /opt/rocm-7.0 -type f -name '*.a' -delete \
  && rm -rf /opt/rocm-7.0/include /opt/rocm-7.0/share \
  /opt/rocm-7.0/llvm/include /opt/rocm-7.0/llvm/share

# runtime
FROM registry.fedoraproject.org/fedora-minimal:43

RUN microdnf -y --nodocs --setopt=install_weak_deps=0 install \
  bash ca-certificates libatomic libstdc++ libgcc radeontop vim procps-ng \
  && microdnf clean all && rm -rf /var/cache/dnf/*

COPY --from=builder /opt/rocm-7.0 /opt/rocm-7.0
COPY --from=builder /usr/local/ /usr/local/
COPY --from=builder /opt/llama.cpp/build/bin/rpc-* /usr/local/bin/

# COPY gguf-vram-estimator.py /usr/local/bin/
# RUN chmod +x /usr/local/bin/gguf-vram-estimator.py

ENV ROCM_PATH=/opt/rocm-7.0 \
  HIP_PLATFORM=amd \
  HIP_PATH=/opt/rocm-7.0 \
  HIP_CLANG_PATH=/opt/rocm-7.0/llvm/bin \
  HIP_INCLUDE_PATH=/opt/rocm-7.0/include \
  HIP_LIB_PATH=/opt/rocm-7.0/lib \
  HIP_DEVICE_LIB_PATH=/opt/rocm-7.0/lib/llvm/amdgcn/bitcode \
  PATH=/opt/rocm-7.0/bin:/opt/rocm-7.0/llvm/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
  LD_LIBRARY_PATH=/opt/rocm-7.0/lib:/opt/rocm-7.0/lib64:/opt/rocm-7.0/llvm/lib \
  LIBRARY_PATH=/opt/rocm-7.0/lib:/opt/rocm-7.0/lib64 \
  CPATH=/opt/rocm-7.0/include \
  PKG_CONFIG_PATH=/opt/rocm-7.0/lib/pkgconfig

RUN printf '%s\n' \
  'export ROCM_PATH=/opt/rocm-7.0' \
  'export HIP_PLATFORM=amd' \
  'export HIP_PATH=/opt/rocm-7.0' \
  'export HIP_CLANG_PATH=/opt/rocm-7.0/llvm/bin' \
  'export HIP_INCLUDE_PATH=/opt/rocm-7.0/include' \
  'export HIP_LIB_PATH=/opt/rocm-7.0/lib' \
  'export HIP_DEVICE_LIB_PATH=/opt/rocm-7.0/lib/llvm/amdgcn/bitcode' \
  'export PATH="$ROCM_PATH/bin:$HIP_CLANG_PATH:$PATH"' \
  'export LD_LIBRARY_PATH="$HIP_LIB_PATH:$ROCM_PATH/lib:$ROCM_PATH/lib64:$ROCM_PATH/llvm/lib"' \
  'export LIBRARY_PATH="$HIP_LIB_PATH:$ROCM_PATH/lib:$ROCM_PATH/lib64"' \
  'export CPATH="$HIP_INCLUDE_PATH"' \
  'export PKG_CONFIG_PATH="$ROCM_PATH/lib/pkgconfig"' \
  > /etc/profile.d/rocm.sh \
  && chmod +x /etc/profile.d/rocm.sh \
  && echo 'source /etc/profile.d/rocm.sh' >> /etc/bashrc

# make /usr/local libs visible without touching env
RUN echo "/usr/local/lib"  > /etc/ld.so.conf.d/local.conf \
  && echo "/usr/local/lib64" >> /etc/ld.so.conf.d/local.conf \
  && ldconfig

CMD ["/bin/bash"]
  qwen-3-coder-next-rocm:
    image: llamacpp-rocm
    container_name: llamacpp
    restart: unless-stopped
    devices:
      - /dev/dri:/dev/dri
      - /dev/kfd:/dev/kfd
    group_add:
      - "video"
      - "render"
    volumes:
      - /home/mark/running-llms/:/root/running-llms
    ports:
      - "8080:8080"
    security_opt:
      - seccomp=unconfined
    command: >
      bash -c "llama-server --alias Qwen3-Coder-Next -m /root/running-llms/hf-models/unsloth/Qwen3-Coder-Next-GGUF/UD-Q8_K_XL/Qwen3-Coder-Next-UD-Q8_K_XL-00001-of-00003.gguf --ctx-size 262144 -fa 1 --no-mmap --host 0.0.0.0 --port 8080 --temp 1.0 --top-k 40 --min-p 0.01 --top-p 0.95 --jinja -ngl 99 --threads -1"

Hope this helps
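
For anyone wanting to reproduce this, a minimal sketch of how the two files above fit together, assuming the Dockerfile is saved as `Dockerfile` next to a `docker-compose.yml` containing the service fragment:

```
# Build the image the compose service references.
docker build -t llamacpp-rocm .

# Start llama-server in the background and follow its logs.
docker compose up -d qwen-3-coder-next-rocm
docker compose logs -f qwen-3-coder-next-rocm

# Sanity check: llama-server exposes an OpenAI-compatible API on :8080.
curl http://localhost:8080/v1/models
```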

@lvvorovi commented on GitHub (Apr 2, 2026):

Same issue with the timeout using Ollama.
Did anyone find a way to fix it without switching from Ollama?


```
time=2026-04-02T18:51:07.813+03:00 level=INFO source=server.go:1390 msg="llama runner started in 14.92 seconds"
time=2026-04-02T18:51:07.813+03:00 level=DEBUG source=sched.go:573 msg="finished setting up" runner.name=registry.ollama.ai/library/qwen3.5:0.8b runner.size="3.8 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=396906 runner.model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 runner.num_ctx=128000

time=2026-04-02T18:51:08.023+03:00 level=DEBUG source=server.go:1538 msg="completion request" images=0 prompt=850 format=""
time=2026-04-02T18:51:08.068+03:00 level=DEBUG source=server.go:1538 msg="completion request" images=0 prompt=96188 format=""
time=2026-04-02T18:51:08.101+03:00 level=DEBUG source=cache.go:151 msg="loading cache slot" id=0 cache=0 prompt=198 used=0 remaining=198

time=2026-04-02T18:55:48.050+03:00 level=INFO source=server.go:1570 msg="aborting completion request due to client closing the connection"
time=2026-04-02T18:55:48.050+03:00 level=DEBUG source=sched.go:404 msg="context for request finished" runner.name=registry.ollama.ai/library/qwen3.5:0.8b runner.size="3.8 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=396906 runner.model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 runner.num_ctx=128000
time=2026-04-02T18:55:48.050+03:00 level=DEBUG source=sched.go:327 msg="after processing request finished event" runner.name=registry.ollama.ai/library/qwen3.5:0.8b runner.size="3.8 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=396906 runner.model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 runner.num_ctx=128000 refCount=1
[GIN] 2026/04/02 - 18:55:48 | 500 | 4m55s | 127.0.0.1 | POST "/v1/messages?beta=true"
time=2026-04-02T18:55:49.169+03:00 level=DEBUG source=sched.go:672 msg="evaluating already loaded" model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5

time=2026-04-02T18:55:49.252+03:00 level=DEBUG source=server.go:1538 msg="completion request" images=0 prompt=96188 format=""

time=2026-04-02T18:57:29.347+03:00 level=INFO source=server.go:1570 msg="aborting completion request due to client closing the connection"
time=2026-04-02T18:57:29.347+03:00 level=DEBUG source=sched.go:404 msg="context for request finished" runner.name=registry.ollama.ai/library/qwen3.5:0.8b runner.size="3.8 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=396906 runner.model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 runner.num_ctx=128000
time=2026-04-02T18:57:29.347+03:00 level=DEBUG source=sched.go:327 msg="after processing request finished event" runner.name=registry.ollama.ai/library/qwen3.5:0.8b runner.size="3.8 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=396906 runner.model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 runner.num_ctx=128000 refCount=1
[GIN] 2026/04/02 - 18:57:29 | 500 | 1m40s | 127.0.0.1 | POST "/v1/messages?beta=true"
```


CLAUDE

```
export PATH="$HOME/.local/bin:$PATH"
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_API_KEY=""
export ANTHROPIC_BASE_URL=http://localhost:11434
export API_TIMEOUT_MS=600000000
export CLAUDE_CODE_GLOB_TIMEOUT_SECONDS=60000000
export CLAUDE_ENABLE_STREAM_WATCHDOG=0
export DISABLE_TELEMETRY=1
export DISABLE_ERROR_REPORTING=1
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
```

OLLAMA

```
export OLLAMA_LOAD_TIMEOUT=60000000
export OLLAMA_KEEP_ALIVE=60000000
export OLLAMA_CONTEXT_LENGTH=128000
export OLLAMA_DEBUG=1
```

```
ollama serve
ollama launch claude --model qwen3.5:0.8b
```


Claude Code v2.1.90
ollama version is 0.19.0
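
One way to see whether this is raw prompt-processing time (rather than anything Claude Code adds) is to time a non-streaming request against Ollama's API directly, where curl imposes no timeout of its own. A sketch using the same model tag as above; a ~96k-token prompt like the one in the logs would be the realistic test:

```
# Time one completion straight against Ollama; curl waits indefinitely.
time curl -s http://localhost:11434/api/generate \
  -d '{"model": "qwen3.5:0.8b", "prompt": "hello", "stream": false}'
```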

@omer1abay commented on GitHub (Apr 2, 2026):

> Same issue with the timeout using Ollama. Did anyone find a way to fix it without switching from Ollama?
>
> ```
> time=2026-04-02T18:51:07.813+03:00 level=INFO source=server.go:1390 msg="llama runner started in 14.92 seconds"
> time=2026-04-02T18:51:07.813+03:00 level=DEBUG source=sched.go:573 msg="finished setting up" runner.name=registry.ollama.ai/library/qwen3.5:0.8b runner.size="3.8 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=396906 runner.model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 runner.num_ctx=128000
>
> time=2026-04-02T18:51:08.023+03:00 level=DEBUG source=server.go:1538 msg="completion request" images=0 prompt=850 format=""
> time=2026-04-02T18:51:08.068+03:00 level=DEBUG source=server.go:1538 msg="completion request" images=0 prompt=96188 format=""
> time=2026-04-02T18:51:08.101+03:00 level=DEBUG source=cache.go:151 msg="loading cache slot" id=0 cache=0 prompt=198 used=0 remaining=198
>
> time=2026-04-02T18:55:48.050+03:00 level=INFO source=server.go:1570 msg="aborting completion request due to client closing the connection"
> time=2026-04-02T18:55:48.050+03:00 level=DEBUG source=sched.go:404 msg="context for request finished" runner.name=registry.ollama.ai/library/qwen3.5:0.8b runner.size="3.8 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=396906 runner.model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 runner.num_ctx=128000
> time=2026-04-02T18:55:48.050+03:00 level=DEBUG source=sched.go:327 msg="after processing request finished event" runner.name=registry.ollama.ai/library/qwen3.5:0.8b runner.size="3.8 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=396906 runner.model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 runner.num_ctx=128000 refCount=1
> [GIN] 2026/04/02 - 18:55:48 | 500 | 4m55s | 127.0.0.1 | POST "/v1/messages?beta=true"
> time=2026-04-02T18:55:49.169+03:00 level=DEBUG source=sched.go:672 msg="evaluating already loaded" model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5
>
> time=2026-04-02T18:55:49.252+03:00 level=DEBUG source=server.go:1538 msg="completion request" images=0 prompt=96188 format=""
>
> time=2026-04-02T18:57:29.347+03:00 level=INFO source=server.go:1570 msg="aborting completion request due to client closing the connection"
> time=2026-04-02T18:57:29.347+03:00 level=DEBUG source=sched.go:404 msg="context for request finished" runner.name=registry.ollama.ai/library/qwen3.5:0.8b runner.size="3.8 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=396906 runner.model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 runner.num_ctx=128000
> time=2026-04-02T18:57:29.347+03:00 level=DEBUG source=sched.go:327 msg="after processing request finished event" runner.name=registry.ollama.ai/library/qwen3.5:0.8b runner.size="3.8 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=396906 runner.model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 runner.num_ctx=128000 refCount=1
> [GIN] 2026/04/02 - 18:57:29 | 500 | 1m40s | 127.0.0.1 | POST "/v1/messages?beta=true"
> ```
>
> CLAUDE
>
> ```
> export PATH="$HOME/.local/bin:$PATH"
> export ANTHROPIC_AUTH_TOKEN=ollama
> export ANTHROPIC_API_KEY=""
> export ANTHROPIC_BASE_URL=http://localhost:11434
> export API_TIMEOUT_MS=600000000
> export CLAUDE_CODE_GLOB_TIMEOUT_SECONDS=60000000
> export CLAUDE_ENABLE_STREAM_WATCHDOG=0
> export DISABLE_TELEMETRY=1
> export DISABLE_ERROR_REPORTING=1
> export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
> ```
>
> OLLAMA
>
> ```
> export OLLAMA_LOAD_TIMEOUT=60000000
> export OLLAMA_KEEP_ALIVE=60000000
> export OLLAMA_CONTEXT_LENGTH=128000
> export OLLAMA_DEBUG=1
> ```
>
> ```
> ollama serve
> ollama launch claude --model qwen3.5:0.8b
> ```
>
> Claude Code v2.1.90, ollama version is 0.19.0

It worked on my personal computer, which has a GPU; when I opened this issue I was on my office laptop (it's also strong, but has no GPU), so I got the timeout. On my personal computer it's still slow, but at the end of the day there is no timeout error. Do you have a GPU on your PC?

@lvvorovi commented on GitHub (Apr 3, 2026):

> > Same issue with the timeout using Ollama. Did anyone find a way to fix it without switching from Ollama?
> >
> > ```
> > time=2026-04-02T18:51:07.813+03:00 level=INFO source=server.go:1390 msg="llama runner started in 14.92 seconds"
> > time=2026-04-02T18:51:07.813+03:00 level=DEBUG source=sched.go:573 msg="finished setting up" runner.name=registry.ollama.ai/library/qwen3.5:0.8b runner.size="3.8 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=396906 runner.model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 runner.num_ctx=128000
> >
> > time=2026-04-02T18:51:08.023+03:00 level=DEBUG source=server.go:1538 msg="completion request" images=0 prompt=850 format=""
> > time=2026-04-02T18:51:08.068+03:00 level=DEBUG source=server.go:1538 msg="completion request" images=0 prompt=96188 format=""
> > time=2026-04-02T18:51:08.101+03:00 level=DEBUG source=cache.go:151 msg="loading cache slot" id=0 cache=0 prompt=198 used=0 remaining=198
> >
> > time=2026-04-02T18:55:48.050+03:00 level=INFO source=server.go:1570 msg="aborting completion request due to client closing the connection"
> > time=2026-04-02T18:55:48.050+03:00 level=DEBUG source=sched.go:404 msg="context for request finished" runner.name=registry.ollama.ai/library/qwen3.5:0.8b runner.size="3.8 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=396906 runner.model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 runner.num_ctx=128000
> > time=2026-04-02T18:55:48.050+03:00 level=DEBUG source=sched.go:327 msg="after processing request finished event" runner.name=registry.ollama.ai/library/qwen3.5:0.8b runner.size="3.8 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=396906 runner.model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 runner.num_ctx=128000 refCount=1
> > [GIN] 2026/04/02 - 18:55:48 | 500 | 4m55s | 127.0.0.1 | POST "/v1/messages?beta=true"
> > time=2026-04-02T18:55:49.169+03:00 level=DEBUG source=sched.go:672 msg="evaluating already loaded" model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5
> >
> > time=2026-04-02T18:55:49.252+03:00 level=DEBUG source=server.go:1538 msg="completion request" images=0 prompt=96188 format=""
> >
> > time=2026-04-02T18:57:29.347+03:00 level=INFO source=server.go:1570 msg="aborting completion request due to client closing the connection"
> > time=2026-04-02T18:57:29.347+03:00 level=DEBUG source=sched.go:404 msg="context for request finished" runner.name=registry.ollama.ai/library/qwen3.5:0.8b runner.size="3.8 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=396906 runner.model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 runner.num_ctx=128000
> > time=2026-04-02T18:57:29.347+03:00 level=DEBUG source=sched.go:327 msg="after processing request finished event" runner.name=registry.ollama.ai/library/qwen3.5:0.8b runner.size="3.8 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=396906 runner.model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 runner.num_ctx=128000 refCount=1
> > [GIN] 2026/04/02 - 18:57:29 | 500 | 1m40s | 127.0.0.1 | POST "/v1/messages?beta=true"
> > ```
> >
> > CLAUDE
> >
> > ```
> > export PATH="$HOME/.local/bin:$PATH"
> > export ANTHROPIC_AUTH_TOKEN=ollama
> > export ANTHROPIC_API_KEY=""
> > export ANTHROPIC_BASE_URL=http://localhost:11434
> > export API_TIMEOUT_MS=600000000
> > export CLAUDE_CODE_GLOB_TIMEOUT_SECONDS=60000000
> > export CLAUDE_ENABLE_STREAM_WATCHDOG=0
> > export DISABLE_TELEMETRY=1
> > export DISABLE_ERROR_REPORTING=1
> > export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
> > ```
> >
> > OLLAMA
> >
> > ```
> > export OLLAMA_LOAD_TIMEOUT=60000000
> > export OLLAMA_KEEP_ALIVE=60000000
> > export OLLAMA_CONTEXT_LENGTH=128000
> > export OLLAMA_DEBUG=1
> > ```
> >
> > ```
> > ollama serve
> > ollama launch claude --model qwen3.5:0.8b
> > ```
> >
> > Claude Code v2.1.90, ollama version is 0.19.0
>
> It worked on my personal computer, which has a GPU; when I opened this issue I was on my office laptop (it's also strong, but has no GPU), so I got the timeout. On my personal computer it's still slow, but at the end of the day there is no timeout error. Do you have a GPU on your PC?

It is obviously due to the time it takes. My setup has no GPU, so it takes longer. I am in general OK with the time it takes; I just need to find a way to configure Claude Code/Ollama to be OK with that too.

Reference: github-starred/ollama#9121