[GH-ISSUE #13939] Timeout Error when running ollama with claude code using qwen3-coder:30b #9121

Open
opened 2026-04-12 21:58:34 -05:00 by GiteaMirror · 11 comments

Originally created by @omer1abay on GitHub (Jan 27, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/13939

What is the issue?

(Screenshot: https://github.com/user-attachments/assets/91af2b46-3f28-47ea-b454-0a825c5f490c)

Even though my system meets the requirements comfortably, I get a timeout when I run the qwen3-coder:30b model with Claude Code. When I checked the logs in the debug folder, I saw that the error `"message":"model 'claude-haiku-4-5-20251001' not found"` was logged multiple times (see the relevant excerpt below; I can send the full log file if needed). Is there a solution for this?
I set the context length to 64k,
my computer has 64 GB of RAM,
Ollama version 0.15.2,
Claude Code version 2.1.20.
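For readers hitting the same 404: the error names Claude Code's default background model, `claude-haiku-4-5-20251001`, which the local Ollama server has not pulled. A minimal workaround sketch, assuming Claude Code honors the `ANTHROPIC_BASE_URL`, `ANTHROPIC_MODEL`, and `ANTHROPIC_SMALL_FAST_MODEL` environment variables (an assumption, not confirmed in this thread):

```shell
# Sketch: route both the main and the background ("haiku") model to Ollama,
# so no request names a Claude model the local server cannot serve.
# The variable names below are assumptions about Claude Code's configuration.
export ANTHROPIC_BASE_URL=http://localhost:11434   # Ollama's default endpoint
export ANTHROPIC_MODEL=qwen3-coder:30b             # main model
export ANTHROPIC_SMALL_FAST_MODEL=qwen3-coder:30b  # summaries / background tasks
claude
```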

Relevant log output

2026-01-27T21:27:35.524Z [DEBUG] Generated summary for session 096c8d6f-f9c6-4059-9631-92e43b6f33c5: "API Error: 404 {"type":"error","error":{"type":"not_found_error","message":"model 'claude-haiku-4-5-20251001' not found"},"request_id":"req_6942222b2659cd315d234f9c"}"
2026-01-27T21:27:35.532Z [DEBUG] Session index: added 1, updated 0, removed 0, summaries generated 1 (total: 6)
2026-01-27T21:27:35.536Z [ERROR] Error in non-streaming fallback: 404 {"type":"error","error":{"type":"not_found_error","message":"model 'claude-haiku-4-5-20251001' not found"},"request_id":"req_d41493224a01d0a0ae3fb954"}
2026-01-27T21:27:35.536Z [ERROR] Error: Error: 404 {"type":"error","error":{"type":"not_found_error","message":"model 'claude-haiku-4-5-20251001' not found"},"request_id":"req_d41493224a01d0a0ae3fb954"}
    at t7.generate (file:///C:/Users/%C3%96mer%20Abay/AppData/Roaming/npm/node_modules/@anthropic-ai/claude-code/cli.js:152:37688)
    at RR.makeStatusError (file:///C:/Users/%C3%96mer%20Abay/AppData/Roaming/npm/node_modules/@anthropic-ai/claude-code/cli.js:169:2195)
    at RR.makeRequest (file:///C:/Users/%C3%96mer%20Abay/AppData/Roaming/npm/node_modules/@anthropic-ai/claude-code/cli.js:169:5420)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
2026-01-27T21:27:35.545Z [ERROR] SyntaxError: SyntaxError: Unexpected token 'A', "API Error:"... is not valid JSON
    at JSON.parse (<anonymous>)
    at file:///C:/Users/%C3%96mer%20Abay/AppData/Roaming/npm/node_modules/@anthropic-ai/claude-code/cli.js:67:812
    at q (file:///C:/Users/%C3%96mer%20Abay/AppData/Roaming/npm/node_modules/@anthropic-ai/claude-code/cli.js:8:6814)
    at qZ7 (file:///C:/Users/%C3%96mer%20Abay/AppData/Roaming/npm/node_modules/@anthropic-ai/claude-code/cli.js:1609:40380)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)

OS

Windows

GPU

Intel

CPU

Intel

Ollama version

0.15.2

GiteaMirror added the bug label 2026-04-12 21:58:34 -05:00

@rick-github commented on GitHub (Jan 27, 2026):

Don't see an Ollama log here.
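For reference, the server-side log being asked for can usually be retrieved as follows (locations per Ollama's troubleshooting docs; adjust for your install):

```shell
# Linux with systemd (e.g. the NixOS setup in the next comment):
journalctl -u ollama --no-pager | tail -n 200

# Windows (PowerShell): the server writes under %LOCALAPPDATA%\Ollama
# Get-Content "$env:LOCALAPPDATA\Ollama\server.log" -Tail 200
```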


@KyleJFischer commented on GitHub (Jan 30, 2026):

I am getting the same error on 0.15.1.

The error seems to be related to truncation of the original prompt: my prompt was just "test", and that seems to be where it stalled and then re-prompted itself.

(nixy is the name of my machine.)

Jan 30 13:13:05 nixy ollama[13814]: time=2026-01-30T13:13:05.228-05:00 level=WARN source=runner.go:186 msg="truncating input prompt" limit=4096 prompt=13872 keep=4 new=4096
Jan 30 13:13:10 nixy ollama[13814]: [GIN] 2026/01/30 - 13:13:10 | 404 |      28.344µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:13:25 nixy ollama[13814]: [GIN] 2026/01/30 - 13:13:25 | 404 |      18.425µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:13:55 nixy ollama[13814]: [GIN] 2026/01/30 - 13:13:55 | 404 |      27.914µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:14:25 nixy ollama[13814]: [GIN] 2026/01/30 - 13:14:25 | 404 |      27.723µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:14:55 nixy ollama[13814]: [GIN] 2026/01/30 - 13:14:55 | 404 |      18.826µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:15:04 nixy ollama[13814]: [GIN] 2026/01/30 - 13:15:04 | 500 |         1m59s |             ::1 | POST     "/v1/messages?beta=true"
Jan 30 13:15:04 nixy ollama[13814]: time=2026-01-30T13:15:04.227-05:00 level=INFO source=runner.go:916 msg="aborting completion request due to client closing the connection"
Jan 30 13:15:06 nixy ollama[13814]: [GIN] 2026/01/30 - 13:15:06 | 404 |      35.238µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:15:06 nixy ollama[13814]: [GIN] 2026/01/30 - 13:15:06 | 404 |        5.33µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:15:15 nixy ollama[13814]: [GIN] 2026/01/30 - 13:15:15 | 200 |      24.266µs |       127.0.0.1 | HEAD     "/"
Jan 30 13:15:15 nixy ollama[13814]: [GIN] 2026/01/30 - 13:15:15 | 200 |      690.91µs |       127.0.0.1 | GET      "/api/tags"
Jan 30 13:15:18 nixy ollama[13814]: [GIN] 2026/01/30 - 13:15:18 | 404 |      37.472µs |             ::1 | POST     "/v1/messages/count_tokens?beta=true"
Jan 30 13:15:18 nixy ollama[13814]: [GIN] 2026/01/30 - 13:15:18 | 404 |    1.838782ms |             ::1 | POST     "/v1/messages?beta=true"
Jan 30 13:15:21 nixy ollama[13814]: [GIN] 2026/01/30 - 13:15:21 | 404 |     519.172µs |             ::1 | POST     "/v1/messages?beta=true"
Jan 30 13:15:21 nixy ollama[13814]: [GIN] 2026/01/30 - 13:15:21 | 404 |     253.855µs |             ::1 | POST     "/v1/messages?beta=true"
Jan 30 13:15:21 nixy ollama[13814]: time=2026-01-30T13:15:21.358-05:00 level=WARN source=routes.go:2094 msg="model does not support thinking, relaxing thinking to nil" model=qwen3-coder:30b
Jan 30 13:15:21 nixy ollama[13814]: time=2026-01-30T13:15:21.417-05:00 level=WARN source=runner.go:186 msg="truncating input prompt" limit=4096 prompt=13872 keep=4 new=4096
Jan 30 13:15:23 nixy ollama[13814]: [GIN] 2026/01/30 - 13:15:23 | 404 |      29.897µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:15:23 nixy ollama[13814]: [GIN] 2026/01/30 - 13:15:23 | 404 |       7.264µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:15:25 nixy ollama[13814]: [GIN] 2026/01/30 - 13:15:25 | 404 |       7.354µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:15:30 nixy ollama[13814]: [GIN] 2026/01/30 - 13:15:30 | 404 |       9.238µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:15:38 nixy ollama[13814]: [GIN] 2026/01/30 - 13:15:38 | 404 |      31.991µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:15:50 nixy ollama[13814]: [GIN] 2026/01/30 - 13:15:50 | 404 |      18.595µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:16:08 nixy ollama[13814]: [GIN] 2026/01/30 - 13:16:08 | 404 |      27.202µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:16:14 nixy ollama[13814]: time=2026-01-30T13:16:14.059-05:00 level=INFO source=runner.go:916 msg="aborting completion request due to client closing the connection"
Jan 30 13:16:14 nixy ollama[13814]: [GIN] 2026/01/30 - 13:16:14 | 500 |  52.71675636s |             ::1 | POST     "/v1/messages?beta=true"
Jan 30 13:16:14 nixy ollama[13814]: [GIN] 2026/01/30 - 13:16:14 | 404 |      34.857µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:16:14 nixy ollama[13814]: [GIN] 2026/01/30 - 13:16:14 | 404 |       1.923µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:16:19 nixy ollama[13814]: [GIN] 2026/01/30 - 13:16:19 | 200 |      22.793µs |       127.0.0.1 | HEAD     "/"
Jan 30 13:16:19 nixy ollama[13814]: [GIN] 2026/01/30 - 13:16:19 | 200 |    1.282722ms |       127.0.0.1 | GET      "/api/tags"
Jan 30 13:16:25 nixy ollama[13814]: [GIN] 2026/01/30 - 13:16:25 | 404 |      27.442µs |             ::1 | POST     "/v1/messages/count_tokens?beta=true"
Jan 30 13:16:25 nixy ollama[13814]: [GIN] 2026/01/30 - 13:16:25 | 404 |    2.729146ms |             ::1 | POST     "/v1/messages?beta=true"
Jan 30 13:16:27 nixy ollama[13814]: [GIN] 2026/01/30 - 13:16:27 | 404 |     843.192µs |             ::1 | POST     "/v1/messages?beta=true"
Jan 30 13:16:27 nixy ollama[13814]: [GIN] 2026/01/30 - 13:16:27 | 404 |     483.935µs |             ::1 | POST     "/v1/messages?beta=true"
Jan 30 13:16:28 nixy ollama[13814]: ggml_backend_vk_get_device_memory called: uuid 00000000-c200-0000-0000-000000000000
Jan 30 13:16:28 nixy ollama[13814]: ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
Jan 30 13:16:28 nixy ollama[13814]: ggml_backend_vk_get_device_memory called: uuid 00000000-c200-0000-0000-000000000000
Jan 30 13:16:28 nixy ollama[13814]: ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
Jan 30 13:16:28 nixy ollama[13814]: time=2026-01-30T13:16:28.092-05:00 level=INFO source=sched.go:635 msg="updated VRAM based on existing loaded models" gpu=00000000-c200-0000-0000-000000000000 library=Vulkan total="63.0 GiB" available="44.9 GiB"
Jan 30 13:16:28 nixy ollama[13814]: time=2026-01-30T13:16:28.139-05:00 level=INFO source=server.go:245 msg="enabling flash attention"
Jan 30 13:16:28 nixy ollama[13814]: time=2026-01-30T13:16:28.140-05:00 level=INFO source=server.go:429 msg="starting runner" cmd="/nix/store/hb2mgmb71phjj10i4214pxjwwdgg3sbg-ollama-0.15.1/bin/ollama runner --ollama-engine --model /var/lib/ollama/models/blobs/sha256-9eba2761cf0b88b8bc11a065a7b5b47f1b13ce820e8e492cb1010b450f9ec950 --port 39223"
Jan 30 13:16:28 nixy ollama[13814]: time=2026-01-30T13:16:28.140-05:00 level=INFO source=sched.go:452 msg="system memory" total="125.1 GiB" free="101.5 GiB" free_swap="0 B"
Jan 30 13:16:28 nixy ollama[13814]: time=2026-01-30T13:16:28.140-05:00 level=INFO source=sched.go:459 msg="gpu memory" id=00000000-c200-0000-0000-000000000000 library=Vulkan available="44.5 GiB" free="44.9 GiB" minimum="457.0 MiB" overhead="0 B"
Jan 30 13:16:28 nixy ollama[13814]: time=2026-01-30T13:16:28.140-05:00 level=INFO source=server.go:755 msg="loading model" "model layers"=48 requested=-1
Jan 30 13:16:28 nixy ollama[13814]: time=2026-01-30T13:16:28.148-05:00 level=INFO source=runner.go:1405 msg="starting ollama engine"
Jan 30 13:16:28 nixy ollama[13814]: time=2026-01-30T13:16:28.149-05:00 level=INFO source=runner.go:1440 msg="Server listening on 127.0.0.1:39223"
Jan 30 13:16:28 nixy ollama[13814]: time=2026-01-30T13:16:28.152-05:00 level=INFO source=runner.go:1278 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:16 GPULayers:48[ID:00000000-c200-0000-0000-000000000000 Layers:48(0..47)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
Jan 30 13:16:28 nixy ollama[13814]: time=2026-01-30T13:16:28.185-05:00 level=INFO source=ggml.go:136 msg="" architecture=glm4moelite file_type=Q4_K_M name="" description="" num_tensors=844 num_key_values=39
Jan 30 13:16:28 nixy ollama[13814]: ggml_vulkan: Found 1 Vulkan devices:
Jan 30 13:16:28 nixy ollama[13814]: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
Jan 30 13:16:28 nixy ollama[13814]: load_backend: loaded Vulkan backend from /nix/store/hb2mgmb71phjj10i4214pxjwwdgg3sbg-ollama-0.15.1/lib/ollama/libggml-vulkan.so
Jan 30 13:16:28 nixy ollama[13814]: load_backend: loaded CPU backend from /nix/store/hb2mgmb71phjj10i4214pxjwwdgg3sbg-ollama-0.15.1/lib/ollama/libggml-cpu-icelake.so
Jan 30 13:16:28 nixy ollama[13814]: time=2026-01-30T13:16:28.218-05:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jan 30 13:16:28 nixy ollama[13814]: ggml_backend_vk_get_device_memory called: uuid 00000000-c200-0000-0000-000000000000
Jan 30 13:16:28 nixy ollama[13814]: ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
Jan 30 13:16:28 nixy ollama[13814]: time=2026-01-30T13:16:28.236-05:00 level=INFO source=runner.go:1278 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:16 GPULayers:48[ID:00000000-c200-0000-0000-000000000000 Layers:48(0..47)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
Jan 30 13:16:28 nixy ollama[13814]: ggml_backend_vk_get_device_memory called: uuid 00000000-c200-0000-0000-000000000000
Jan 30 13:16:28 nixy ollama[13814]: ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
Jan 30 13:16:29 nixy ollama[13814]: time=2026-01-30T13:16:29.099-05:00 level=INFO source=runner.go:1278 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:16 GPULayers:48[ID:00000000-c200-0000-0000-000000000000 Layers:48(0..47)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
Jan 30 13:16:29 nixy ollama[13814]: time=2026-01-30T13:16:29.100-05:00 level=INFO source=device.go:240 msg="model weights" device=Vulkan0 size="17.5 GiB"
Jan 30 13:16:29 nixy ollama[13814]: time=2026-01-30T13:16:29.100-05:00 level=INFO source=ggml.go:482 msg="offloading 47 repeating layers to GPU"
Jan 30 13:16:29 nixy ollama[13814]: time=2026-01-30T13:16:29.100-05:00 level=INFO source=ggml.go:489 msg="offloading output layer to GPU"
Jan 30 13:16:29 nixy ollama[13814]: time=2026-01-30T13:16:29.100-05:00 level=INFO source=ggml.go:494 msg="offloaded 48/48 layers to GPU"
Jan 30 13:16:29 nixy ollama[13814]: time=2026-01-30T13:16:29.100-05:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="170.2 MiB"
Jan 30 13:16:29 nixy ollama[13814]: time=2026-01-30T13:16:29.100-05:00 level=INFO source=device.go:251 msg="kv cache" device=Vulkan0 size="399.5 MiB"
Jan 30 13:16:29 nixy ollama[13814]: time=2026-01-30T13:16:29.100-05:00 level=INFO source=device.go:262 msg="compute graph" device=Vulkan0 size="76.0 MiB"
Jan 30 13:16:29 nixy ollama[13814]: time=2026-01-30T13:16:29.100-05:00 level=INFO source=device.go:267 msg="compute graph" device=CPU size="4.0 MiB"
Jan 30 13:16:29 nixy ollama[13814]: time=2026-01-30T13:16:29.100-05:00 level=INFO source=device.go:272 msg="total memory" size="18.2 GiB"
Jan 30 13:16:29 nixy ollama[13814]: time=2026-01-30T13:16:29.100-05:00 level=INFO source=sched.go:526 msg="loaded runners" count=2
Jan 30 13:16:29 nixy ollama[13814]: time=2026-01-30T13:16:29.100-05:00 level=INFO source=server.go:1347 msg="waiting for llama runner to start responding"
Jan 30 13:16:29 nixy ollama[13814]: time=2026-01-30T13:16:29.100-05:00 level=INFO source=server.go:1381 msg="waiting for server to become available" status="llm server loading model"
Jan 30 13:16:30 nixy ollama[13814]: [GIN] 2026/01/30 - 13:16:30 | 404 |      19.046µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:16:30 nixy ollama[13814]: [GIN] 2026/01/30 - 13:16:30 | 404 |       6.252µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:16:32 nixy ollama[13814]: [GIN] 2026/01/30 - 13:16:32 | 404 |       6.702µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:16:35 nixy ollama[13814]: time=2026-01-30T13:16:35.367-05:00 level=INFO source=server.go:1385 msg="llama runner started in 7.23 seconds"
Jan 30 13:16:35 nixy ollama[13814]: time=2026-01-30T13:16:35.427-05:00 level=WARN source=runner.go:186 msg="truncating input prompt" limit=4096 prompt=13769 keep=4 new=4096
Jan 30 13:16:37 nixy ollama[13814]: [GIN] 2026/01/30 - 13:16:37 | 404 |       9.458µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:16:45 nixy ollama[13814]: [GIN] 2026/01/30 - 13:16:45 | 404 |      30.919µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:16:57 nixy ollama[13814]: [GIN] 2026/01/30 - 13:16:57 | 404 |      28.344µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:17:15 nixy ollama[13814]: [GIN] 2026/01/30 - 13:17:15 | 404 |       26.41µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:17:40 nixy ollama[13814]: [GIN] 2026/01/30 - 13:17:40 | 404 |      28.184µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:18:10 nixy ollama[13814]: [GIN] 2026/01/30 - 13:18:10 | 404 |      27.823µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:18:40 nixy ollama[13814]: [GIN] 2026/01/30 - 13:18:40 | 404 |      13.756µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:19:10 nixy ollama[13814]: [GIN] 2026/01/30 - 13:19:10 | 404 |      29.497µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:19:40 nixy ollama[13814]: [GIN] 2026/01/30 - 13:19:40 | 404 |      27.863µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:20:10 nixy ollama[13814]: [GIN] 2026/01/30 - 13:20:10 | 404 |      27.072µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:20:40 nixy ollama[13814]: [GIN] 2026/01/30 - 13:20:40 | 404 |      13.846µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:21:10 nixy ollama[13814]: [GIN] 2026/01/30 - 13:21:10 | 404 |      20.679µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:21:28 nixy ollama[13814]: [GIN] 2026/01/30 - 13:21:28 | 500 |          5m0s |             ::1 | POST     "/v1/messages?beta=true"
Jan 30 13:21:29 nixy ollama[13814]: time=2026-01-30T13:21:29.061-05:00 level=WARN source=runner.go:186 msg="truncating input prompt" limit=4096 prompt=13769 keep=4 new=4096
Jan 30 13:21:33 nixy ollama[13814]: [GIN] 2026/01/30 - 13:21:33 | 404 |      27.904µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:21:40 nixy ollama[13814]: [GIN] 2026/01/30 - 13:21:40 | 404 |      28.395µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:22:10 nixy ollama[13814]: [GIN] 2026/01/30 - 13:22:10 | 404 |      25.909µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:22:40 nixy ollama[13814]: [GIN] 2026/01/30 - 13:22:40 | 404 |      20.459µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:23:10 nixy ollama[13814]: [GIN] 2026/01/30 - 13:23:10 | 404 |      21.512µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:23:40 nixy ollama[13814]: [GIN] 2026/01/30 - 13:23:40 | 404 |      14.458µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:24:10 nixy ollama[13814]: [GIN] 2026/01/30 - 13:24:10 | 404 |      18.496µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:24:40 nixy ollama[13814]: [GIN] 2026/01/30 - 13:24:40 | 404 |      28.114µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:25:10 nixy ollama[13814]: [GIN] 2026/01/30 - 13:25:10 | 404 |      36.881µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:25:40 nixy ollama[13814]: [GIN] 2026/01/30 - 13:25:40 | 404 |      21.431µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:26:10 nixy ollama[13814]: [GIN] 2026/01/30 - 13:26:10 | 404 |      27.993µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:26:29 nixy ollama[13814]: time=2026-01-30T13:26:29.398-05:00 level=INFO source=runner.go:916 msg="aborting completion request due to client closing the connection"
Jan 30 13:26:29 nixy ollama[13814]: [GIN] 2026/01/30 - 13:26:29 | 500 |          5m0s |             ::1 | POST     "/v1/messages?beta=true"
Jan 30 13:26:30 nixy ollama[13814]: time=2026-01-30T13:26:30.558-05:00 level=WARN source=runner.go:186 msg="truncating input prompt" limit=4096 prompt=13769 keep=4 new=4096
Jan 30 13:26:34 nixy ollama[13814]: [GIN] 2026/01/30 - 13:26:34 | 404 |      27.763µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:26:40 nixy ollama[13814]: [GIN] 2026/01/30 - 13:26:40 | 404 |      29.276µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:27:10 nixy ollama[13814]: [GIN] 2026/01/30 - 13:27:10 | 404 |      28.114µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:27:40 nixy ollama[13814]: [GIN] 2026/01/30 - 13:27:40 | 404 |      22.613µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:28:10 nixy ollama[13814]: [GIN] 2026/01/30 - 13:28:10 | 404 |      17.864µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:28:40 nixy ollama[13814]: [GIN] 2026/01/30 - 13:28:40 | 404 |      26.922µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:29:10 nixy ollama[13814]: [GIN] 2026/01/30 - 13:29:10 | 404 |      28.225µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:29:40 nixy ollama[13814]: [GIN] 2026/01/30 - 13:29:40 | 404 |      19.818µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:30:10 nixy ollama[13814]: [GIN] 2026/01/30 - 13:30:10 | 404 |      18.946µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:30:40 nixy ollama[13814]: [GIN] 2026/01/30 - 13:30:40 | 404 |      12.675µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:31:10 nixy ollama[13814]: [GIN] 2026/01/30 - 13:31:10 | 404 |      42.221µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:31:30 nixy ollama[13814]: time=2026-01-30T13:31:30.887-05:00 level=INFO source=runner.go:916 msg="aborting completion request due to client closing the connection"
Jan 30 13:31:30 nixy ollama[13814]: [GIN] 2026/01/30 - 13:31:30 | 500 |          5m0s |             ::1 | POST     "/v1/messages?beta=true"
Jan 30 13:31:33 nixy ollama[13814]: time=2026-01-30T13:31:33.396-05:00 level=WARN source=runner.go:186 msg="truncating input prompt" limit=4096 prompt=13769 keep=4 new=4096
Jan 30 13:31:35 nixy ollama[13814]: [GIN] 2026/01/30 - 13:31:35 | 404 |      30.999µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:31:40 nixy ollama[13814]: [GIN] 2026/01/30 - 13:31:40 | 404 |       10.03µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:32:10 nixy ollama[13814]: [GIN] 2026/01/30 - 13:32:10 | 404 |      26.962µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:32:40 nixy ollama[13814]: [GIN] 2026/01/30 - 13:32:40 | 404 |      27.102µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:33:10 nixy ollama[13814]: [GIN] 2026/01/30 - 13:33:10 | 404 |      25.619µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:33:40 nixy ollama[13814]: [GIN] 2026/01/30 - 13:33:40 | 404 |      24.898µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:34:10 nixy ollama[13814]: [GIN] 2026/01/30 - 13:34:10 | 404 |      18.264µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:34:40 nixy ollama[13814]: [GIN] 2026/01/30 - 13:34:40 | 404 |      27.822µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:35:10 nixy ollama[13814]: [GIN] 2026/01/30 - 13:35:10 | 404 |      30.127µs |             ::1 | POST     "/api/event_logging/batch"
Jan 30 13:35:40 nixy ollama[13814]: [GIN] 2026/01/30 - 13:35:40 | 404 |      27.442µs |             ::1 | POST     "/api/event_logging/batch"

OS:
NixOS

GPU:
AMD

CPU:
AMD

Ollama version:
0.15.1
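The `truncating input prompt` warnings above (limit=4096 against a ~13.9k-token prompt) show the model was loaded with a 4096-token context, far smaller than Claude Code's system prompt, which fits the stall-and-retry pattern in this log. A sketch of raising the context using Ollama's documented knobs; the 32768 value is an assumption sized to the prompt, and the `qwen3-coder-32k` tag is a hypothetical name:

```shell
# Option 1: raise the server-wide default context length, then restart Ollama.
OLLAMA_CONTEXT_LENGTH=32768 ollama serve

# Option 2: bake a larger context into a model variant via a Modelfile.
cat > Modelfile <<'EOF'
FROM qwen3-coder:30b
PARAMETER num_ctx 32768
EOF
ollama create qwen3-coder-32k -f Modelfile   # hypothetical tag name
```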


@KyleJFischer commented on GitHub (Jan 30, 2026):

Note this is using glm-4.7; I was also getting this issue with qwen3-coder and, really, any model.


@omer1abay commented on GitHub (Jan 30, 2026):

Sorry for the late response; here are the logs:

[GIN] 2026/01/30 - 22:18:58 | 200 | 507.4µs | 127.0.0.1 | GET "/api/version"
[GIN] 2026/01/30 - 22:18:58 | 200 | 324.5411ms | 127.0.0.1 | GET "/api/tags"
[GIN] 2026/01/30 - 22:18:59 | 200 | 237.6ms | 127.0.0.1 | POST "/api/show"
[GIN] 2026/01/30 - 22:18:59 | 200 | 288.086ms | 127.0.0.1 | POST "/api/show"
[GIN] 2026/01/30 - 22:18:59 | 200 | 233.7476ms | 127.0.0.1 | POST "/api/show"
[GIN] 2026/01/30 - 22:18:59 | 200 | 112.5679ms | 127.0.0.1 | POST "/api/show"
[GIN] 2026/01/30 - 22:19:00 | 200 | 243.4562ms | 127.0.0.1 | POST "/api/show"
[GIN] 2026/01/30 - 22:19:00 | 200 | 326.5986ms | 127.0.0.1 | POST "/api/show"
[GIN] 2026/01/30 - 22:19:00 | 200 | 341.0319ms | 127.0.0.1 | POST "/api/show"
[GIN] 2026/01/30 - 22:19:01 | 200 | 251.4141ms | 127.0.0.1 | POST "/api/show"
[GIN] 2026/01/30 - 22:19:01 | 200 | 237.9697ms | 127.0.0.1 | POST "/api/show"
[GIN] 2026/01/30 - 22:19:01 | 200 | 313.0391ms | 127.0.0.1 | POST "/api/show"
[GIN] 2026/01/30 - 22:19:02 | 200 | 580.0144ms | 127.0.0.1 | POST "/api/show"
[GIN] 2026/01/30 - 22:19:08 | 200 | 0s | 127.0.0.1 | HEAD "/"
[GIN] 2026/01/30 - 22:19:13 | 404 | 0s | 127.0.0.1 | POST "/v1/messages/count_tokens?beta=true"
[GIN] 2026/01/30 - 22:19:13 | 404 | 11.9269ms | 127.0.0.1 | POST "/v1/messages?beta=true"
[GIN] 2026/01/30 - 22:19:18 | 404 | 0s | 127.0.0.1 | POST "/api/event_logging/batch"
[GIN] 2026/01/30 - 22:19:19 | 404 | 0s | 127.0.0.1 | POST "/api/event_logging/batch"
[GIN] 2026/01/30 - 22:19:21 | 404 | 0s | 127.0.0.1 | POST "/api/event_logging/batch"
[GIN] 2026/01/30 - 22:19:22 | 404 | 8.61ms | 127.0.0.1 | POST "/v1/messages?beta=true"
[GIN] 2026/01/30 - 22:19:22 | 404 | 7.5979ms | 127.0.0.1 | POST "/v1/messages?beta=true"
time=2026-01-30T22:19:22.494+03:00 level=WARN source=routes.go:2094 msg="model does not support thinking, relaxing thinking to nil" model=qwen3-coder:30b
time=2026-01-30T22:19:22.596+03:00 level=INFO source=cpu_windows.go:148 msg=packages count=1
time=2026-01-30T22:19:22.596+03:00 level=INFO source=cpu_windows.go:164 msg="efficiency cores detected" maxEfficiencyClass=1
time=2026-01-30T22:19:22.596+03:00 level=INFO source=cpu_windows.go:195 msg="" package=0 cores=16 efficiency=10 threads=16
time=2026-01-30T22:19:22.731+03:00 level=INFO source=server.go:245 msg="enabling flash attention"
time=2026-01-30T22:19:22.739+03:00 level=INFO source=server.go:429 msg="starting runner" cmd="C:\Users\Ömer Abay\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --model C:\Users\Ömer Abay\.ollama\models\blobs\sha256-1194192cf2a187eb02722edcc3f77b11d21f537048ce04b67ccf8ba78863006a --port 53740"
time=2026-01-30T22:19:22.748+03:00 level=INFO source=sched.go:452 msg="system memory" total="62.9 GiB" free="28.1 GiB" free_swap="8.5 GiB"
time=2026-01-30T22:19:22.748+03:00 level=INFO source=server.go:755 msg="loading model" "model layers"=49 requested=-1
time=2026-01-30T22:19:22.834+03:00 level=INFO source=runner.go:1405 msg="starting ollama engine"
time=2026-01-30T22:19:23.014+03:00 level=INFO source=runner.go:1440 msg="Server listening on 127.0.0.1:53740"
time=2026-01-30T22:19:23.026+03:00 level=INFO source=runner.go:1278 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:65536 KvCacheType: NumThreads:6 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-01-30T22:19:23.076+03:00 level=INFO source=ggml.go:136 msg="" architecture=qwen3moe file_type=Q4_K_M name="Qwen3 Coder 30B A3B Instruct" description="" num_tensors=579 num_key_values=35
load_backend: loaded CPU backend from C:\Users\Ömer Abay\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-alderlake.dll
time=2026-01-30T22:19:23.440+03:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(clang)
time=2026-01-30T22:19:23.676+03:00 level=INFO source=runner.go:1278 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:65536 KvCacheType: NumThreads:6 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
[GIN] 2026/01/30 - 22:19:25 | 404 | 0s | 127.0.0.1 | POST "/api/event_logging/batch"
[GIN] 2026/01/30 - 22:19:25 | 404 | 0s | 127.0.0.1 | POST "/api/event_logging/batch"
time=2026-01-30T22:19:26.467+03:00 level=INFO source=runner.go:1278 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:65536 KvCacheType: NumThreads:6 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-01-30T22:19:26.467+03:00 level=INFO source=ggml.go:482 msg="offloading 0 repeating layers to GPU"
time=2026-01-30T22:19:26.467+03:00 level=INFO source=ggml.go:486 msg="offloading output layer to CPU"
time=2026-01-30T22:19:26.467+03:00 level=INFO source=ggml.go:494 msg="offloaded 0/49 layers to GPU"
time=2026-01-30T22:19:26.467+03:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="17.3 GiB"
time=2026-01-30T22:19:26.467+03:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="6.0 GiB"
time=2026-01-30T22:19:26.467+03:00 level=INFO source=device.go:267 msg="compute graph" device=CPU size="144.0 MiB"
time=2026-01-30T22:19:26.467+03:00 level=INFO source=device.go:272 msg="total memory" size="23.4 GiB"
time=2026-01-30T22:19:26.467+03:00 level=INFO source=sched.go:526 msg="loaded runners" count=1
time=2026-01-30T22:19:26.467+03:00 level=INFO source=server.go:1347 msg="waiting for llama runner to start responding"
time=2026-01-30T22:19:26.468+03:00 level=INFO source=server.go:1381 msg="waiting for server to become available" status="llm server loading model"
[GIN] 2026/01/30 - 22:19:33 | 404 | 545.8µs | 127.0.0.1 | POST "/api/event_logging/batch"
time=2026-01-30T22:19:36.636+03:00 level=INFO source=server.go:1385 msg="llama runner started in 13.89 seconds"
[GIN] 2026/01/30 - 22:19:46 | 404 | 0s | 127.0.0.1 | POST "/api/event_logging/batch"
[GIN] 2026/01/30 - 22:20:04 | 404 | 0s | 127.0.0.1 | POST "/api/event_logging/batch"
[GIN] 2026/01/30 - 22:20:29 | 404 | 0s | 127.0.0.1 | POST "/api/event_logging/batch"
[GIN] 2026/01/30 - 22:20:59 | 404 | 0s | 127.0.0.1 | POST "/api/event_logging/batch"
[GIN] 2026/01/30 - 22:21:29 | 404 | 0s | 127.0.0.1 | POST "/api/event_logging/batch"
[GIN] 2026/01/30 - 22:21:59 | 404 | 0s | 127.0.0.1 | POST "/api/event_logging/batch"
[GIN] 2026/01/30 - 22:22:29 | 404 | 0s | 127.0.0.1 | POST "/api/event_logging/batch"
[GIN] 2026/01/30 - 22:22:59 | 404 | 0s | 127.0.0.1 | POST "/api/event_logging/batch"
[GIN] 2026/01/30 - 22:23:29 | 404 | 0s | 127.0.0.1 | POST "/api/event_logging/batch"
[GIN] 2026/01/30 - 22:23:59 | 404 | 0s | 127.0.0.1 | POST "/api/event_logging/batch"
[GIN] 2026/01/30 - 22:24:26 | 500 | 5m4s | 127.0.0.1 | POST "/v1/messages?beta=true"
time=2026-01-30T22:24:27.447+03:00 level=WARN source=routes.go:2094 msg="model does not support thinking, relaxing thinking to nil" model=qwen3-coder:30b
[GIN] 2026/01/30 - 22:24:29 | 404 | 0s | 127.0.0.1 | POST "/api/event_logging/batch"
[GIN] 2026/01/30 - 22:24:31 | 404 | 0s | 127.0.0.1 | POST "/api/event_logging/batch"
[GIN] 2026/01/30 - 22:24:59 | 404 | 0s | 127.0.0.1 | POST "/api/event_logging/batch"
[GIN] 2026/01/30 - 22:25:28 | 500 | 1m1s | 127.0.0.1 | POST "/v1/messages?beta=true"
time=2026-01-30T22:25:28.827+03:00 level=INFO source=runner.go:916 msg="aborting completion request due to client closing the connection"
[GIN] 2026/01/30 - 22:25:29 | 404 | 0s | 127.0.0.1 | POST "/api/event_logging/batch"
[GIN] 2026/01/30 - 22:25:29 | 404 | 0s | 127.0.0.1 | POST "/api/event_logging/batch"
[GIN] 2026/01/30 - 22:25:29 | 404 | 0s | 127.0.0.1 | POST "/api/event_logging/batch"
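Note the `offloaded 0/49 layers to GPU` line: the model is running entirely on CPU with a 64k KV cache, so a single request can plausibly take longer than the five minutes after which these requests return 500. Two quick checks, hedged (`API_TIMEOUT_MS` is an assumption about Claude Code's client-side timeout, not confirmed in this thread):

```shell
# Confirm placement: "100% CPU" in this output matches the log above and
# would explain multi-minute responses on a 30B model.
ollama ps

# Assumption: Claude Code honors API_TIMEOUT_MS to allow slow local backends.
export API_TIMEOUT_MS=600000   # 10 minutes, in milliseconds
claude
```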

@rick-github commented on GitHub (Jan 31, 2026):

@KyleJFischer Increase the size of the model context as described [here](https://docs.ollama.com/integrations/claude-code#manual-setup:~:text=Note%3A%20Claude%20Code%20requires%20a%20large%20context%20window.%20We%20recommend%20at%20least%2064k%20tokens.%20See%20the%20context%20length%20documentation%20for%20how%20to%20adjust%20context%20length%20in%20Ollama.), then re-test.

@omer1abay

```
[GIN] 2026/01/30 - 22:25:28 | 500 | 1m1s | 127.0.0.1 | POST "/v1/messages?beta=true"
time=2026-01-30T22:25:28.827+03:00 level=INFO source=runner.go:916 msg="aborting completion request due to client closing the connection"
```

The client has a 60 second timeout and disconnected before the model could respond. I'm guessing that the prompt is large and/or complicated and since you are running on CPU, it's just taking a long time to process. Increase the timeout, simplify the prompt, or get a GPU.

Note that the 404s in the log can be prevented by disabling Claude Code telemetry by setting these variables in the environment that you run CC in:

```
DISABLE_TELEMETRY=1
DISABLE_ERROR_REPORTING=1
CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
```
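
Putting that advice together, a minimal launch script might look like the sketch below. This is a sketch under assumptions, not an official recipe: `OLLAMA_CONTEXT_LENGTH` and `ANTHROPIC_BASE_URL` are borrowed from the settings shared later in this thread, 65536 follows the 64k-token recommendation in the linked docs, and `claude` is assumed to be on PATH.

```
#!/usr/bin/env bash
# Raise Ollama's default context window (64k per the linked docs),
# then start the server in the background.
export OLLAMA_CONTEXT_LENGTH=65536
ollama serve &

# Point Claude Code at the local Ollama endpoint and silence the
# telemetry requests that produce the 404s above.
export ANTHROPIC_BASE_URL=http://localhost:11434
export DISABLE_TELEMETRY=1
export DISABLE_ERROR_REPORTING=1
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
claude
```
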
@stratmm commented on GitHub (Feb 6, 2026):

> @KyleJFischer Increase the size of the model context as described [here](https://docs.ollama.com/integrations/claude-code#manual-setup:~:text=Note%3A%20Claude%20Code%20requires%20a%20large%20context%20window.%20We%20recommend%20at%20least%2064k%20tokens.%20See%20the%20context%20length%20documentation%20for%20how%20to%20adjust%20context%20length%20in%20Ollama.), then re-test.
>
> @omer1abay
>
> ```
> [GIN] 2026/01/30 - 22:25:28 | 500 | 1m1s | 127.0.0.1 | POST "/v1/messages?beta=true"
> time=2026-01-30T22:25:28.827+03:00 level=INFO source=runner.go:916 msg="aborting completion request due to client closing the connection"
> ```
>
> The client has a 60 second timeout and disconnected before the model could respond. I'm guessing that the prompt is large and/or complicated and since you are running on CPU, it's just taking a long time to process. Increase the timeout, simplify the prompt, or get a GPU.
>
> Note that the 404s in the log can be prevented by disabling Claude Code telemetry by setting these variables in the environment that you run CC in:
>
> ```
> DISABLE_TELEMETRY=1
> DISABLE_ERROR_REPORTING=1
> CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
> ```

@rick-github When you say "Increase the timeout", which timeout do you mean? I can't find any Claude Code documentation on timeouts related to model responses.

@rick-github commented on GitHub (Feb 6, 2026):

Unfortunately I'm not a Claude Code user so I don't know what configuration options are available.
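
For what it's worth, the environment settings pasted later in this thread include an `API_TIMEOUT_MS` variable (milliseconds), which looks like the per-request timeout knob. A minimal sketch, assuming Claude Code honors it; the logs accompanying those settings suggest it may not be the whole story:

```
# Assumption: Claude Code reads API_TIMEOUT_MS (in milliseconds) as its
# per-request timeout. 600000 ms = 10 minutes of headroom for CPU inference.
export API_TIMEOUT_MS=600000
claude
```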

@stratmm commented on GitHub (Feb 9, 2026):

> @KyleJFischer Increase the size of the model context as described [here](https://docs.ollama.com/integrations/claude-code#manual-setup:~:text=Note%3A%20Claude%20Code%20requires%20a%20large%20context%20window.%20We%20recommend%20at%20least%2064k%20tokens.%20See%20the%20context%20length%20documentation%20for%20how%20to%20adjust%20context%20length%20in%20Ollama.), then re-test.
>
> @omer1abay
>
> ```
> [GIN] 2026/01/30 - 22:25:28 | 500 | 1m1s | 127.0.0.1 | POST "/v1/messages?beta=true"
> time=2026-01-30T22:25:28.827+03:00 level=INFO source=runner.go:916 msg="aborting completion request due to client closing the connection"
> ```
>
> The client has a 60 second timeout and disconnected before the model could respond. I'm guessing that the prompt is large and/or complicated and since you are running on CPU, it's just taking a long time to process. Increase the timeout, simplify the prompt, or get a GPU.
>
> Note that the 404s in the log can be prevented by disabling Claude Code telemetry by setting these variables in the environment that you run CC in:
>
> ```
> DISABLE_TELEMETRY=1
> DISABLE_ERROR_REPORTING=1
> CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
> ```

@KyleJFischer, I have found a solution that worked for me. I was using Ollama to run my models because I incorrectly thought that Claude Code only supports Ollama.

I switched to running the exact same models on llama.cpp, and timeouts are no longer an issue.

There are a number of key differences between my Ollama and llama.cpp setups:

  1. Ollama is running the Vulkan drivers and is therefore slower.
  2. llama.cpp is running the AMD ROCm nightly drivers and is therefore at least 30% faster.
  3. The models I am now running are the unsloth versions, in this case Qwen3-Coder-Next.

I just don't know whether the improvement is due to llama.cpp's speed, differences in the models, or differences in the llama.cpp API compared to Ollama's.

If it helps, I have pasted the llama.cpp Dockerfile and the docker-compose service that I am running.

```
# build
FROM registry.fedoraproject.org/fedora:43 AS builder

# note: the Fedora package is aria2 (the original said aria2c, which is the
# binary name, not a package dnf can install)
RUN dnf -y --nodocs --setopt=install_weak_deps=False install \
  make gcc cmake lld clang clang-devel compiler-rt libcurl-devel \
  radeontop git vim patch curl ninja-build tar xz aria2 \
  && dnf clean all && rm -rf /var/cache/dnf/*

# find & fetch the latest Linux 7.x.x tarball (gfx1151)
WORKDIR /tmp
ARG ROCM_MAJOR_VER=7
ARG GFX=gfx1151
RUN set -euo pipefail; \
  BASE="https://therock-nightly-tarball.s3.amazonaws.com"; \
  PREFIX="therock-dist-linux-${GFX}-${ROCM_MAJOR_VER}"; \
  KEY="$(curl -s "${BASE}?list-type=2&prefix=${PREFIX}" \
  | tr '<' '\n' \
  | grep -o "therock-dist-linux-${GFX}-${ROCM_MAJOR_VER}\..*\.tar\.gz" \
  | sort -V | tail -n1)"; \
  echo "Latest tarball: ${KEY}"; \
  aria2c -x 16 -s 16 -j 16 --file-allocation=none "${BASE}/${KEY}" -o therock.tar.gz
RUN mkdir -p /opt/rocm-7.0 \
  && tar xzf therock.tar.gz -C /opt/rocm-7.0 --strip-components=1

ENV ROCM_PATH=/opt/rocm-7.0 \
  HIP_PLATFORM=amd \
  HIP_PATH=/opt/rocm-7.0 \
  HIP_CLANG_PATH=/opt/rocm-7.0/llvm/bin \
  HIP_INCLUDE_PATH=/opt/rocm-7.0/include \
  HIP_LIB_PATH=/opt/rocm-7.0/lib \
  HIP_DEVICE_LIB_PATH=/opt/rocm-7.0/lib/llvm/amdgcn/bitcode \
  PATH=/opt/rocm-7.0/bin:/opt/rocm-7.0/llvm/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
  LD_LIBRARY_PATH=/opt/rocm-7.0/lib:/opt/rocm-7.0/lib64:/opt/rocm-7.0/llvm/lib \
  LIBRARY_PATH=/opt/rocm-7.0/lib:/opt/rocm-7.0/lib64 \
  CPATH=/opt/rocm-7.0/include \
  PKG_CONFIG_PATH=/opt/rocm-7.0/lib/pkgconfig

RUN printf '%s\n' \
  'export ROCM_PATH=/opt/rocm-7.0' \
  'export HIP_PLATFORM=amd' \
  'export HIP_PATH=/opt/rocm-7.0' \
  'export HIP_CLANG_PATH=/opt/rocm-7.0/llvm/bin' \
  'export HIP_INCLUDE_PATH=/opt/rocm-7.0/include' \
  'export HIP_LIB_PATH=/opt/rocm-7.0/lib' \
  'export HIP_DEVICE_LIB_PATH=/opt/rocm-7.0/lib/llvm/amdgcn/bitcode' \
  'export PATH="$ROCM_PATH/bin:$HIP_CLANG_PATH:$PATH"' \
  'export LD_LIBRARY_PATH="$HIP_LIB_PATH:$ROCM_PATH/lib:$ROCM_PATH/lib64:$ROCM_PATH/llvm/lib"' \
  'export LIBRARY_PATH="$HIP_LIB_PATH:$ROCM_PATH/lib:$ROCM_PATH/lib64"' \
  'export CPATH="$HIP_INCLUDE_PATH"' \
  'export PKG_CONFIG_PATH="$ROCM_PATH/lib/pkgconfig"' \
  > /etc/profile.d/rocm.sh \
  && chmod +x /etc/profile.d/rocm.sh \
  && echo 'source /etc/profile.d/rocm.sh' >> /etc/bashrc

WORKDIR /opt/llama.cpp
RUN git clone --recursive https://github.com/ggerganov/llama.cpp.git . \
  && git clean -xdf \
  && git submodule update --recursive

RUN cmake -S . -B build \
  -DGGML_HIP=ON \
  -DAMDGPU_TARGETS=gfx1151 \
  -DCMAKE_BUILD_TYPE=Release \
  -DGGML_RPC=ON \
  -DLLAMA_HIP_UMA=ON \
  && cmake --build build --config Release -- -j$(nproc) \
  && cmake --install build --config Release

# keep bin; drop headers/docs/static libs (retain llama.cpp for rpc binaries)
RUN find /opt/rocm-7.0 -type f -name '*.a' -delete \
  && rm -rf /opt/rocm-7.0/include /opt/rocm-7.0/share \
  /opt/rocm-7.0/llvm/include /opt/rocm-7.0/llvm/share

# runtime
FROM registry.fedoraproject.org/fedora-minimal:43

RUN microdnf -y --nodocs --setopt=install_weak_deps=0 install \
  bash ca-certificates libatomic libstdc++ libgcc radeontop vim procps-ng \
  && microdnf clean all && rm -rf /var/cache/dnf/*

COPY --from=builder /opt/rocm-7.0 /opt/rocm-7.0
COPY --from=builder /usr/local/ /usr/local/
COPY --from=builder /opt/llama.cpp/build/bin/rpc-* /usr/local/bin/

# COPY gguf-vram-estimator.py /usr/local/bin/
# RUN chmod +x /usr/local/bin/gguf-vram-estimator.py

ENV ROCM_PATH=/opt/rocm-7.0 \
  HIP_PLATFORM=amd \
  HIP_PATH=/opt/rocm-7.0 \
  HIP_CLANG_PATH=/opt/rocm-7.0/llvm/bin \
  HIP_INCLUDE_PATH=/opt/rocm-7.0/include \
  HIP_LIB_PATH=/opt/rocm-7.0/lib \
  HIP_DEVICE_LIB_PATH=/opt/rocm-7.0/lib/llvm/amdgcn/bitcode \
  PATH=/opt/rocm-7.0/bin:/opt/rocm-7.0/llvm/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
  LD_LIBRARY_PATH=/opt/rocm-7.0/lib:/opt/rocm-7.0/lib64:/opt/rocm-7.0/llvm/lib \
  LIBRARY_PATH=/opt/rocm-7.0/lib:/opt/rocm-7.0/lib64 \
  CPATH=/opt/rocm-7.0/include \
  PKG_CONFIG_PATH=/opt/rocm-7.0/lib/pkgconfig

RUN printf '%s\n' \
  'export ROCM_PATH=/opt/rocm-7.0' \
  'export HIP_PLATFORM=amd' \
  'export HIP_PATH=/opt/rocm-7.0' \
  'export HIP_CLANG_PATH=/opt/rocm-7.0/llvm/bin' \
  'export HIP_INCLUDE_PATH=/opt/rocm-7.0/include' \
  'export HIP_LIB_PATH=/opt/rocm-7.0/lib' \
  'export HIP_DEVICE_LIB_PATH=/opt/rocm-7.0/lib/llvm/amdgcn/bitcode' \
  'export PATH="$ROCM_PATH/bin:$HIP_CLANG_PATH:$PATH"' \
  'export LD_LIBRARY_PATH="$HIP_LIB_PATH:$ROCM_PATH/lib:$ROCM_PATH/lib64:$ROCM_PATH/llvm/lib"' \
  'export LIBRARY_PATH="$HIP_LIB_PATH:$ROCM_PATH/lib:$ROCM_PATH/lib64"' \
  'export CPATH="$HIP_INCLUDE_PATH"' \
  'export PKG_CONFIG_PATH="$ROCM_PATH/lib/pkgconfig"' \
  > /etc/profile.d/rocm.sh \
  && chmod +x /etc/profile.d/rocm.sh \
  && echo 'source /etc/profile.d/rocm.sh' >> /etc/bashrc

# make /usr/local libs visible without touching env
RUN echo "/usr/local/lib"  > /etc/ld.so.conf.d/local.conf \
  && echo "/usr/local/lib64" >> /etc/ld.so.conf.d/local.conf \
  && ldconfig

CMD ["/bin/bash"]
  qwen-3-coder-next-rocm:
    image: llamacpp-rocm
    container_name: llamacpp
    restart: unless-stopped
    devices:
      - /dev/dri:/dev/dri
      - /dev/kfd:/dev/kfd
    group_add:
      - "video"
      - "render"
    volumes:
      - /home/mark/running-llms/:/root/running-llms
    ports:
      - "8080:8080"
    security_opt:
      - seccomp=unconfined
    command: >
      bash -c "llama-server --alias Qwen3-Coder-Next -m /root/running-llms/hf-models/unsloth/Qwen3-Coder-Next-GGUF/UD-Q8_K_XL/Qwen3-Coder-Next-UD-Q8_K_XL-00001-of-00003.gguf --ctx-size 262144 -fa 1 --no-mmap --host 0.0.0.0 --port 8080 --temp 1.0 --top-k 40 --min-p 0.01 --top-p 0.95 --jinja -ngl 99 --threads -1"

Hope this helps
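
For anyone wanting to reproduce this, a minimal sketch of how the two files above fit together, assuming the Dockerfile is saved as `Dockerfile` next to a `docker-compose.yml` containing the service fragment:

```
# Build the image the compose service references.
docker build -t llamacpp-rocm .

# Start llama-server in the background and follow its logs.
docker compose up -d qwen-3-coder-next-rocm
docker compose logs -f qwen-3-coder-next-rocm

# Sanity check: llama-server exposes an OpenAI-compatible API on :8080.
curl http://localhost:8080/v1/models
```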

@lvvorovi commented on GitHub (Apr 2, 2026):

Same issue with the timeout using Ollama.
Did anyone find a way to fix it without switching from Ollama?


```
time=2026-04-02T18:51:07.813+03:00 level=INFO source=server.go:1390 msg="llama runner started in 14.92 seconds"
time=2026-04-02T18:51:07.813+03:00 level=DEBUG source=sched.go:573 msg="finished setting up" runner.name=registry.ollama.ai/library/qwen3.5:0.8b runner.size="3.8 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=396906 runner.model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 runner.num_ctx=128000

time=2026-04-02T18:51:08.023+03:00 level=DEBUG source=server.go:1538 msg="completion request" images=0 prompt=850 format=""
time=2026-04-02T18:51:08.068+03:00 level=DEBUG source=server.go:1538 msg="completion request" images=0 prompt=96188 format=""
time=2026-04-02T18:51:08.101+03:00 level=DEBUG source=cache.go:151 msg="loading cache slot" id=0 cache=0 prompt=198 used=0 remaining=198

time=2026-04-02T18:55:48.050+03:00 level=INFO source=server.go:1570 msg="aborting completion request due to client closing the connection"
time=2026-04-02T18:55:48.050+03:00 level=DEBUG source=sched.go:404 msg="context for request finished" runner.name=registry.ollama.ai/library/qwen3.5:0.8b runner.size="3.8 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=396906 runner.model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 runner.num_ctx=128000
time=2026-04-02T18:55:48.050+03:00 level=DEBUG source=sched.go:327 msg="after processing request finished event" runner.name=registry.ollama.ai/library/qwen3.5:0.8b runner.size="3.8 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=396906 runner.model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 runner.num_ctx=128000 refCount=1
[GIN] 2026/04/02 - 18:55:48 | 500 | 4m55s | 127.0.0.1 | POST "/v1/messages?beta=true"
time=2026-04-02T18:55:49.169+03:00 level=DEBUG source=sched.go:672 msg="evaluating already loaded" model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5

time=2026-04-02T18:55:49.252+03:00 level=DEBUG source=server.go:1538 msg="completion request" images=0 prompt=96188 format=""

time=2026-04-02T18:57:29.347+03:00 level=INFO source=server.go:1570 msg="aborting completion request due to client closing the connection"
time=2026-04-02T18:57:29.347+03:00 level=DEBUG source=sched.go:404 msg="context for request finished" runner.name=registry.ollama.ai/library/qwen3.5:0.8b runner.size="3.8 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=396906 runner.model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 runner.num_ctx=128000
time=2026-04-02T18:57:29.347+03:00 level=DEBUG source=sched.go:327 msg="after processing request finished event" runner.name=registry.ollama.ai/library/qwen3.5:0.8b runner.size="3.8 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=396906 runner.model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 runner.num_ctx=128000 refCount=1
[GIN] 2026/04/02 - 18:57:29 | 500 | 1m40s | 127.0.0.1 | POST "/v1/messages?beta=true"
```


CLAUDE

```
export PATH="$HOME/.local/bin:$PATH"
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_API_KEY=""
export ANTHROPIC_BASE_URL=http://localhost:11434
export API_TIMEOUT_MS=600000000
export CLAUDE_CODE_GLOB_TIMEOUT_SECONDS=60000000
export CLAUDE_ENABLE_STREAM_WATCHDOG=0
export DISABLE_TELEMETRY=1
export DISABLE_ERROR_REPORTING=1
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
```

OLLAMA

```
export OLLAMA_LOAD_TIMEOUT=60000000
export OLLAMA_KEEP_ALIVE=60000000
export OLLAMA_CONTEXT_LENGTH=128000
export OLLAMA_DEBUG=1
```

```
ollama serve
ollama launch claude --model qwen3.5:0.8b
```


Claude Code v2.1.90
ollama version is 0.19.0
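
One way to see whether this is raw prompt-processing time (rather than anything Claude Code adds) is to time a non-streaming request against Ollama's API directly, where curl imposes no timeout of its own. A sketch using the same model tag as above; a ~96k-token prompt like the one in the logs would be the realistic test:

```
# Time one completion straight against Ollama; curl waits indefinitely.
time curl -s http://localhost:11434/api/generate \
  -d '{"model": "qwen3.5:0.8b", "prompt": "hello", "stream": false}'
```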

@omer1abay commented on GitHub (Apr 2, 2026):

> Same issue with the timeout using Ollama. Did anyone find a way to fix it without switching from Ollama?
>
> ```
> time=2026-04-02T18:51:07.813+03:00 level=INFO source=server.go:1390 msg="llama runner started in 14.92 seconds"
> time=2026-04-02T18:51:07.813+03:00 level=DEBUG source=sched.go:573 msg="finished setting up" runner.name=registry.ollama.ai/library/qwen3.5:0.8b runner.size="3.8 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=396906 runner.model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 runner.num_ctx=128000
>
> time=2026-04-02T18:51:08.023+03:00 level=DEBUG source=server.go:1538 msg="completion request" images=0 prompt=850 format=""
> time=2026-04-02T18:51:08.068+03:00 level=DEBUG source=server.go:1538 msg="completion request" images=0 prompt=96188 format=""
> time=2026-04-02T18:51:08.101+03:00 level=DEBUG source=cache.go:151 msg="loading cache slot" id=0 cache=0 prompt=198 used=0 remaining=198
>
> time=2026-04-02T18:55:48.050+03:00 level=INFO source=server.go:1570 msg="aborting completion request due to client closing the connection"
> time=2026-04-02T18:55:48.050+03:00 level=DEBUG source=sched.go:404 msg="context for request finished" runner.name=registry.ollama.ai/library/qwen3.5:0.8b runner.size="3.8 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=396906 runner.model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 runner.num_ctx=128000
> time=2026-04-02T18:55:48.050+03:00 level=DEBUG source=sched.go:327 msg="after processing request finished event" runner.name=registry.ollama.ai/library/qwen3.5:0.8b runner.size="3.8 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=396906 runner.model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 runner.num_ctx=128000 refCount=1
> [GIN] 2026/04/02 - 18:55:48 | 500 | 4m55s | 127.0.0.1 | POST "/v1/messages?beta=true"
> time=2026-04-02T18:55:49.169+03:00 level=DEBUG source=sched.go:672 msg="evaluating already loaded" model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5
>
> time=2026-04-02T18:55:49.252+03:00 level=DEBUG source=server.go:1538 msg="completion request" images=0 prompt=96188 format=""
>
> time=2026-04-02T18:57:29.347+03:00 level=INFO source=server.go:1570 msg="aborting completion request due to client closing the connection"
> time=2026-04-02T18:57:29.347+03:00 level=DEBUG source=sched.go:404 msg="context for request finished" runner.name=registry.ollama.ai/library/qwen3.5:0.8b runner.size="3.8 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=396906 runner.model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 runner.num_ctx=128000
> time=2026-04-02T18:57:29.347+03:00 level=DEBUG source=sched.go:327 msg="after processing request finished event" runner.name=registry.ollama.ai/library/qwen3.5:0.8b runner.size="3.8 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=396906 runner.model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 runner.num_ctx=128000 refCount=1
> [GIN] 2026/04/02 - 18:57:29 | 500 | 1m40s | 127.0.0.1 | POST "/v1/messages?beta=true"
> ```
>
> CLAUDE
>
> ```
> export PATH="$HOME/.local/bin:$PATH"
> export ANTHROPIC_AUTH_TOKEN=ollama
> export ANTHROPIC_API_KEY=""
> export ANTHROPIC_BASE_URL=http://localhost:11434
> export API_TIMEOUT_MS=600000000
> export CLAUDE_CODE_GLOB_TIMEOUT_SECONDS=60000000
> export CLAUDE_ENABLE_STREAM_WATCHDOG=0
> export DISABLE_TELEMETRY=1
> export DISABLE_ERROR_REPORTING=1
> export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
> ```
>
> OLLAMA
>
> ```
> export OLLAMA_LOAD_TIMEOUT=60000000
> export OLLAMA_KEEP_ALIVE=60000000
> export OLLAMA_CONTEXT_LENGTH=128000
> export OLLAMA_DEBUG=1
> ```
>
> ```
> ollama serve
> ollama launch claude --model qwen3.5:0.8b
> ```
>
> Claude Code v2.1.90, ollama version is 0.19.0

It worked on my personal computer, which has a GPU; when I opened this issue I was on my office laptop (it's also strong, but has no GPU), so I got the timeout. On my personal computer it's still slow, but at the end of the day there is no timeout error. Do you have a GPU on your PC?

@lvvorovi commented on GitHub (Apr 3, 2026):

> > Same issue with the timeout using Ollama. Did anyone find a way to fix it without switching from Ollama?
> >
> > ```
> > time=2026-04-02T18:51:07.813+03:00 level=INFO source=server.go:1390 msg="llama runner started in 14.92 seconds"
> > time=2026-04-02T18:51:07.813+03:00 level=DEBUG source=sched.go:573 msg="finished setting up" runner.name=registry.ollama.ai/library/qwen3.5:0.8b runner.size="3.8 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=396906 runner.model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 runner.num_ctx=128000
> >
> > time=2026-04-02T18:51:08.023+03:00 level=DEBUG source=server.go:1538 msg="completion request" images=0 prompt=850 format=""
> > time=2026-04-02T18:51:08.068+03:00 level=DEBUG source=server.go:1538 msg="completion request" images=0 prompt=96188 format=""
> > time=2026-04-02T18:51:08.101+03:00 level=DEBUG source=cache.go:151 msg="loading cache slot" id=0 cache=0 prompt=198 used=0 remaining=198
> >
> > time=2026-04-02T18:55:48.050+03:00 level=INFO source=server.go:1570 msg="aborting completion request due to client closing the connection"
> > time=2026-04-02T18:55:48.050+03:00 level=DEBUG source=sched.go:404 msg="context for request finished" runner.name=registry.ollama.ai/library/qwen3.5:0.8b runner.size="3.8 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=396906 runner.model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 runner.num_ctx=128000
> > time=2026-04-02T18:55:48.050+03:00 level=DEBUG source=sched.go:327 msg="after processing request finished event" runner.name=registry.ollama.ai/library/qwen3.5:0.8b runner.size="3.8 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=396906 runner.model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 runner.num_ctx=128000 refCount=1
> > [GIN] 2026/04/02 - 18:55:48 | 500 | 4m55s | 127.0.0.1 | POST "/v1/messages?beta=true"
> > time=2026-04-02T18:55:49.169+03:00 level=DEBUG source=sched.go:672 msg="evaluating already loaded" model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5
> >
> > time=2026-04-02T18:55:49.252+03:00 level=DEBUG source=server.go:1538 msg="completion request" images=0 prompt=96188 format=""
> >
> > time=2026-04-02T18:57:29.347+03:00 level=INFO source=server.go:1570 msg="aborting completion request due to client closing the connection"
> > time=2026-04-02T18:57:29.347+03:00 level=DEBUG source=sched.go:404 msg="context for request finished" runner.name=registry.ollama.ai/library/qwen3.5:0.8b runner.size="3.8 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=396906 runner.model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 runner.num_ctx=128000
> > time=2026-04-02T18:57:29.347+03:00 level=DEBUG source=sched.go:327 msg="after processing request finished event" runner.name=registry.ollama.ai/library/qwen3.5:0.8b runner.size="3.8 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=396906 runner.model=/home/ardga/.ollama/models/blobs/sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 runner.num_ctx=128000 refCount=1
> > [GIN] 2026/04/02 - 18:57:29 | 500 | 1m40s | 127.0.0.1 | POST "/v1/messages?beta=true"
> > ```
> >
> > CLAUDE
> >
> > ```
> > export PATH="$HOME/.local/bin:$PATH"
> > export ANTHROPIC_AUTH_TOKEN=ollama
> > export ANTHROPIC_API_KEY=""
> > export ANTHROPIC_BASE_URL=http://localhost:11434
> > export API_TIMEOUT_MS=600000000
> > export CLAUDE_CODE_GLOB_TIMEOUT_SECONDS=60000000
> > export CLAUDE_ENABLE_STREAM_WATCHDOG=0
> > export DISABLE_TELEMETRY=1
> > export DISABLE_ERROR_REPORTING=1
> > export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
> > ```
> >
> > OLLAMA
> >
> > ```
> > export OLLAMA_LOAD_TIMEOUT=60000000
> > export OLLAMA_KEEP_ALIVE=60000000
> > export OLLAMA_CONTEXT_LENGTH=128000
> > export OLLAMA_DEBUG=1
> > ```
> >
> > ```
> > ollama serve
> > ollama launch claude --model qwen3.5:0.8b
> > ```
> >
> > Claude Code v2.1.90, ollama version is 0.19.0
>
> It worked on my personal computer, which has a GPU; when I opened this issue I was on my office laptop (it's also strong, but has no GPU), so I got the timeout. On my personal computer it's still slow, but at the end of the day there is no timeout error. Do you have a GPU on your PC?

It is obviously due to the time it takes. My setup has no GPU, so it takes longer. I am in general OK with the time it takes; I just need to find a way to configure Claude Code/Ollama to be OK with that too.

Reference: github-starred/ollama#9121