[GH-ISSUE #13935] Error 500 when running glm-4.7-flash with 198k with claude code docker #9118

Open
opened 2026-04-12 21:58:16 -05:00 by GiteaMirror · 6 comments

Originally created by @kentsuiGitHub on GitHub (Jan 27, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/13935

What is the issue?

I run Ollama and Claude Code in separate Docker containers. I created a custom model based on glm-4.7-flash with a 198k context length. After several successful (200) API calls, Ollama starts returning error 500, and each failing request takes around 5 minutes. Below is part of the log.
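
For context, a custom context length in Ollama normally comes from a Modelfile. The reporter's exact file isn't shown, so the sketch below is an assumption, reconstructed from the logs: the tag q8-198k and runner.num_ctx=202752 (exactly 198 × 1024):

# Hypothetical Modelfile (an assumption; the actual file is not shown).
# num_ctx 202752 matches runner.num_ctx=202752 in the logs (198 * 1024).
FROM glm-4.7-flash
PARAMETER num_ctx 202752

Built with something like: ollama create glm-4.7-flash:q8-198k -f Modelfile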

[GIN] 2026/01/27 - 14:08:52 | 200 | 6m34s | 192.168.12.60 | POST "/v1/messages?beta=true"

time=2026-01-27T14:08:52.336Z level=DEBUG source=sched.go:317 msg="after processing request finished event" runner.name=registry.ollama.ai/library/glm-4.7-flash:q8-198k runner.inference="[{ID:GPU-39bc44a2-5293-57f9-a322-74b8fccb5150 Library:CUDA}]" runner.size="41.1 GiB" runner.vram="41.1 GiB" runner.parallel=1 runner.pid=87 runner.model=/ollama/blobs/sha256-1bfdff04a01e06051d7dcf5bcd6d7486240e1a92d2ce3325f727a20f2965e68c runner.num_ctx=202752 refCount=0

time=2026-01-27T14:08:52.336Z level=DEBUG source=sched.go:299 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/glm-4.7-flash:q8-198k runner.inference="[{ID:GPU-39bc44a2-5293-57f9-a322-74b8fccb5150 Library:CUDA}]" runner.size="41.1 GiB" runner.vram="41.1 GiB" runner.parallel=1 runner.pid=87 runner.model=/ollama/blobs/sha256-1bfdff04a01e06051d7dcf5bcd6d7486240e1a92d2ce3325f727a20f2965e68c runner.num_ctx=202752 duration=1h0m0s

time=2026-01-27T14:09:01.047Z level=DEBUG source=server.go:1533 msg="completion request" images=0 prompt=177952 format=""

time=2026-01-27T14:08:52.823Z level=DEBUG source=sched.go:678 msg="evaluating already loaded" model=/ollama/blobs/sha256-1bfdff04a01e06051d7dcf5bcd6d7486240e1a92d2ce3325f727a20f2965e68c

time=2026-01-27T14:13:52.450Z level=DEBUG source=sched.go:394 msg="context for request finished" runner.name=registry.ollama.ai/library/glm-4.7-flash:q8-198k runner.inference="[{ID:GPU-39bc44a2-5293-57f9-a322-74b8fccb5150 Library:CUDA}]" runner.size="41.1 GiB" runner.vram="41.1 GiB" runner.parallel=1 runner.pid=87 runner.model=/ollama/blobs/sha256-1bfdff04a01e06051d7dcf5bcd6d7486240e1a92d2ce3325f727a20f2965e68c runner.num_ctx=202752

time=2026-01-27T14:09:01.388Z level=DEBUG source=cache.go:142 msg="loading cache slot" id=0 cache=46241 prompt=44640 used=16690 remaining=27950

time=2026-01-27T14:13:52.450Z level=DEBUG source=sched.go:299 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/glm-4.7-flash:q8-198k runner.inference="[{ID:GPU-39bc44a2-5293-57f9-a322-74b8fccb5150 Library:CUDA}]" runner.size="41.1 GiB" runner.vram="41.1 GiB" runner.parallel=1 runner.pid=87 runner.model=/ollama/blobs/sha256-1bfdff04a01e06051d7dcf5bcd6d7486240e1a92d2ce3325f727a20f2965e68c runner.num_ctx=202752 duration=1h0m0s

[GIN] 2026/01/27 - 14:13:52 | 500 | 4m59s | 192.168.12.60 | POST "/v1/messages?beta=true"

time=2026-01-27T14:13:52.450Z level=DEBUG source=sched.go:317 msg="after processing request finished event" runner.name=registry.ollama.ai/library/glm-4.7-flash:q8-198k runner.inference="[{ID:GPU-39bc44a2-5293-57f9-a322-74b8fccb5150 Library:CUDA}]" runner.size="41.1 GiB" runner.vram="41.1 GiB" runner.parallel=1 runner.pid=87 runner.model=/ollama/blobs/sha256-1bfdff04a01e06051d7dcf5bcd6d7486240e1a92d2ce3325f727a20f2965e68c runner.num_ctx=202752 refCount=0

time=2026-01-27T14:13:53.849Z level=DEBUG source=sched.go:678 msg="evaluating already loaded" model=/ollama/blobs/sha256-1bfdff04a01e06051d7dcf5bcd6d7486240e1a92d2ce3325f727a20f2965e68c

time=2026-01-27T14:14:01.965Z level=DEBUG source=server.go:1533 msg="completion request" images=0 prompt=177952 format=""

time=2026-01-27T14:14:02.275Z level=DEBUG source=cache.go:142 msg="loading cache slot" id=0 cache=45327 prompt=44640 used=16690 remaining=27950

time=2026-01-27T14:18:53.446Z level=DEBUG source=sched.go:394 msg="context for request finished" runner.name=registry.ollama.ai/library/glm-4.7-flash:q8-198k runner.inference="[{ID:GPU-39bc44a2-5293-57f9-a322-74b8fccb5150 Library:CUDA}]" runner.size="41.1 GiB" runner.vram="41.1 GiB" runner.parallel=1 runner.pid=87 runner.model=/ollama/blobs/sha256-1bfdff04a01e06051d7dcf5bcd6d7486240e1a92d2ce3325f727a20f2965e68c runner.num_ctx=202752

[GIN] 2026/01/27 - 14:18:53 | 500 | 4m59s | 192.168.12.60 | POST "/v1/messages?beta=true"

time=2026-01-27T14:18:53.446Z level=DEBUG source=sched.go:299 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/glm-4.7-flash:q8-198k runner.inference="[{ID:GPU-39bc44a2-5293-57f9-a322-74b8fccb5150 Library:CUDA}]" runner.size="41.1 GiB" runner.vram="41.1 GiB" runner.parallel=1 runner.pid=87 runner.model=/ollama/blobs/sha256-1bfdff04a01e06051d7dcf5bcd6d7486240e1a92d2ce3325f727a20f2965e68c runner.num_ctx=202752 duration=1h0m0s

time=2026-01-27T14:18:53.446Z level=DEBUG source=sched.go:317 msg="after processing request finished event" runner.name=registry.ollama.ai/library/glm-4.7-flash:q8-198k runner.inference="[{ID:GPU-39bc44a2-5293-57f9-a322-74b8fccb5150 Library:CUDA}]" runner.size="41.1 GiB" runner.vram="41.1 GiB" runner.parallel=1 runner.pid=87 runner.model=/ollama/blobs/sha256-1bfdff04a01e06051d7dcf5bcd6d7486240e1a92d2ce3325f727a20f2965e68c runner.num_ctx=202752 refCount=0

time=2026-01-27T14:18:54.968Z level=DEBUG source=sched.go:678 msg="evaluating already loaded" model=/ollama/blobs/sha256-1bfdff04a01e06051d7dcf5bcd6d7486240e1a92d2ce3325f727a20f2965e68c

time=2026-01-27T14:19:02.898Z level=DEBUG source=server.go:1533 msg="completion request" images=0 prompt=177952 format=""

time=2026-01-27T14:19:03.219Z level=DEBUG source=cache.go:142 msg="loading cache slot" id=0 cache=45329 prompt=44640 used=16690 remaining=27950

time=2026-01-27T14:23:54.602Z level=DEBUG source=sched.go:394 msg="context for request finished" runner.name=registry.ollama.ai/library/glm-4.7-flash:q8-198k runner.inference="[{ID:GPU-39bc44a2-5293-57f9-a322-74b8fccb5150 Library:CUDA}]" runner.size="41.1 GiB" runner.vram="41.1 GiB" runner.parallel=1 runner.pid=87 runner.model=/ollama/blobs/sha256-1bfdff04a01e06051d7dcf5bcd6d7486240e1a92d2ce3325f727a20f2965e68c runner.num_ctx=202752

[GIN] 2026/01/27 - 14:23:54 | 500 | 4m59s | 192.168.12.60 | POST "/v1/messages?beta=true"

time=2026-01-27T14:23:54.603Z level=DEBUG source=sched.go:299 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/glm-4.7-flash:q8-198k runner.inference="[{ID:GPU-39bc44a2-5293-57f9-a322-74b8fccb5150 Library:CUDA}]" runner.size="41.1 GiB" runner.vram="41.1 GiB" runner.parallel=1 runner.pid=87 runner.model=/ollama/blobs/sha256-1bfdff04a01e06051d7dcf5bcd6d7486240e1a92d2ce3325f727a20f2965e68c runner.num_ctx=202752 duration=1h0m0s

time=2026-01-27T14:23:54.603Z level=DEBUG source=sched.go:317 msg="after processing request finished event" runner.name=registry.ollama.ai/library/glm-4.7-flash:q8-198k runner.inference="[{ID:GPU-39bc44a2-5293-57f9-a322-74b8fccb5150 Library:CUDA}]" runner.size="41.1 GiB" runner.vram="41.1 GiB" runner.parallel=1 runner.pid=87 runner.model=/ollama/blobs/sha256-1bfdff04a01e06051d7dcf5bcd6d7486240e1a92d2ce3325f727a20f2965e68c runner.num_ctx=202752 refCount=0

time=2026-01-27T14:23:57.420Z level=DEBUG source=sched.go:678 msg="evaluating already loaded" model=/ollama/blobs/sha256-1bfdff04a01e06051d7dcf5bcd6d7486240e1a92d2ce3325f727a20f2965e68c

time=2026-01-27T14:24:05.784Z level=DEBUG source=server.go:1533 msg="completion request" images=0 prompt=177952 format=""

time=2026-01-27T14:24:06.081Z level=DEBUG source=cache.go:142 msg="loading cache slot" id=0 cache=45331 prompt=44640 used=18684 remaining=25956

[GIN] 2026/01/27 - 14:28:57 | 500 | 4m59s | 192.168.12.60 | POST "/v1/messages?beta=true"

time=2026-01-27T14:28:57.079Z level=DEBUG source=sched.go:394 msg="context for request finished" runner.name=registry.ollama.ai/library/glm-4.7-flash:q8-198k runner.inference="[{ID:GPU-39bc44a2-5293-57f9-a322-74b8fccb5150 Library:CUDA}]" runner.size="41.1 GiB" runner.vram="41.1 GiB" runner.parallel=1 runner.pid=87 runner.model=/ollama/blobs/sha256-1bfdff04a01e06051d7dcf5bcd6d7486240e1a92d2ce3325f727a20f2965e68c runner.num_ctx=202752

time=2026-01-27T14:28:57.079Z level=DEBUG source=sched.go:299 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/glm-4.7-flash:q8-198k runner.inference="[{ID:GPU-39bc44a2-5293-57f9-a322-74b8fccb5150 Library:CUDA}]" runner.size="41.1 GiB" runner.vram="41.1 GiB" runner.parallel=1 runner.pid=87 runner.model=/ollama/blobs/sha256-1bfdff04a01e06051d7dcf5bcd6d7486240e1a92d2ce3325f727a20f2965e68c runner.num_ctx=202752 duration=1h0m0s

time=2026-01-27T14:28:57.079Z level=DEBUG source=sched.go:317 msg="after processing request finished event" runner.name=registry.ollama.ai/library/glm-4.7-flash:q8-198k runner.inference="[{ID:GPU-39bc44a2-5293-57f9-a322-74b8fccb5150 Library:CUDA}]" runner.size="41.1 GiB" runner.vram="41.1 GiB" runner.parallel=1 runner.pid=87 runner.model=/ollama/blobs/sha256-1bfdff04a01e06051d7dcf5bcd6d7486240e1a92d2ce3325f727a20f2965e68c runner.num_ctx=202752 refCount=0

time=2026-01-27T14:29:01.937Z level=DEBUG source=sched.go:678 msg="evaluating already loaded" model=/ollama/blobs/sha256-1bfdff04a01e06051d7dcf5bcd6d7486240e1a92d2ce3325f727a20f2965e68c

time=2026-01-27T14:29:10.062Z level=DEBUG source=server.go:1533 msg="completion request" images=0 prompt=177952 format=""

time=2026-01-27T14:29:10.401Z level=DEBUG source=cache.go:142 msg="loading cache slot" id=0 cache=45386 prompt=44640 used=16690 remaining=27950

time=2026-01-27T14:34:01.582Z level=DEBUG source=sched.go:394 msg="context for request finished" runner.name=registry.ollama.ai/library/glm-4.7-flash:q8-198k runner.inference="[{ID:GPU-39bc44a2-5293-57f9-a322-74b8fccb5150 Library:CUDA}]" runner.size="41.1 GiB" runner.vram="41.1 GiB" runner.parallel=1 runner.pid=87 runner.model=/ollama/blobs/sha256-1bfdff04a01e06051d7dcf5bcd6d7486240e1a92d2ce3325f727a20f2965e68c runner.num_ctx=202752
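
Note the timing pattern above: the one successful request completed in 6m34s, while every failed request is logged at exactly 4m59s with status 500. That consistency suggests a client that gives up after roughly five minutes, rather than a model-side failure. A quick way to pull the pattern out of a captured log (the log file path here is hypothetical):

# Print status code and duration for each API call; in this log every
# 500 shows 4m59s, consistent with a ~5-minute client-side timeout.
grep '\[GIN\]' ollama.log | awk -F'|' '{print $2, $3}'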

Relevant log output


OS

Docker

GPU

Other

CPU

Other

Ollama version

0.15.1 with JetPack 6

GiteaMirror added the bug label 2026-04-12 21:58:16 -05:00

@kentsuiGitHub commented on GitHub (Jan 27, 2026):

When loading the model, several "key with type not found" messages appear in the log.

time=2026-01-27T16:45:31.145Z level=DEBUG source=ggml.go:298 msg="key with type not found" key=general.alignment default=32

time=2026-01-27T16:45:37.280Z level=DEBUG source=ggml.go:298 msg="key with type not found" key=glm4moelite.pooling_type default=0

time=2026-01-27T16:45:37.280Z level=DEBUG source=ggml.go:298 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=false

time=2026-01-27T16:45:37.280Z level=DEBUG source=ggml.go:298 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0


@rick-github commented on GitHub (Jan 27, 2026):

It's likely the prompt from Claude Code is large and the model doesn't finish before CC triggers a timeout and disconnects.
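
One way to test this is to send the same kind of request with a deliberately short client timeout and check whether Ollama logs a 500 at exactly that duration. A minimal sketch, assuming Ollama's /v1/messages endpoint accepts an Anthropic-style body as the logged requests suggest (model tag from the logs; the prompt is a placeholder):

# The client disconnects after 299s; if Ollama then logs "| 500 | 4m59s |",
# the 500s are client timeouts, not model errors.
curl --max-time 299 http://localhost:11434/v1/messages \
  -H 'content-type: application/json' \
  -d '{"model": "glm-4.7-flash:q8-198k", "max_tokens": 1024,
       "messages": [{"role": "user", "content": "placeholder long prompt"}]}'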


@kentsuiGitHub commented on GitHub (Jan 28, 2026):

Added the settings below to Claude Code to extend the timeout (30 minutes default, 120 minutes max), but the error still exists.

{
  "env": {
    "BASH_DEFAULT_TIMEOUT_MS": "1800000",
    "BASH_MAX_TIMEOUT_MS": "7200000"
  }
}


@rick-github commented on GitHub (Jan 28, 2026):

API timeout, not bash timeout.
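
The BASH_*_TIMEOUT_MS variables only bound commands run by Claude Code's bash tool; they do not affect its model API requests. If Claude Code honors an API-level timeout variable, raising it would look like the sketch below. API_TIMEOUT_MS is an assumption here, not a confirmed setting:

# Assumption: API_TIMEOUT_MS bounds Claude Code's model API requests
# (1800000 ms = 30 minutes); export it before launching Claude Code.
export API_TIMEOUT_MS=1800000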


@kentsuiGitHub commented on GitHub (Jan 28, 2026):

Suspect error 500 is because the context length is too large (>198k), since running /clear in Claude Code lets the implementation resume.
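
One way to sanity-check this is to confirm what the custom model actually advertises; ollama show prints the model's parameters, including num_ctx (model tag taken from the logs):

# Should report num_ctx 202752 under Parameters if the custom
# Modelfile's context setting took effect.
ollama show glm-4.7-flash:q8-198k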


@rick-github commented on GitHub (Jan 28, 2026):

> Suspect error 500 is because the context length is too large (>198k), since running /clear in Claude Code lets the implementation resume.

https://github.com/ollama/ollama/issues/13935#issuecomment-3807846530

Reference: github-starred/ollama#9118