[GH-ISSUE #13461] Ollama crashes with 100% cpu on one core when near context limit or truncating #34642

Open
opened 2026-04-22 18:22:54 -05:00 by GiteaMirror · 7 comments

Originally created by @arlaneenalra on GitHub (Dec 13, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/13461

What is the issue?

Since about 0.13.3 (it might have been a bit earlier, I'm not sure), I've noticed that Ollama will run fine for one request and then seemingly drop into a CPU spin loop, burning 100% of one core and becoming at least partially unresponsive. The spinning thread does not release the memory it has allocated. This seems to happen any time the API triggers a truncation, though I'm not certain whether the truncation itself is the cause; what I do know is that seeing this log message:

Dec 13 19:16:40 framework ollama[11089]: time=2025-12-13T19:16:40.738Z level=WARN source=runner.go:186 msg="truncating input prompt" limit=4096 prompt=11441 keep=4 new=4096

in the logs almost always means that I now have a thread of Ollama burning CPU. So far, I've only seen this on my Strix Halo Linux machines running Vulkan. The model doesn't seem to matter much: I've seen this behavior with gpt-oss:120b, ministral-3:14b, qwen3-next, and a few others.
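
For reference, a minimal API-level sketch that should hit the same truncation path: build a prompt well past the 4096-token default and post it to /api/generate. The model name and filler text are placeholders, not taken from the report:

```shell
# Roughly 15k words of filler; with OLLAMA_CONTEXT_LENGTH=4096 the runner
# should log the "truncating input prompt" warning shown above.
PROMPT=$(yes "lorem ipsum dolor sit amet" | head -n 3000 | tr '\n' ' ')
curl -s http://localhost:11434/api/generate \
  -d "{\"model\": \"gpt-oss:120b\", \"prompt\": \"$PROMPT\", \"stream\": false}"
```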

To reproduce, I usually do something like:

ollama run gpt-oss:120b

Output a list of the extended ascii character table as used on IBM compatible computers. In this list include the Decimal, hexadecimal, octal, and binary representations of the character codes, the purpose of non-printing characters as well as their alternative graphic representation. This table should include the original 128 characters as well as the extra 128 characters that were available on IBM compatible computers.

With that prompt it will sometimes hang outright mid-generation and seemingly drop into the same state, but without the truncation log message, so something appears to go wrong at or near the context limit. If I set a larger context window, the crash does not seem to happen, so I'm almost certain it has something to do with how the context storage is manipulated, but I haven't dug into that code.
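
For comparison, the larger-context workaround mentioned above can be applied either service-wide or per request; 16384 here is only an illustrative value:

```shell
# Service-wide: raise the default context length (the server-config log
# below shows OLLAMA_CONTEXT_LENGTH:4096 for the failing setup).
OLLAMA_CONTEXT_LENGTH=16384 ollama serve

# Per request: override num_ctx in the request options instead.
curl -s http://localhost:11434/api/generate \
  -d '{"model": "gpt-oss:120b", "prompt": "hello", "options": {"num_ctx": 16384}}'
```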

Relevant log output

Hang mid generation:


Dec 13 19:33:03 framework systemd[1]: Started ollama.service - Ollama Service.
Dec 13 19:33:03 framework ollama[11994]: time=2025-12-13T19:33:03.036Z level=INFO source=routes.go:1554 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:true OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/jules/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:true OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:true ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
Dec 13 19:33:03 framework ollama[11994]: time=2025-12-13T19:33:03.049Z level=INFO source=images.go:522 msg="total blobs: 201"
Dec 13 19:33:03 framework ollama[11994]: time=2025-12-13T19:33:03.051Z level=INFO source=images.go:529 msg="total unused blobs removed: 0"
Dec 13 19:33:03 framework ollama[11994]: [GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.
Dec 13 19:33:03 framework ollama[11994]: [GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
Dec 13 19:33:03 framework ollama[11994]:  - using env:        export GIN_MODE=release
Dec 13 19:33:03 framework ollama[11994]:  - using code:        gin.SetMode(gin.ReleaseMode)
Dec 13 19:33:03 framework ollama[11994]: [GIN-debug] HEAD   /                         --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
Dec 13 19:33:03 framework ollama[11994]: [GIN-debug] GET    /                         --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
Dec 13 19:33:03 framework ollama[11994]: [GIN-debug] HEAD   /api/version              --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func3 (5 handlers)
Dec 13 19:33:03 framework ollama[11994]: [GIN-debug] GET    /api/version              --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func4 (5 handlers)
Dec 13 19:33:03 framework ollama[11994]: [GIN-debug] POST   /api/pull                 --> github.com/ollama/ollama/server.(*Server).PullHandler-fm (5 handlers)
Dec 13 19:33:03 framework ollama[11994]: [GIN-debug] POST   /api/push                 --> github.com/ollama/ollama/server.(*Server).PushHandler-fm (5 handlers)
Dec 13 19:33:03 framework ollama[11994]: [GIN-debug] HEAD   /api/tags                 --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers)
Dec 13 19:33:03 framework ollama[11994]: [GIN-debug] GET    /api/tags                 --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers)
Dec 13 19:33:03 framework ollama[11994]: [GIN-debug] POST   /api/show                 --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (5 handlers)
Dec 13 19:33:03 framework ollama[11994]: [GIN-debug] DELETE /api/delete               --> github.com/ollama/ollama/server.(*Server).DeleteHandler-fm (5 handlers)
Dec 13 19:33:03 framework ollama[11994]: [GIN-debug] POST   /api/me                   --> github.com/ollama/ollama/server.(*Server).WhoamiHandler-fm (5 handlers)
Dec 13 19:33:03 framework ollama[11994]: [GIN-debug] POST   /api/signout              --> github.com/ollama/ollama/server.(*Server).SignoutHandler-fm (5 handlers)
Dec 13 19:33:03 framework ollama[11994]: [GIN-debug] DELETE /api/user/keys/:encodedKey --> github.com/ollama/ollama/server.(*Server).SignoutHandler-fm (5 handlers)
Dec 13 19:33:03 framework ollama[11994]: [GIN-debug] POST   /api/create               --> github.com/ollama/ollama/server.(*Server).CreateHandler-fm (5 handlers)
Dec 13 19:33:03 framework ollama[11994]: [GIN-debug] POST   /api/blobs/:digest        --> github.com/ollama/ollama/server.(*Server).CreateBlobHandler-fm (5 handlers)
Dec 13 19:33:03 framework ollama[11994]: [GIN-debug] HEAD   /api/blobs/:digest        --> github.com/ollama/ollama/server.(*Server).HeadBlobHandler-fm (5 handlers)
Dec 13 19:33:03 framework ollama[11994]: [GIN-debug] POST   /api/copy                 --> github.com/ollama/ollama/server.(*Server).CopyHandler-fm (5 handlers)
Dec 13 19:33:03 framework ollama[11994]: [GIN-debug] GET    /api/ps                   --> github.com/ollama/ollama/server.(*Server).PsHandler-fm (5 handlers)
Dec 13 19:33:03 framework ollama[11994]: [GIN-debug] POST   /api/generate             --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (5 handlers)
Dec 13 19:33:03 framework ollama[11994]: [GIN-debug] POST   /api/chat                 --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (5 handlers)
Dec 13 19:33:03 framework ollama[11994]: [GIN-debug] POST   /api/embed                --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (5 handlers)
Dec 13 19:33:03 framework ollama[11994]: [GIN-debug] POST   /api/embeddings           --> github.com/ollama/ollama/server.(*Server).EmbeddingsHandler-fm (5 handlers)
Dec 13 19:33:03 framework ollama[11994]: [GIN-debug] POST   /v1/chat/completions      --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (6 handlers)
Dec 13 19:33:03 framework ollama[11994]: [GIN-debug] POST   /v1/completions           --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (6 handlers)
Dec 13 19:33:03 framework ollama[11994]: [GIN-debug] POST   /v1/embeddings            --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (6 handlers)
Dec 13 19:33:03 framework ollama[11994]: [GIN-debug] GET    /v1/models                --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (6 handlers)
Dec 13 19:33:03 framework ollama[11994]: [GIN-debug] GET    /v1/models/:model         --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (6 handlers)
Dec 13 19:33:03 framework ollama[11994]: [GIN-debug] POST   /v1/responses             --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (6 handlers)
Dec 13 19:33:03 framework ollama[11994]: time=2025-12-13T19:33:03.051Z level=INFO source=routes.go:1607 msg="Listening on [::]:11434 (version 0.0.0)"
Dec 13 19:33:03 framework ollama[11994]: time=2025-12-13T19:33:03.052Z level=INFO source=runner.go:67 msg="discovering available GPUs..."
Dec 13 19:33:03 framework ollama[11994]: time=2025-12-13T19:33:03.052Z level=INFO source=server.go:429 msg="starting runner" cmd="/opt/ollama-0.13.4-rc1/bin/ollama runner --ollama-engine --port 43023"
Dec 13 19:33:03 framework ollama[11994]: time=2025-12-13T19:33:03.100Z level=INFO source=types.go:42 msg="inference compute" id=00000000-c300-0000-0000-000000000000 filter_id="" library=Vulkan compute=0.0 name=Vulkan0 description="AMD Radeon 8060S (RADV GFX1151)" libdirs=ollama driver=0.0 pci_id=0000:c3:00.0 type=iGPU total="117.7 GiB" available="117.5 GiB"
Dec 13 19:33:10 framework ollama[11994]: [GIN] 2025/12/13 - 19:33:10 | 200 |      49.123µs |       127.0.0.1 | HEAD     "/"
Dec 13 19:33:10 framework ollama[11994]: [GIN] 2025/12/13 - 19:33:10 | 200 |   71.524431ms |       127.0.0.1 | POST     "/api/show"
Dec 13 19:33:10 framework ollama[11994]: [GIN] 2025/12/13 - 19:33:10 | 200 |    71.88595ms |       127.0.0.1 | POST     "/api/show"
Dec 13 19:33:11 framework ollama[11994]: time=2025-12-13T19:33:11.034Z level=INFO source=server.go:429 msg="starting runner" cmd="/opt/ollama-0.13.4-rc1/bin/ollama runner --ollama-engine --port 33903"
Dec 13 19:33:11 framework ollama[11994]: time=2025-12-13T19:33:11.158Z level=INFO source=server.go:245 msg="enabling flash attention"
Dec 13 19:33:11 framework ollama[11994]: time=2025-12-13T19:33:11.158Z level=INFO source=server.go:429 msg="starting runner" cmd="/opt/ollama-0.13.4-rc1/bin/ollama runner --ollama-engine --model /home/jules/.ollama/models/blobs/sha256-90a618fe6ff21b09ca968df959104eb650658b0bef0faef785c18c2795d993e3 --port 33963"
Dec 13 19:33:11 framework ollama[11994]: time=2025-12-13T19:33:11.159Z level=INFO source=sched.go:443 msg="system memory" total="125.1 GiB" free="116.7 GiB" free_swap="8.0 GiB"
Dec 13 19:33:11 framework ollama[11994]: time=2025-12-13T19:33:11.159Z level=INFO source=sched.go:450 msg="gpu memory" id=00000000-c300-0000-0000-000000000000 library=Vulkan available="117.1 GiB" free="117.5 GiB" minimum="457.0 MiB" overhead="0 B"
Dec 13 19:33:11 framework ollama[11994]: time=2025-12-13T19:33:11.159Z level=INFO source=server.go:746 msg="loading model" "model layers"=37 requested=-1
Dec 13 19:33:11 framework ollama[11994]: time=2025-12-13T19:33:11.167Z level=INFO source=runner.go:1405 msg="starting ollama engine"
Dec 13 19:33:11 framework ollama[11994]: time=2025-12-13T19:33:11.168Z level=INFO source=runner.go:1440 msg="Server listening on 127.0.0.1:33963"
Dec 13 19:33:11 framework ollama[11994]: time=2025-12-13T19:33:11.171Z level=INFO source=runner.go:1278 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:8192 KvCacheType: NumThreads:16 GPULayers:37[ID:00000000-c300-0000-0000-000000000000 Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
Dec 13 19:33:11 framework ollama[11994]: time=2025-12-13T19:33:11.205Z level=INFO source=ggml.go:136 msg="" architecture=gptoss file_type=MXFP4 name="" description="" num_tensors=471 num_key_values=30
Dec 13 19:33:11 framework ollama[11994]: ggml_vulkan: Found 1 Vulkan devices:
Dec 13 19:33:11 framework ollama[11994]: ggml_vulkan: 0 = AMD Radeon 8060S (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
Dec 13 19:33:11 framework ollama[11994]: load_backend: loaded Vulkan backend from /opt/ollama-0.13.4-rc1/lib/ollama/libggml-vulkan.so
Dec 13 19:33:11 framework ollama[11994]: load_backend: loaded CPU backend from /opt/ollama-0.13.4-rc1/lib/ollama/libggml-cpu-icelake.so
Dec 13 19:33:11 framework ollama[11994]: time=2025-12-13T19:33:11.233Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Dec 13 19:33:11 framework ollama[11994]: ggml_backend_vk_get_device_memory called: uuid 00000000-c300-0000-0000-000000000000
Dec 13 19:33:11 framework ollama[11994]: ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
Dec 13 19:33:11 framework ollama[11994]: time=2025-12-13T19:33:11.253Z level=INFO source=runner.go:1278 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:8192 KvCacheType: NumThreads:16 GPULayers:37[ID:00000000-c300-0000-0000-000000000000 Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
Dec 13 19:33:11 framework ollama[11994]: ggml_backend_vk_get_device_memory called: uuid 00000000-c300-0000-0000-000000000000
Dec 13 19:33:11 framework ollama[11994]: ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
Dec 13 19:33:14 framework ollama[11994]: time=2025-12-13T19:33:14.103Z level=INFO source=runner.go:1278 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:8192 KvCacheType: NumThreads:16 GPULayers:37[ID:00000000-c300-0000-0000-000000000000 Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
Dec 13 19:33:14 framework ollama[11994]: time=2025-12-13T19:33:14.103Z level=INFO source=ggml.go:482 msg="offloading 36 repeating layers to GPU"
Dec 13 19:33:14 framework ollama[11994]: time=2025-12-13T19:33:14.103Z level=INFO source=ggml.go:489 msg="offloading output layer to GPU"
Dec 13 19:33:14 framework ollama[11994]: time=2025-12-13T19:33:14.103Z level=INFO source=ggml.go:494 msg="offloaded 37/37 layers to GPU"
Dec 13 19:33:14 framework ollama[11994]: time=2025-12-13T19:33:14.103Z level=INFO source=device.go:240 msg="model weights" device=Vulkan0 size="59.8 GiB"
Dec 13 19:33:14 framework ollama[11994]: time=2025-12-13T19:33:14.103Z level=INFO source=device.go:245 msg="model weights" device=CPU size="1.1 GiB"
Dec 13 19:33:14 framework ollama[11994]: time=2025-12-13T19:33:14.103Z level=INFO source=device.go:251 msg="kv cache" device=Vulkan0 size="450.0 MiB"
Dec 13 19:33:14 framework ollama[11994]: time=2025-12-13T19:33:14.103Z level=INFO source=device.go:262 msg="compute graph" device=Vulkan0 size="125.1 MiB"
Dec 13 19:33:14 framework ollama[11994]: time=2025-12-13T19:33:14.103Z level=INFO source=device.go:267 msg="compute graph" device=CPU size="5.6 MiB"
Dec 13 19:33:14 framework ollama[11994]: time=2025-12-13T19:33:14.103Z level=INFO source=device.go:272 msg="total memory" size="61.4 GiB"
Dec 13 19:33:14 framework ollama[11994]: time=2025-12-13T19:33:14.103Z level=INFO source=sched.go:517 msg="loaded runners" count=1
Dec 13 19:33:14 framework ollama[11994]: time=2025-12-13T19:33:14.103Z level=INFO source=server.go:1338 msg="waiting for llama runner to start responding"
Dec 13 19:33:14 framework ollama[11994]: time=2025-12-13T19:33:14.103Z level=INFO source=server.go:1372 msg="waiting for server to become available" status="llm server loading model"
Dec 13 19:33:30 framework ollama[11994]: time=2025-12-13T19:33:30.900Z level=INFO source=server.go:1376 msg="llama runner started in 19.74 seconds"
Dec 13 19:33:30 framework ollama[11994]: [GIN] 2025/12/13 - 19:33:30 | 200 | 20.008054023s |       127.0.0.1 | POST     "/api/generate"
Dec 13 19:35:43 framework ollama[11994]: [GIN] 2025/12/13 - 19:35:43 | 200 |      72.356µs |       127.0.0.1 | GET      "/api/version"
Dec 13 19:39:54 framework ollama[11994]: [GIN] 2025/12/13 - 19:39:54 | 200 |          5m0s |       127.0.0.1 | POST     "/api/chat"

OS

Linux

GPU

AMD

CPU

AMD

Ollama version

Source built 0.13.4-rc1 (Vulkan SDK vulkansdk-linux-x86_64-1.4.335.0)

GiteaMirror added the vulkan and bug labels 2026-04-22 18:22:54 -05:00

@arlaneenalra commented on GitHub (Dec 13, 2025):

Follow-on. I ran:

ollama run gpt-120b

With these prompts:

Output a list of the first 128 ascii characters as used on IBM compatible computers. In this list include the Decimal, hexadecimal, octal, and binary representations of the character codes, the purpose of non-printing characters as well as their alternative graphic representation.
Ok now do the same thing for the 128 characters of the extended ascii table as used on the IBM PC.

And got a hard crash:

| 205 | 0xCD | 0355 | 11001101 | ═ | U+2550 | Box drawings double horizontal |
| 206 | 0xCE | 0356 | 11001110 | ╬ | U+256C | Box drawings double vertical & horizontal |
| 207 | 0xCF | 0357 | 11001111 | ¤ | U+00A4 | Currency sign |
| 208 | 0xD0 | 0360 | 11010000 | ð | U+00F0 | Latin small eth |
| 209 | 0xD1 | 0361 | 11010001 | Ð | U+00D0 | Latin capital eth |
|Error: an error was encountered while running the model: unexpected EOF

logs2.txt: https://github.com/user-attachments/files/24145368/logs2.txt


@arlaneenalra commented on GitHub (Dec 13, 2025):

Vulkan Info:

vulkaninfo.txt: https://github.com/user-attachments/files/24145385/vulkaninfo.txt

jules@framework:~/code/1.4.335.0$ uname -a
Linux framework 6.14.0-37-generic #37~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Nov 20 10:25:38 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

@fmu83 commented on GitHub (Jan 17, 2026):

Same issue here.
Model: gpt-oss:20b-ctx16384

CPU: Intel N100
GPU: Intel Arc Pro B50, 16 GB VRAM

Ollama version: 0.14.3-rc1
Vulkan Instance Version: 1.4.304
Mesa 25.3.3
firmware-intel-graphics 20251021-1~bpo13+1

The Ollama model hangs every few hours after a large request:
Jan 17 11:35:34 intel-ai ollama[610]: time=2026-01-17T11:35:34.179Z level=WARN source=runner.go:186 msg="truncating input prompt" limit=16384 prompt=28147 keep=4 new=16384

Ollama in general is responsive; I'm able to make an API call to list the models. But the model itself seems to have crashed: it is unresponsive if I try to do an "ollama run", and it spins endlessly.
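
This split is straightforward to check against the route list in the startup logs above. A quick triage sketch, assuming the default port and the model name from this report:

```shell
# The HTTP server still answers metadata routes; only completion hangs.
curl -s http://localhost:11434/api/tags    # model list: responds
curl -s http://localhost:11434/api/ps      # shows the loaded (wedged) runner
curl -s http://localhost:11434/api/generate \
  -d '{"model": "gpt-oss:20b-ctx16384", "prompt": "hi"}'   # never completes
```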

Stack trace after "kill -QUIT OLLAMAPID":

SIGQUIT: quit
PC=0x59093d02f9c1 m=0 sigcode=0
goroutine 0 gp=0x59093efbf180 m=0 mp=0x59093efc0f40 [idle]:
runtime.futex(0x59093efc1080, 0x80, 0x0, 0x0, 0x0, 0x0)
runtime/sys_linux_amd64.s:557 +0x21 fp=0x7ffeef8bc1a8 sp=0x7ffeef8bc1a0 pc=0x59093d02f9c1
runtime.futexsleep(0x7ffeef8bc220?, 0x3cfc8611?, 0x59093d02f5ad?)
runtime/os_linux.go:75 +0x30 fp=0x7ffeef8bc1f8 sp=0x7ffeef8bc1a8 pc=0x59093cfebd70
runtime.notesleep(0x59093efc1080)
runtime/lock_futex.go:47 +0x87 fp=0x7ffeef8bc230 sp=0x7ffeef8bc1f8 pc=0x59093cfc7d27
runtime.mPark(...)
runtime/proc.go:1887
runtime.stopm()
runtime/proc.go:2907 +0x8c fp=0x7ffeef8bc260 sp=0x7ffeef8bc230 pc=0x59093cff75cc
runtime.findRunnable()
runtime/proc.go:3644 +0xd9c fp=0x7ffeef8bc3d8 sp=0x7ffeef8bc260 pc=0x59093cff909c
runtime.schedule()
runtime/proc.go:4017 +0xb1 fp=0x7ffeef8bc410 sp=0x7ffeef8bc3d8 pc=0x59093cffa191
runtime.park_m(0xc000003340)
runtime/proc.go:4141 +0x285 fp=0x7ffeef8bc470 sp=0x7ffeef8bc410 pc=0x59093cffa645
runtime.mcall()
runtime/asm_amd64.s:459 +0x50 fp=0x7ffeef8bc488 sp=0x7ffeef8bc470 pc=0x59093d02bb70
goroutine 1 gp=0xc000002380 m=nil [IO wait, 28 minutes]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc00134b790 sp=0xc00134b770 pc=0x59093d025d2e
runtime.netpollblock(0xc00134b7e0?, 0x3cfbf466?, 0x9?)
runtime/netpoll.go:575 +0xf7 fp=0xc00134b7c8 sp=0xc00134b790 pc=0x59093cfeb057
internal/poll.runtime_pollWait(0x7e3973ec6eb0, 0x72)
runtime/netpoll.go:351 +0x85 fp=0xc00134b7e8 sp=0xc00134b7c8 pc=0x59093d024f45
internal/poll.(*pollDesc).wait(0xc00011f700?, 0x900fc965e?, 0x0)
internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00134b810 sp=0xc00134b7e8 pc=0x59093d0ad0c7
internal/poll.(*pollDesc).waitRead(...)
internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc00011f700)
internal/poll/fd_unix.go:620 +0x295 fp=0xc00134b8b8 sp=0xc00134b810 pc=0x59093d0b2495
net.(*netFD).accept(0xc00011f700)
net/fd_unix.go:172 +0x29 fp=0xc00134b970 sp=0xc00134b8b8 pc=0x59093d125549
net.(*TCPListener).accept(0xc000525ec0)
net/tcpsock_posix.go:159 +0x1b fp=0xc00134b9c0 sp=0xc00134b970 pc=0x59093d13b45b
net.(*TCPListener).Accept(0xc000525ec0)
net/tcpsock.go:380 +0x30 fp=0xc00134b9f0 sp=0xc00134b9c0 pc=0x59093d13a310
net/http.(*onceCloseListener).Accept(0xc0000e9dd0?)
<autogenerated>:1 +0x24 fp=0xc00134ba08 sp=0xc00134b9f0 pc=0x59093d3520c4
net/http.(*Server).Serve(0xc0001ef500, {0x59093e699a40, 0xc000525ec0})
net/http/server.go:3424 +0x30c fp=0xc00134bb38 sp=0xc00134ba08 pc=0x59093d32998c
github.com/ollama/ollama/runner/ollamarunner.Execute({0xc0000340a0, 0x4, 0x4})
github.com/ollama/ollama/runner/ollamarunner/runner.go:1441 +0x94e fp=0xc00134bd08 sp=0xc00134bb38 pc=0x59093d591f6e
github.com/ollama/ollama/runner.Execute({0xc000034080?, 0x0?, 0x0?})
github.com/ollama/ollama/runner/runner.go:28 +0x125 fp=0xc00134bd30 sp=0xc00134bd08 pc=0x59093d5bdba5
github.com/ollama/ollama/cmd.NewCLI.func3(0xc0001ef300?, {0x59093e1350e6?, 0x4?, 0x59093e1350ea?})
github.com/ollama/ollama/cmd/cmd.go:1961 +0x45 fp=0xc00134bd58 sp=0xc00134bd30 pc=0x59093dd81125
github.com/spf13/cobra.(*Command).execute(0xc000149808, {0xc000527bd0, 0x5, 0x5})
github.com/spf13/cobra@v1.7.0/command.go:940 +0x85c fp=0xc00134be78 sp=0xc00134bd58 pc=0x59093d19f4bc
github.com/spf13/cobra.(*Command).ExecuteC(0xc00052a908)
github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc00134bf30 sp=0xc00134be78 pc=0x59093d19fd05
github.com/spf13/cobra.(*Command).Execute(...)
github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
github.com/ollama/ollama/main.go:12 +0x4d fp=0xc00134bf50 sp=0xc00134bf30 pc=0x59093dd81c0d
runtime.main()
runtime/proc.go:283 +0x29d fp=0xc00134bfe0 sp=0xc00134bf50 pc=0x59093cff26dd
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00134bfe8 sp=0xc00134bfe0 pc=0x59093d02dbc1
goroutine 2 gp=0xc000002e00 m=nil [force gc (idle), 2 minutes]:
runtime.gopark(0x1bcd18a6059f?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000064fa8 sp=0xc000064f88 pc=0x59093d025d2e
runtime.goparkunlock(...)
runtime/proc.go:441
runtime.forcegchelper()
runtime/proc.go:348 +0xb8 fp=0xc000064fe0 sp=0xc000064fa8 pc=0x59093cff2a18
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000064fe8 sp=0xc000064fe0 pc=0x59093d02dbc1
created by runtime.init.7 in goroutine 1
runtime/proc.go:336 +0x1a
goroutine 3 gp=0xc000003340 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000065780 sp=0xc000065760 pc=0x59093d025d2e
runtime.goparkunlock(...)
runtime/proc.go:441
runtime.bgsweep(0xc00007e000)
runtime/mgcsweep.go:316 +0xdf fp=0xc0000657c8 sp=0xc000065780 pc=0x59093cfdd1bf
runtime.gcenable.gowrap1()
runtime/mgc.go:204 +0x25 fp=0xc0000657e0 sp=0xc0000657c8 pc=0x59093cfd15a5
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000657e8 sp=0xc0000657e0 pc=0x59093d02dbc1
created by runtime.gcenable in goroutine 1
runtime/mgc.go:204 +0x66
goroutine 4 gp=0xc000003500 m=nil [GC scavenge wait, 2 minutes]:
runtime.gopark(0x16208a7?, 0x152d9d3?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000065f78 sp=0xc000065f58 pc=0x59093d025d2e
runtime.goparkunlock(...)
runtime/proc.go:441
runtime.(*scavengerState).park(0x59093efbe120)
runtime/mgcscavenge.go:425 +0x49 fp=0xc000065fa8 sp=0xc000065f78 pc=0x59093cfdac09
runtime.bgscavenge(0xc00007e000)
runtime/mgcscavenge.go:658 +0x59 fp=0xc000065fc8 sp=0xc000065fa8 pc=0x59093cfdb199
runtime.gcenable.gowrap2()
runtime/mgc.go:205 +0x25 fp=0xc000065fe0 sp=0xc000065fc8 pc=0x59093cfd1545
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000065fe8 sp=0xc000065fe0 pc=0x59093d02dbc1
created by runtime.gcenable in goroutine 1
runtime/mgc.go:205 +0xa5
goroutine 5 gp=0xc000003dc0 m=nil [finalizer wait, 131 minutes]:
runtime.gopark(0x1b8?, 0xc000002380?, 0x1?, 0x23?, 0xc000064688?)
runtime/proc.go:435 +0xce fp=0xc000064630 sp=0xc000064610 pc=0x59093d025d2e
runtime.runfinq()
runtime/mfinal.go:196 +0x107 fp=0xc0000647e0 sp=0xc000064630 pc=0x59093cfd0567
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000647e8 sp=0xc0000647e0 pc=0x59093d02dbc1
created by runtime.createfing in goroutine 1
runtime/mfinal.go:166 +0x3d
goroutine 6 gp=0xc0001cc8c0 m=nil [chan receive, 2 minutes]:
runtime.gopark(0xc00021fae0?, 0xc0005083f0?, 0x60?, 0x67?, 0x59093d10c188?)
runtime/proc.go:435 +0xce fp=0xc000066718 sp=0xc0000666f8 pc=0x59093d025d2e
runtime.chanrecv(0xc00009c310, 0x0, 0x1)
runtime/chan.go:664 +0x445 fp=0xc000066790 sp=0xc000066718 pc=0x59093cfc2045
runtime.chanrecv1(0x0?, 0x0?)
runtime/chan.go:506 +0x12 fp=0xc0000667b8 sp=0xc000066790 pc=0x59093cfc1bd2
runtime.unique_runtime_registerUniqueMapCleanup.func2(...)
runtime/mgc.go:1796
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
runtime/mgc.go:1799 +0x2f fp=0xc0000667e0 sp=0xc0000667b8 pc=0x59093cfd474f
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000667e8 sp=0xc0000667e0 pc=0x59093d02dbc1
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
runtime/mgc.go:1794 +0x85
goroutine 7 gp=0xc0001cd180 m=nil [GC worker (idle)]:
runtime.gopark(0x1bcd1a505d78?, 0x1?, 0x67?, 0x85?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000066f38 sp=0xc000066f18 pc=0x59093d025d2e
runtime.gcBgMarkWorker(0xc00009d730)
runtime/mgc.go:1423 +0xe9 fp=0xc000066fc8 sp=0xc000066f38 pc=0x59093cfd3a69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc000066fe0 sp=0xc000066fc8 pc=0x59093cfd3945
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000066fe8 sp=0xc000066fe0 pc=0x59093d02dbc1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 8 gp=0xc0001cd340 m=nil [GC worker (idle), 66 minutes]:
runtime.gopark(0x1832ed8786eb?, 0x3?, 0xa4?, 0x5c?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000067738 sp=0xc000067718 pc=0x59093d025d2e
runtime.gcBgMarkWorker(0xc00009d730)
runtime/mgc.go:1423 +0xe9 fp=0xc0000677c8 sp=0xc000067738 pc=0x59093cfd3a69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0000677e0 sp=0xc0000677c8 pc=0x59093cfd3945
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000677e8 sp=0xc0000677e0 pc=0x59093d02dbc1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 9 gp=0xc0001cd500 m=nil [GC worker (idle)]:
runtime.gopark(0x59093f08da60?, 0x1?, 0x91?, 0xfe?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000067f38 sp=0xc000067f18 pc=0x59093d025d2e
runtime.gcBgMarkWorker(0xc00009d730)
runtime/mgc.go:1423 +0xe9 fp=0xc000067fc8 sp=0xc000067f38 pc=0x59093cfd3a69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc000067fe0 sp=0xc000067fc8 pc=0x59093cfd3945
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000067fe8 sp=0xc000067fe0 pc=0x59093d02dbc1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 18 gp=0xc000102380 m=nil [GC worker (idle), 37 minutes]:
runtime.gopark(0x19c7179906cb?, 0x3?, 0x1b?, 0xfa?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000060738 sp=0xc000060718 pc=0x59093d025d2e
runtime.gcBgMarkWorker(0xc00009d730)
runtime/mgc.go:1423 +0xe9 fp=0xc0000607c8 sp=0xc000060738 pc=0x59093cfd3a69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0000607e0 sp=0xc0000607c8 pc=0x59093cfd3945
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000607e8 sp=0xc0000607e0 pc=0x59093d02dbc1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 10 gp=0xc000540700 m=10 mp=0xc00009f808 [syscall, 37 minutes]:
runtime.cgocall(0x59093ddeffe5, 0xc007033318)
runtime/cgocall.go:167 +0x4b fp=0xc0070332f0 sp=0xc0070332b8 pc=0x59093d0228ab
github.com/ollama/ollama/ml/backend/ggml._Cfunc_ggml_backend_sched_graph_compute_async(0x7e395c0017b0, 0x7e35a433fd50)
_cgo_gotypes.go:977 +0x4a fp=0xc007033318 sp=0xc0070332f0 pc=0x59093d4a390a
github.com/ollama/ollama/ml/backend/ggml.(*Context).ComputeWithNotify.func2(...)
github.com/ollama/ollama/ml/backend/ggml/ggml.go:825
github.com/ollama/ollama/ml/backend/ggml.(*Context).ComputeWithNotify(0xc0002d0180, 0x0?, {0x0, 0x0, 0xc007033518?})
github.com/ollama/ollama/ml/backend/ggml/ggml.go:825 +0x1b2 fp=0xc0070333f0 sp=0xc007033318 pc=0x59093d4b12d2
github.com/ollama/ollama/ml/backend/ggml.(*Context).Compute(0xc0002d0180?, {0x0?, 0xc0002d0180?, 0x59093e6b26a0?})
github.com/ollama/ollama/ml/backend/ggml/ggml.go:811 +0x25 fp=0xc007033428 sp=0xc0070333f0 pc=0x59093d4b10e5
github.com/ollama/ollama/kvcache.(*Causal).shift(0xc0001ef600, 0x0, 0x4, 0xffffe002)
github.com/ollama/ollama/kvcache/causal.go:608 +0x250 fp=0xc007033588 sp=0xc007033428 pc=0x59093d49f030
github.com/ollama/ollama/kvcache.(*Causal).Remove(0xc0001ef600, 0x0, 0x4, 0x2002)
github.com/ollama/ollama/kvcache/causal.go:659 +0x285 fp=0xc007033620 sp=0xc007033588 pc=0x59093d49f6c5
github.com/ollama/ollama/kvcache.(*WrapperCache).Remove(0xc000114890?, 0x0, 0x4, 0x2002)
github.com/ollama/ollama/kvcache/wrapper.go:103 +0x5e fp=0xc007033658 sp=0xc007033620 pc=0x59093d4a0b3e
github.com/ollama/ollama/runner/ollamarunner.(*InputCache).ShiftCacheSlot(0xc00302c880, 0xc00053a600, 0x4)
github.com/ollama/ollama/runner/ollamarunner/cache.go:290 +0x34c fp=0xc0070337f0 sp=0xc007033658 pc=0x59093d5864ec
github.com/ollama/ollama/runner/ollamarunner.(*Server).forwardBatch(_, {0x110b, {0x59093e6a7670, 0xc002ffa080}, {0x59093e6b26a0, 0xc00125b410}, {0xc000232008, 0x3fc, 0x3ff}, {{0x59093e6b26a0, ...}, ...}, ...})
github.com/ollama/ollama/runner/ollamarunner/runner.go:565 +0xec5 fp=0xc007033b58 sp=0xc0070337f0 pc=0x59093d589c85
github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0xc0002230e0, {0x59093e69c0a0, 0xc000527c70})
github.com/ollama/ollama/runner/ollamarunner/runner.go:452 +0x18c fp=0xc007033fb8 sp=0xc007033b58 pc=0x59093d588b6c
github.com/ollama/ollama/runner/ollamarunner.Execute.gowrap1()
github.com/ollama/ollama/runner/ollamarunner/runner.go:1418 +0x28 fp=0xc007033fe0 sp=0xc007033fb8 pc=0x59093d5921e8
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc007033fe8 sp=0xc007033fe0 pc=0x59093d02dbc1
created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
github.com/ollama/ollama/runner/ollamarunner/runner.go:1418 +0x4c9
goroutine 9306 gp=0xc000808c40 m=nil [sync.Mutex.Lock, 28 minutes]:
runtime.gopark(0x0?, 0xc001347710?, 0xfe?, 0x25?, 0xc00009c5b0?)
runtime/proc.go:435 +0xce fp=0xc0013476e0 sp=0xc0013476c0 pc=0x59093d025d2e
runtime.goparkunlock(...)
runtime/proc.go:441
runtime.semacquire1(0xc0002231dc, 0x0, 0x3, 0x2, 0x15)
runtime/sema.go:188 +0x229 fp=0xc001347748 sp=0xc0013476e0 pc=0x59093d005ca9
internal/sync.runtime_SemacquireMutex(0xc0013477c0?, 0x9f?, 0x59093e526e00?)
runtime/sema.go:95 +0x25 fp=0xc001347780 sp=0xc001347748 pc=0x59093d027545
internal/sync.(*Mutex).lockSlow(0xc0002231d8)
internal/sync/mutex.go:149 +0x15d fp=0xc0013477d0 sp=0xc001347780 pc=0x59093d03769d
internal/sync.(*Mutex).Lock(...)
internal/sync/mutex.go:70
sync.(*Mutex).Lock(...)
sync/mutex.go:46
github.com/ollama/ollama/runner/ollamarunner.(*Server).completion(0xc0002230e0, {0x59093e699c20, 0xc0001622a0}, 0xc0004963c0)
github.com/ollama/ollama/runner/ollamarunner/runner.go:923 +0x66e fp=0xc001347ac0 sp=0xc0013477d0 pc=0x59093d58ccae
github.com/ollama/ollama/runner/ollamarunner.(*Server).completion-fm({0x59093e699c20?, 0xc0001622a0?}, 0xc001347b40?)
<autogenerated>:1 +0x36 fp=0xc001347af0 sp=0xc001347ac0 pc=0x59093d5926d6
net/http.HandlerFunc.ServeHTTP(0xc00053aa80?, {0x59093e699c20?, 0xc0001622a0?}, 0xc001347b60?)
net/http/server.go:2294 +0x29 fp=0xc001347b18 sp=0xc001347af0 pc=0x59093d325fc9
net/http.(*ServeMux).ServeHTTP(0x59093cfcaa85?, {0x59093e699c20, 0xc0001622a0}, 0xc0004963c0)
net/http/server.go:2822 +0x1c4 fp=0xc001347b68 sp=0xc001347b18 pc=0x59093d327ec4
net/http.serverHandler.ServeHTTP({0x59093e696110?}, {0x59093e699c20?, 0xc0001622a0?}, 0x1?)
net/http/server.go:3301 +0x8e fp=0xc001347b98 sp=0xc001347b68 pc=0x59093d34594e
net/http.(*conn).serve(0xc0000e9dd0, {0x59093e69c068, 0xc000218d20})
net/http/server.go:2102 +0x625 fp=0xc001347fb8 sp=0xc001347b98 pc=0x59093d3244c5
net/http.(*Server).Serve.gowrap3()
net/http/server.go:3454 +0x28 fp=0xc001347fe0 sp=0xc001347fb8 pc=0x59093d329d88
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc001347fe8 sp=0xc001347fe0 pc=0x59093d02dbc1
created by net/http.(*Server).Serve in goroutine 1
net/http/server.go:3454 +0x485
goroutine 9295 gp=0xc0014441c0 m=nil [sync.Mutex.Lock, 34 minutes]:
runtime.gopark(0x59093efc0f40?, 0xc000e8a0c0?, 0x80?, 0x2a?, 0x59093d023839?)
runtime/proc.go:435 +0xce fp=0xc00007ba88 sp=0xc00007ba68 pc=0x59093d025d2e
runtime.goparkunlock(...)
runtime/proc.go:441
runtime.semacquire1(0xc0002231dc, 0x0, 0x3, 0x2, 0x15)
runtime/sema.go:188 +0x229 fp=0xc00007baf0 sp=0xc00007ba88 pc=0x59093d005ca9
internal/sync.runtime_SemacquireMutex(0x59093d41c4d4?, 0x68?, 0xc000e8a0c0?)
runtime/sema.go:95 +0x25 fp=0xc00007bb28 sp=0xc00007baf0 pc=0x59093d027545
internal/sync.(*Mutex).lockSlow(0xc0002231d8)
internal/sync/mutex.go:149 +0x15d fp=0xc00007bb78 sp=0xc00007bb28 pc=0x59093d03769d
internal/sync.(*Mutex).Lock(...)
internal/sync/mutex.go:70
sync.(*Mutex).Lock(...)
sync/mutex.go:46
github.com/ollama/ollama/runner/ollamarunner.(*Server).computeBatch(0xc0002230e0, {0x110b, {0x59093e6a7670, 0xc002ffa080}, {0x59093e6b26a0, 0xc00125b410}, {0xc000232008, 0x3fc, 0x3ff}, {{0x59093e6b26a0, ...}, ...}, ...})
github.com/ollama/ollama/runner/ollamarunner/runner.go:735 +0x972 fp=0xc00007bef0 sp=0xc00007bb78 pc=0x59093d58b292
github.com/ollama/ollama/runner/ollamarunner.(*Server).run.gowrap1()
github.com/ollama/ollama/runner/ollamarunner/runner.go:458 +0x58 fp=0xc00007bfe0 sp=0xc00007bef0 pc=0x59093d588d98
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00007bfe8 sp=0xc00007bfe0 pc=0x59093d02dbc1
created by github.com/ollama/ollama/runner/ollamarunner.(*Server).run in goroutine 10
github.com/ollama/ollama/runner/ollamarunner/runner.go:458 +0x2cd
rax 0xca
rbx 0x0
rcx 0x59093d02f9c3
rdx 0x0
rdi 0x59093efc1080
rsi 0x80
rbp 0x7ffeef8bc1e8
rsp 0x7ffeef8bc1a0
r8 0x0
r9 0x0
r10 0x0
r11 0x286
r12 0x7ffeef8bc220
r13 0x7e3970219501
r14 0x59093efbf180
r15 0x1
rip 0x59093d02f9c1
rflags 0x286
cs 0x33
fs 0x0
gs 0x0
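
The interesting frame in this dump is goroutine 10: it has been stuck for 37 minutes inside the cgo call ggml_backend_sched_graph_compute_async, reached from kvcache.(*Causal).shift via InputCache.ShiftCacheSlot, i.e. the KV-cache shift that discards old context when the window fills (the 4 in Remove(0, 4, ...) matches keep=4 from the truncation warning). Goroutines 9306 (a completion handler) and 9295 (computeBatch) are parked on the same sync.Mutex, which matches the observed behavior: metadata routes still answer while anything touching the runner hangs. A minimal Go sketch of that hang shape (not Ollama code; the C busy-loop merely stands in for the backend call that never returns):

```go
package main

/*
// Stand-in for the backend compute call that never returns; the busy
// loop pins one core at 100%, matching the reported symptom.
static void stuck_compute(void) { for (;;) { } }
*/
import "C"

import "sync"

func main() {
	var mu sync.Mutex
	locked := make(chan struct{})

	go func() {
		mu.Lock() // the batch loop takes the server lock...
		close(locked)
		C.stuck_compute() // ...then blocks forever inside the cgo call
	}()

	<-locked
	mu.Lock() // a new completion request parks here, like goroutine 9306
}
```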

sp=0xc007033fb8 pc=0x59093d5921e8 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc007033fe8 sp=0xc007033fe0 pc=0x59093d02dbc1 created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1 github.com/ollama/ollama/runner/ollamarunner/runner.go:1418 +0x4c9 goroutine 9306 gp=0xc000808c40 m=nil [sync.Mutex.Lock, 28 minutes]: runtime.gopark(0x0?, 0xc001347710?, 0xfe?, 0x25?, 0xc00009c5b0?) runtime/proc.go:435 +0xce fp=0xc0013476e0 sp=0xc0013476c0 pc=0x59093d025d2e runtime.goparkunlock(...) runtime/proc.go:441 runtime.semacquire1(0xc0002231dc, 0x0, 0x3, 0x2, 0x15) runtime/sema.go:188 +0x229 fp=0xc001347748 sp=0xc0013476e0 pc=0x59093d005ca9 internal/sync.runtime_SemacquireMutex(0xc0013477c0?, 0x9f?, 0x59093e526e00?) runtime/sema.go:95 +0x25 fp=0xc001347780 sp=0xc001347748 pc=0x59093d027545 internal/sync.(*Mutex).lockSlow(0xc0002231d8) internal/sync/mutex.go:149 +0x15d fp=0xc0013477d0 sp=0xc001347780 pc=0x59093d03769d internal/sync.(*Mutex).Lock(...) internal/sync/mutex.go:70 sync.(*Mutex).Lock(...) sync/mutex.go:46 github.com/ollama/ollama/runner/ollamarunner.(*Server).completion(0xc0002230e0, {0x59093e699c20, 0xc0001622a0}, 0xc0004963c0) github.com/ollama/ollama/runner/ollamarunner/runner.go:923 +0x66e fp=0xc001347ac0 sp=0xc0013477d0 pc=0x59093d58ccae github.com/ollama/ollama/runner/ollamarunner.(*Server).completion-fm({0x59093e699c20?, 0xc0001622a0?}, 0xc001347b40?) <autogenerated>:1 +0x36 fp=0xc001347af0 sp=0xc001347ac0 pc=0x59093d5926d6 net/http.HandlerFunc.ServeHTTP(0xc00053aa80?, {0x59093e699c20?, 0xc0001622a0?}, 0xc001347b60?) net/http/server.go:2294 +0x29 fp=0xc001347b18 sp=0xc001347af0 pc=0x59093d325fc9 net/http.(*ServeMux).ServeHTTP(0x59093cfcaa85?, {0x59093e699c20, 0xc0001622a0}, 0xc0004963c0) net/http/server.go:2822 +0x1c4 fp=0xc001347b68 sp=0xc001347b18 pc=0x59093d327ec4 net/http.serverHandler.ServeHTTP({0x59093e696110?}, {0x59093e699c20?, 0xc0001622a0?}, 0x1?) net/http/server.go:3301 +0x8e fp=0xc001347b98 sp=0xc001347b68 pc=0x59093d34594e net/http.(*conn).serve(0xc0000e9dd0, {0x59093e69c068, 0xc000218d20}) net/http/server.go:2102 +0x625 fp=0xc001347fb8 sp=0xc001347b98 pc=0x59093d3244c5 net/http.(*Server).Serve.gowrap3() net/http/server.go:3454 +0x28 fp=0xc001347fe0 sp=0xc001347fb8 pc=0x59093d329d88 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc001347fe8 sp=0xc001347fe0 pc=0x59093d02dbc1 created by net/http.(*Server).Serve in goroutine 1 net/http/server.go:3454 +0x485 goroutine 9295 gp=0xc0014441c0 m=nil [sync.Mutex.Lock, 34 minutes]: runtime.gopark(0x59093efc0f40?, 0xc000e8a0c0?, 0x80?, 0x2a?, 0x59093d023839?) runtime/proc.go:435 +0xce fp=0xc00007ba88 sp=0xc00007ba68 pc=0x59093d025d2e runtime.goparkunlock(...) runtime/proc.go:441 runtime.semacquire1(0xc0002231dc, 0x0, 0x3, 0x2, 0x15) runtime/sema.go:188 +0x229 fp=0xc00007baf0 sp=0xc00007ba88 pc=0x59093d005ca9 internal/sync.runtime_SemacquireMutex(0x59093d41c4d4?, 0x68?, 0xc000e8a0c0?) runtime/sema.go:95 +0x25 fp=0xc00007bb28 sp=0xc00007baf0 pc=0x59093d027545 internal/sync.(*Mutex).lockSlow(0xc0002231d8) internal/sync/mutex.go:149 +0x15d fp=0xc00007bb78 sp=0xc00007bb28 pc=0x59093d03769d internal/sync.(*Mutex).Lock(...) internal/sync/mutex.go:70 sync.(*Mutex).Lock(...) 
sync/mutex.go:46 github.com/ollama/ollama/runner/ollamarunner.(*Server).computeBatch(0xc0002230e0, {0x110b, {0x59093e6a7670, 0xc002ffa080}, {0x59093e6b26a0, 0xc00125b410}, {0xc000232008, 0x3fc, 0x3ff}, {{0x59093e6b26a0, ...}, ...}, ...}) github.com/ollama/ollama/runner/ollamarunner/runner.go:735 +0x972 fp=0xc00007bef0 sp=0xc00007bb78 pc=0x59093d58b292 github.com/ollama/ollama/runner/ollamarunner.(*Server).run.gowrap1() github.com/ollama/ollama/runner/ollamarunner/runner.go:458 +0x58 fp=0xc00007bfe0 sp=0xc00007bef0 pc=0x59093d588d98 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00007bfe8 sp=0xc00007bfe0 pc=0x59093d02dbc1 created by github.com/ollama/ollama/runner/ollamarunner.(*Server).run in goroutine 10 github.com/ollama/ollama/runner/ollamarunner/runner.go:458 +0x2cd rax 0xca rbx 0x0 rcx 0x59093d02f9c3 rdx 0x0 rdi 0x59093efc1080 rsi 0x80 rbp 0x7ffeef8bc1e8 rsp 0x7ffeef8bc1a0 r8 0x0 r9 0x0 r10 0x0 r11 0x286 r12 0x7ffeef8bc220 r13 0x7e3970219501 r14 0x59093efbf180 r15 0x1 rip 0x59093d02f9c1 rflags 0x286 cs 0x33 fs 0x0 gs 0x0

@fmu83 commented on GitHub (Jan 17, 2026):

I tried it with a smaller context (4096) as well, with the same result: every few hours the model hangs and maxes out one CPU core.

@arlaneenalra commented on GitHub (Jan 17, 2026):

Note: from what I'm seeing, you'd have to push the context larger, not smaller :( ... or shrink the prompt.

So far, updating to Ubuntu 25.10 has helped with overall stability, but I'm still seeing it drop into the one-core-at-100% state. From what I've been able to tell, it seems to fail any time server-side truncation happens.

I discovered this kind of by accident because I had misconfigured the context size on a model trained with a 40k context. I had expected the server to allow the extended context, but instead it clamped to 40k and triggered this chain of logs:

Jan 16 23:36:07 framework ollama[456332]: time=2026-01-16T23:36:07.402Z level=INFO source=runner.go:1278 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:40960 KvCacheType: NumThreads:16 GPULayers:6>
Jan 16 23:36:07 framework ollama[456332]: time=2026-01-16T23:36:07.403Z level=INFO source=ggml.go:482 msg="offloading 64 repeating layers to GPU"
Jan 16 23:36:07 framework ollama[456332]: time=2026-01-16T23:36:07.403Z level=INFO source=ggml.go:489 msg="offloading output layer to GPU"
Jan 16 23:36:07 framework ollama[456332]: time=2026-01-16T23:36:07.403Z level=INFO source=ggml.go:494 msg="offloaded 65/65 layers to GPU"
Jan 16 23:36:07 framework ollama[456332]: time=2026-01-16T23:36:07.403Z level=INFO source=device.go:240 msg="model weights" device=Vulkan0 size="18.4 GiB"
Jan 16 23:36:07 framework ollama[456332]: time=2026-01-16T23:36:07.403Z level=INFO source=device.go:245 msg="model weights" device=CPU size="417.3 MiB"
Jan 16 23:36:07 framework ollama[456332]: time=2026-01-16T23:36:07.403Z level=INFO source=device.go:251 msg="kv cache" device=Vulkan0 size="10.0 GiB"
Jan 16 23:36:07 framework ollama[456332]: time=2026-01-16T23:36:07.403Z level=INFO source=device.go:262 msg="compute graph" device=Vulkan0 size="276.0 MiB"
Jan 16 23:36:07 framework ollama[456332]: time=2026-01-16T23:36:07.403Z level=INFO source=device.go:267 msg="compute graph" device=CPU size="10.0 MiB"
Jan 16 23:36:07 framework ollama[456332]: time=2026-01-16T23:36:07.403Z level=INFO source=device.go:272 msg="total memory" size="29.1 GiB"
Jan 16 23:36:07 framework ollama[456332]: time=2026-01-16T23:36:07.403Z level=INFO source=sched.go:526 msg="loaded runners" count=1
Jan 16 23:36:07 framework ollama[456332]: time=2026-01-16T23:36:07.403Z level=INFO source=server.go:1347 msg="waiting for llama runner to start responding"
Jan 16 23:36:07 framework ollama[456332]: time=2026-01-16T23:36:07.403Z level=INFO source=server.go:1381 msg="waiting for server to become available" status="llm server loading model"
Jan 16 23:36:14 framework ollama[456332]: time=2026-01-16T23:36:14.423Z level=INFO source=server.go:1385 msg="llama runner started in 8.78 seconds"
Jan 16 23:36:14 framework ollama[456332]: time=2026-01-16T23:36:14.477Z level=WARN source=runner.go:186 msg="truncating input prompt" limit=40960 prompt=41013 keep=4 new=40960
Jan 17 02:12:25 framework systemd[1]: Stopping ollama.service - Ollama Service...

The restart at the end is me manually restarting the Ollama Service.

I've been slowly upgrading as new versions have been released, and while this issue has changed slightly in character, it seems to keep happening any time something in the server decides to truncate the context window. The only way I've been able to avoid it is picking models that have a large enough context window and making sure my num_ctx settings are large enough to avoid server-side truncation.
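
For anyone wanting the same workaround: the context size can be raised per request via the options field of the generate API (or globally via OLLAMA_CONTEXT_LENGTH). A minimal Go sketch, with the model name and context size as placeholders; a larger num_ctx only sidesteps the truncation path, it does not fix the underlying hang:

```go
// Sketch: request a larger context per call via options.num_ctx so the
// server never has to truncate. Assumes a local Ollama on the default port.
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	body := []byte(`{
		"model": "gpt-oss:20b-ctx16384",
		"prompt": "ping",
		"stream": false,
		"options": {"num_ctx": 16384}
	}`)
	resp, err := http.Post("http://localhost:11434/api/generate",
		"application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}
```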

I kind of wish there was a way to just set a global context of 128k/256k and have it active no matter the trained context of the underlying model, but that would really only be a stopgap to work around the problem, not a solution to what's actually going on.

What I'm seeing now could be something else, since I'm not seeing the crash after updating the host OS...


@fmu83 commented on GitHub (Jan 18, 2026):

Thanks for your update. I updated everything I could on my side (Intel GPU firmware, Vulkan, Mesa) to the latest versions available for Debian/Ubuntu (version numbers are in my post above). For kernel updates I'm limited, because Ollama runs inside a container on Proxmox, so I have to use the Proxmox kernel (Linux 6.17.4-2-pve).

The hang happens a few minutes after the truncation log message appears. I built a watchdog that restarts Ollama once it becomes unresponsive; a sketch of the idea follows.
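
Not the exact watchdog used here, just a minimal Go sketch of the idea, assuming a systemd-managed service on the default port and a placeholder model name. It deliberately probes an inference endpoint, since /api/tags and /api/ps stay responsive even while the runner is wedged:

```go
// Hypothetical watchdog sketch: probe generation with a hard timeout and
// restart the systemd unit when inference stops responding.
package main

import (
	"bytes"
	"context"
	"log"
	"net/http"
	"os/exec"
	"time"
)

func probe() error {
	ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
	defer cancel()
	body := []byte(`{"model":"gpt-oss:20b-ctx16384","prompt":"ping","stream":false}`)
	req, err := http.NewRequestWithContext(ctx, http.MethodPost,
		"http://localhost:11434/api/generate", bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err // a timeout here is the "wedged runner" signal
	}
	resp.Body.Close()
	return nil
}

func main() {
	for {
		if err := probe(); err != nil {
			log.Printf("inference probe failed (%v); restarting ollama", err)
			_ = exec.Command("systemctl", "restart", "ollama").Run()
		}
		time.Sleep(5 * time.Minute)
	}
}
```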

In the meantime, I did an analysis of possible root causes:

When using the Vulkan backend on an Intel GPU, the runner becomes permanently stuck after a request that triggers context truncation / KV cache shifting. The Ollama server remains responsive for lightweight endpoints (/api/tags, /api/ps), but all inference requests (ollama run, /api/generate, /v1/chat/completions) hang indefinitely until the service is restarted.

Environment

• Kernel: Linux 6.17.4-2-pve
• GPU: Intel B50 Pro (16 GB VRAM) (Vulkan)
• Ollama: ollama ps shows model loaded on 100% GPU
• Model: gpt-oss:20b-ctx16384 (context 16384)

Reproduction

  1. Run the model on the Vulkan backend (Intel GPU).
  2. Send a very long prompt/history that exceeds the context size, e.g. context limit 16384.
  3. Observe the Ollama log warning about truncation:
     truncating input prompt limit=16384 prompt=27835 keep=4 new=16384
  4. After this, inference endpoints hang:
    o ollama run gpt-oss:20b-ctx16384 → no output / never returns
    o /api/generate or /v1/chat/completions → request hangs / times out
  5. Control endpoints still work:
    o /api/tags, /api/ps return immediately

Expected

• Either the request completes, or it fails cleanly with an error (timeout / “context too large” / etc.).
• Subsequent inference requests should still work (or the runner should restart automatically).

Actual

• Runner gets stuck forever. Only restarting Ollama fixes it.
• CPU shows one core pegged at 100% (busy loop / stuck state), while the model remains listed as loaded on GPU.

Stack trace (SIGQUIT)

Key parts: one goroutine stuck for a long time inside a cgo call to ggml Vulkan compute, while other goroutines wait on a mutex in completion/computeBatch.

goroutine 10 ... [syscall, 37 minutes]:
runtime.cgocall(...)
github.com/ollama/ollama/ml/backend/ggml._Cfunc_ggml_backend_sched_graph_compute_async(...)
github.com/ollama/ollama/ml/backend/ggml.(*Context).ComputeWithNotify(...)
github.com/ollama/ollama/kvcache.(*Causal).shift(...)
github.com/ollama/ollama/kvcache.(*Causal).Remove(...)
github.com/ollama/ollama/runner/ollamarunner.(*InputCache).ShiftCacheSlot(...)
github.com/ollama/ollama/runner/ollamarunner.(*Server).forwardBatch(...)
github.com/ollama/ollama/runner/ollamarunner.(*Server).run(...)

goroutine ... [sync.Mutex.Lock, ...]:
github.com/ollama/ollama/runner/ollamarunner.(*Server).completion(...)
github.com/ollama/ollama/runner/ollamarunner.(*Server).computeBatch(...)
(Full trace attached above.)
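
To make the wedge concrete, here is a toy Go illustration of that pattern, with a sleep standing in for the Vulkan compute call that never returns. It is not Ollama code, just the lock-held-across-a-blocking-call shape visible in the trace:

```go
// One goroutine holds a shared mutex across a call that never returns, so
// every later request can only wait on Mutex.Lock, exactly like the
// completion/computeBatch goroutines in the SIGQUIT trace.
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var mu sync.Mutex

	// "Runner" goroutine: takes the lock, then blocks inside "compute".
	go func() {
		mu.Lock()
		time.Sleep(time.Hour) // stand-in for the wedged cgo/Vulkan call
	}()

	time.Sleep(100 * time.Millisecond)

	// "Completion" goroutine: needs the same lock, so it can only wait.
	if !mu.TryLock() {
		fmt.Println("inference request blocked: lock held by stuck compute goroutine")
	}
}
```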

Hypothesis / Root cause

• The trigger appears to be context truncation / KV cache shifting (ShiftCacheSlot → Causal.shift/Remove).
• During this path, the Vulkan backend enters ggml_backend_sched_graph_compute_async(...) and never returns (likely GPU/driver/backend hang).
• Because the runner holds or requires shared locks, other requests block on mutexes, effectively hanging inference globally.

Proposed fixes / improvements

1) Add a runner-side watchdog/timeout for GPU compute
If a single compute call (or forward batch) exceeds a configured deadline (a sketch follows this list):
• abort the request and return an error
• reset the backend/runner state (or terminate and restart the runner process)
Even if Vulkan/driver hangs, Ollama should recover automatically instead of staying permanently wedged.
2) Improve failure handling around async compute
• Ensure return codes/errors from ggml_backend_sched_graph_compute_async (and related functions) are always checked and propagated.
• If Vulkan returns device lost / error, force a backend reset.
3) Reduce global lock contention so one stuck compute does not block all requests
• Move compute to a worker/queue model and avoid holding global mutexes across long-running operations.
• Make completion/computeBatch resilient to a stuck compute path (e.g., request-scoped cancellation, lock-free state transitions).
4) Mitigation in the meantime (client-side)
• Avoid triggering KV-cache shift by keeping request history below num_ctx (limit chat history / summarise history / chunking).
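
For fix 1, the shape might look like the following Go sketch; computeWithDeadline and computeFn are hypothetical names, not Ollama's real API. Since a blocked cgo call cannot be cancelled from Go, the timeout can only report failure so that a supervisor tears the runner process down:

```go
package main

import (
	"fmt"
	"time"
)

// computeWithDeadline wraps a blocking compute call (here a stand-in for the
// cgo graph-compute call) with a deadline. Hypothetical sketch only.
func computeWithDeadline(d time.Duration, computeFn func() error) error {
	done := make(chan error, 1)
	go func() { done <- computeFn() }()
	select {
	case err := <-done:
		return err
	case <-time.After(d):
		// A wedged cgo call cannot be killed from Go; the goroutine stays
		// pinned. The caller must treat this as fatal and restart the runner.
		return fmt.Errorf("compute exceeded %s deadline; restart the runner", d)
	}
}

func main() {
	err := computeWithDeadline(time.Second, func() error {
		time.Sleep(time.Hour) // simulate a hung Vulkan compute
		return nil
	})
	fmt.Println(err)
}
```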

Notes

• After the hang, /api/tags and /api/ps remain responsive, but any inference hangs indefinitely.
• Restarting ollama restores functionality.


@svenstaro commented on GitHub (Feb 2, 2026):

I think the title should be amended with something like "on Vulkan", because it doesn't appear on ROCm, at least for me.


Reference: github-starred/ollama#34642