[GH-ISSUE #10433] Ollama 0.6.6 memory leak with different models #53370

Closed
opened 2026-04-29 02:44:25 -05:00 by GiteaMirror · 58 comments

Originally created by @somera on GitHub (Apr 28, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10433

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

Over the last few weeks I've noticed that after an LLM has been used, the VRAM stays allocated. Last week I found that the problem is `deepseek-coder-v2:16b`. And yes, I have a Modelfile for it:

```
FROM deepseek-coder-v2:16b
PARAMETER num_ctx 24576
PARAMETER num_predict 8192
```
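(A minimal sketch of how such a derived model would be built with the standard CLI; the name `deepseek-coder-v2-fixed`, which shows up later in this thread, is assumed here:)

```shell
# Create a new model from the Modelfile in the current directory.
ollama create deepseek-coder-v2-fixed -f Modelfile
```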

I see the problem with both `deepseek-coder-v2:16b` and the model generated from this Modelfile.

Image

```
:/usr/share/ollama/.ollama/models# grep 5ff0ab */*
blobs/sha256-34488e453cfe3232810bac05c55d94a471228086fcac9e6b00ef3a671e21fa66:{"model_format":"gguf","model_family":"deepseek2","model_families":["deepseek2"],"model_type":"15.7B","file_type":"Q4_0","architecture":"amd64","os":"linux","rootfs":{"type":"layers","diff_ids":["sha256:5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046","sha256:b321cd7de6c7494351e6f0f6b4588378af4bf9cb6d2e0bba022ad81e72d9a776","sha256:4bb71764481f96d4161efc810c6185a0d0eb5a50ab7a0dedbdd283670cbcc2b5","sha256:1c8f573e830ca9b3ebfeb7ace1823146e22b66f99ee223840e7637c9e745e1c7","sha256:19f2fb9e8bc65a143f47903ec07dce010fd2873f994b900ea735a4b5022e968d"]}}
```

and `ollama ps` shows nothing.

I've been seeing this for weeks now. I would say this problem was present in 0.6.5, 0.6.4, etc. too.

Relevant log output


OS

Linux

GPU

Nvidia

CPU

AMD

Ollama version

0.6.6

GiteaMirror added the bug label 2026-04-29 02:44:25 -05:00

@rick-github commented on GitHub (Apr 28, 2025):

These are runner processes. The most likely explanation is that the server was killed or crashed, orphaning the runners. Server logs may aid in debugging.

@moonflash commented on GitHub (Apr 28, 2025):

I'm experiencing the same with gemma3:27b-it-qat. Running:

- ollama inside Docker
- Linux host
- one dedicated RTX 3090 per instance

At the beginning I have 21 GB of VRAM and less than 1 GB of RAM in use; after an hour the VRAM usage is the same, but all RAM (30 GB) is used, plus up to 40 GB of swap.
@rick-github commented on GitHub (Apr 28, 2025):

Server logs may aid in debugging.

@moonflash commented on GitHub (Apr 28, 2025):

```
time=2025-04-28T09:12:21.845Z level=DEBUG source=process_text_spm.go:184 msg="adding bos token to prompt" id=2
time=2025-04-28T09:12:22.047Z level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1034 prompt=926 used=0 remaining=926
[GIN] 2025/04/28 - 09:12:26 | 200 |  4.424865014s |      172.19.0.7 | POST     "/api/generate"
time=2025-04-28T09:12:26.163Z level=DEBUG source=sched.go:409 msg="context for request finished"
time=2025-04-28T09:12:26.163Z level=DEBUG source=sched.go:341 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/root/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 duration=2562047h47m16.854775807s
time=2025-04-28T09:12:26.163Z level=DEBUG source=sched.go:359 msg="after processing request finished event" modelPath=/root/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 refCount=0
time=2025-04-28T09:12:26.943Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-04-28T09:12:26.948Z level=DEBUG source=sched.go:577 msg="evaluating already loaded" model=/root/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87
time=2025-04-28T09:12:26.949Z level=DEBUG source=routes.go:297 msg="generate request" images=1 prompt="<start_of_turn>user\nYou act as a web-shop expert reaponsible ...Please return response in proper JSON format [text end]<end_of_turn>\n<start_of_turn>model\n\n"
```

No other logs or errors 🤷‍♂
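(A sketch of one way to confirm where the host RAM is going: sample the resident memory of the ollama processes over time. The interval and columns here are arbitrary.)

```shell
# Print PID, resident and virtual memory of all ollama processes every 60 s.
# pidof returns space-separated PIDs; ps -p expects a comma-separated list.
watch -n 60 'ps -o pid,rss,vsz,cmd -p "$(pidof ollama | tr " " ,)"'
```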

@somera commented on GitHub (Apr 28, 2025):

> These are runner processes. The most likely explanation is that the server was killed or crashed, orphaning the runners. Server logs may aid in debugging.

I extended my ollama.service with `OLLAMA_DEBUG` to get more details.
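(For reference, a minimal sketch of enabling debug logging for the systemd service, following the standard troubleshooting docs:)

```shell
# Add the environment variable via a systemd override, then restart.
sudo systemctl edit ollama
#   -> in the editor, add:
#   [Service]
#   Environment="OLLAMA_DEBUG=1"
sudo systemctl restart ollama
# Follow the debug output:
journalctl -u ollama -f
```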

@rick-github commented on GitHub (Apr 28, 2025):

@moonflash Your problem looks different from the OP's; open a new ticket and attach full logs.

@somera commented on GitHub (Apr 28, 2025):

I got the same problem today with `qwen2.5-coder:32b`. You can see it here:

Image

It is the first line in the screenshot.

@rick-github here is the log for the `qwen2.5-coder:32b` issue.

ollama.zip

@rick-github commented on GitHub (Apr 28, 2025):

What's the output of:

```
ps wwho ppid,pid,pgid,lstart,cmd klstart p$(pidof ollama)
```
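(The same command, annotated for readers following along:)

```shell
# ww       : unlimited output width (don't truncate the runner command lines)
# h        : suppress the header row
# o <cols> : user-defined output columns
# klstart  : sort by process start time
# p<pids>  : only the ollama PIDs; server and runners share the same binary
ps wwho ppid,pid,pgid,lstart,cmd klstart p$(pidof ollama)
```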
@somera commented on GitHub (Apr 28, 2025):

After I saw the problem, I restarted the ollama service.

Now I see:

```
# ps wwho ppid,pid,pgid,lstart,cmd klstart p$(pidof ollama)
      1 2268027 2268027 Mon Apr 28 13:42:51 2025 /usr/local/bin/ollama serve
2268027 2275563 2268027 Mon Apr 28 14:53:24 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-fd7b6731c33c57f61767612f56517460ec2d1e2e5a3f0163e0eb3d8d8cb5df20 --ctx-size 32768 --batch-size 512 --n-gpu-layers 41 --verbose --threads 32 --parallel 4 --port 46385
2268027 2278642 2268027 Mon Apr 28 15:21:08 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-ac3d1ba8aa77755dab3806d9024e9c385ea0d5b412d6bdf9157f8a4a7e9fc0d9 --ctx-size 8192 --batch-size 512 --n-gpu-layers 65 --verbose --threads 32 --parallel 1 --port 37655
```

and I'm wondering, because I see

Image

and

Image

The model is ~23 GB, but the GPU says 38 GB are used.

I loaded `mistral-small3.1:24b` and ...

```
# ps wwho ppid,pid,pgid,lstart,cmd klstart p$(pidof ollama)
      1 2268027 2268027 Mon Apr 28 13:42:51 2025 /usr/local/bin/ollama serve
2268027 2275563 2268027 Mon Apr 28 14:53:24 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-fd7b6731c33c57f61767612f56517460ec2d1e2e5a3f0163e0eb3d8d8cb5df20 --ctx-size 32768 --batch-size 512 --n-gpu-layers 41 --verbose --threads 32 --parallel 4 --port 46385
2268027 2280145 2268027 Mon Apr 28 15:35:15 2025 /usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-1fa8532d986d729117d6b5ac2c884824d0717c9468094554fd1d36412c740cfc --ctx-size 4096 --batch-size 512 --n-gpu-layers 41 --verbose --threads 32 --parallel 1 --port 36971
```

The runner with `ctx-size 32768` is the unused `phi4:14b` model:

```
manifests/registry.ollama.ai/library/phi4/latest:{"schemaVersion":2,"mediaType":"application/vnd.docker.distribution.manifest.v2+json","config":{"mediaType":"application/vnd.docker.container.image.v1+json","digest":"sha256:f5d6f49c64775df1536e9d747c6b6b4c101f6a8658108fbd18a15d046575c68b","size":486},"layers":[{"mediaType":"application/vnd.ollama.image.model","digest":"sha256:fd7b6731c33c57f61767612f56517460ec2d1e2e5a3f0163e0eb3d8d8cb5df20","size":9053114464},{"mediaType":"application/vnd.ollama.image.template","digest":"sha256:32695b892af87ef8fca6e13a1a31c67c1441d7398be037e366e2fc763857c06a","size":275},{"mediaType":"application/vnd.ollama.image.license","digest":"sha256:fa8235e5b48faca34e3ca98cf4f694ef08bd216d28b58071a1f85b1d50cb814d","size":1084},{"mediaType":"application/vnd.ollama.image.params","digest":"sha256:45a1c652dddc9efdcefa977ab81cfbe26b6e52bc8e78f2f4c698538783e0ac80","size":82}]}
```

It looks like the problem occurs when the ctx-size is high.
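(A quick way to spot the mismatch described in this thread is to compare what the scheduler reports against the runner processes that are actually alive; a sketch:)

```shell
# Models ollama believes are loaded:
ollama ps
# Runner processes that actually exist:
pgrep -af 'ollama runner'
```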

@somera commented on GitHub (Apr 28, 2025):

Now `ollama ps` shows nothing. But:

```
# ps wwho ppid,pid,pgid,lstart,cmd klstart p$(pidof ollama)
      1 2268027 2268027 Mon Apr 28 13:42:51 2025 /usr/local/bin/ollama serve
2268027 2275563 2268027 Mon Apr 28 14:53:24 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-fd7b6731c33c57f61767612f56517460ec2d1e2e5a3f0163e0eb3d8d8cb5df20 --ctx-size 32768 --batch-size 512 --n-gpu-layers 41 --verbose --threads 32 --parallel 4 --port 46385
```

and VRAM is used:

Image

There is a memory leak which occurs with different models on our setup.

@rick-github commented on GitHub (Apr 28, 2025):

What's in the log now?

@somera commented on GitHub (Apr 28, 2025):

Here are the logs from 13:00:00 onward:

ollama2.zip

@somera commented on GitHub (Apr 28, 2025):

I haven't restarted the ollama service yet, in case you need more input.

@somera commented on GitHub (Apr 28, 2025):

So far I have seen the memory leak with:

- deepseek-coder-v2:16b (the original model and the one created from the Modelfile)
- qwen2.5-coder:32b
- phi4:14b
@rick-github commented on GitHub (Apr 28, 2025):

```
Apr 28 14:53:24 AI-DEV-VM ollama[2268027]: time=2025-04-28T14:53:24.544+02:00 level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-fd7b6731c33c57f61767612f56517460ec2d1e2e5a3f0163e0eb3d8d8cb5df20 --ctx-size 32768 --batch-size 512 --n-gpu-layers 41 --verbose --threads 32 --parallel 4 --port 46385"
```

ollama started a runner for phi4:14b at 14:53:24.

```
Apr 28 14:53:24 AI-DEV-VM ollama[2268027]: time=2025-04-28T14:53:24.639+02:00 level=INFO source=runner.go:913 msg="Server listening on 127.0.0.1:46385"
```

Runner is up and ready to load the model.

```
Apr 28 14:53:24 AI-DEV-VM ollama[2268027]: load_tensors: layer  40 assigned to device CUDA0, is_swa = 0
```

Runner finished layer assignment and started VRAM writes.

```
Apr 28 14:53:25 AI-DEV-VM ollama[2268027]: time=2025-04-28T14:53:25.656+02:00 level=DEBUG source=sched.go:386 msg="sending an unloaded event" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-fd7b6731c33c57f61767612f56517460ec2d1e2e5a3f0163e0eb3d8d8cb5df20
```

Runner told to unload the model.

```
Apr 28 14:53:25 AI-DEV-VM ollama[2268027]: time=2025-04-28T14:53:25.803+02:00 level=DEBUG source=server.go:625 msg="model load progress 0.28"
```

Runner continues to load the model.

```
Apr 28 14:53:26 AI-DEV-VM ollama[2268027]: time=2025-04-28T14:53:26.809+02:00 level=INFO source=server.go:619 msg="llama runner started in 2.26 seconds"
```

Model loaded.

```
Apr 28 14:53:26 AI-DEV-VM ollama[2268027]: time=2025-04-28T14:53:26.809+02:00 level=DEBUG source=routes.go:297 msg="generate request" images=0 prompt="<fim_prefix>// Path: AI_testsuite\n//
```

Runner processes the request.

```
Apr 28 14:53:26 AI-DEV-VM ollama[2268027]: time=2025-04-28T14:53:26.905+02:00 level=ERROR source=sched.go:327 msg="finished request signal received after model unloaded" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-fd7b6731c33c57f61767612f56517460ec2d1e2e5a3f0163e0eb3d8d8cb5df20
```

ollama notices that an inflight request was finished even though the model was unloaded.

So the model was unloaded in the middle of processing a request. The runner then basically forgot it was supposed to stop, while the ollama server, having told the model to stop, discarded all state that it had. So the runner is orphaned, waiting around until something kills it.

It's not clear why the runner got an unload event. Do you set keep_alive in your requests?
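(For reference, `keep_alive` is a per-request option in the HTTP API; a minimal example, model name arbitrary:)

```shell
# keep_alive accepts a duration such as "5m", 0 to unload immediately,
# or -1 to keep the model loaded indefinitely.
curl http://localhost:11434/api/generate -d '{
  "model": "phi4:14b",
  "prompt": "hello",
  "keep_alive": "5m"
}'
```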

@rick-github commented on GitHub (Apr 28, 2025):

Hmm, earlier we see:

```
Apr 28 14:53:22 AI-DEV-VM ollama[2268027]: time=2025-04-28T14:53:22.559+02:00 level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-fd7b6731c33c57f61767612f56517460ec2d1e2e5a3f0163e0eb3d8d8cb5df20 --ctx-size 32768 --batch-size 512 --n-gpu-layers 41 --verbose --threads 32 --parallel 4 --port 38573"
Apr 28 14:53:22 AI-DEV-VM ollama[2268027]: time=2025-04-28T14:53:22.652+02:00 level=DEBUG source=server.go:569 msg="server unhealthy" error="health resp: Get \"http://127.0.0.1:38573/health\": dial tcp 127.0.0.1:38573: connect: connection refused"
Apr 28 14:53:22 AI-DEV-VM ollama[2268027]: time=2025-04-28T14:53:22.652+02:00 level=DEBUG source=sched.go:285 msg="resetting model to expire immediately to make room" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-fd7b6731c33c57f61767612f56517460ec2d1e2e5a3f0163e0eb3d8d8cb5df20 refCount=0
Apr 28 14:53:22 AI-DEV-VM ollama[2268027]: time=2025-04-28T14:53:22.683+02:00 level=INFO source=runner.go:913 msg="Server listening on 127.0.0.1:38573"
```

So what I think is happening is that a runner is started, fails a health check but is ready 31 ms later, and is marked for unload. The runner continues loading the model and processing the request, but the ollama server has committed to unloading it, so we end up in a state where the runner is running and waiting to do completions, but the ollama server has forgotten about it.

@somera commented on GitHub (Apr 28, 2025):

> It's not clear why the runner got an unload event. Do you set `keep_alive` in your requests?

I'm not the only one using it. But as far as I know, `keep_alive` is not used.

But we use nginx as a reverse proxy for ollama:

```
server {
    listen 7777;
    server_name ollama.internal-domain.de;

    location / {
        # Proxy the request to the ollama service
        proxy_pass http://xxx.xxx.xxx.xxx:11434;
        proxy_set_header Host $host;

        proxy_connect_timeout       600;
        proxy_send_timeout          600;
        proxy_read_timeout          600;
        send_timeout                600;

        proxy_hide_header Access-Control-Allow-Origin;
        proxy_hide_header Access-Control-Allow-Methods;
        proxy_hide_header Access-Control-Allow-Headers;
        proxy_hide_header Access-Control-Expose-Headers;

        # (Optional) Disable proxy buffering for better streaming response from models
        proxy_buffering off;
    }
}
```
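(A quick smoke test through this proxy, using the hostname and port from the config above:)

```shell
# Verify that requests through nginx reach the ollama server.
curl -s http://ollama.internal-domain.de:7777/api/version
```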
@somera commented on GitHub (Apr 28, 2025):

I've been noticing the VRAM usage issue since March, but I haven't had time for a deep analysis and was convinced the problem was Gemma 3.

@somera commented on GitHub (Apr 28, 2025):

> So what I think is happening is that a runner is started, fails a health check but is ready 31 ms later, and is marked for unload. The runner continues loading the model and processing the request, but the ollama server has committed to unloading it, so we end up in a state where the runner is running and waiting to do completions, but the ollama server has forgotten about it.

Is this a feature or bug? ;)

Normally, all models available in our setup are preloaded into RAM for better performance.
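(A minimal sketch of preloading, assuming the standard API: a generate request without a prompt loads the model, and a negative `keep_alive` keeps it resident.)

```shell
# Load the model and keep it in memory until explicitly unloaded.
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-coder-v2:16b",
  "keep_alive": -1
}'
```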

@rick-github commented on GitHub (Apr 28, 2025):

> convinced the problem was Gemma 3.

gemma3 did have a memory leak problem, resolved in 0.6.6. This issue is not a classical memory leak.

> Is this a feature or bug? ;)

This is a bug. It looks like a race condition; I'm still trying to replicate the problem on my setup. I think multiple queued requests are somehow triggering `needsReload()` before the model has finished loading, causing the simultaneous load and unload actions.

@somera commented on GitHub (Apr 28, 2025):

The error occurs when others access Ollama from VS Code using Continue. I am currently trying to reproduce it with a Python script that performs API calls.

```python
    def generate_code(self, prompt: str, params: GenerationParams) -> str:
        """Generate code using Ollama API with streaming and cancellation support.

        Args:
            prompt: Input prompt for code generation.
            params: Generation parameters.

        Returns:
            str: Generated code.

        Raises:
            ValueError: If parameters are invalid.
            Exception: If generation fails or is stopped.
        """
        is_valid, message = params.validate()
        if not is_valid:
            raise ValueError(message)

        options = {
            'temperature': params.temperature,
            'top_k': params.top_k,
            'top_p': params.top_p,
            'repeat_penalty': params.repeat_penalty,
            'num_predict': params.num_predict,
            'num_ctx': 32768,
        }
        if params.seed is not None:
            options['seed'] = params.seed

        url = f"{self.host}/api/chat"
        payload = {
            'model': params.model,
            'messages': [{'role': 'user', 'content': prompt}],
            'options': options,
            'stream': True  # We need streaming to implement cancellation
        }

        try:
            response = self._session.post(
                url,
                json=payload,
                stream=True,
                timeout=120
            )
            response.raise_for_status()

            full_response = []
            for line in response.iter_lines():
                if self._stop_flag:
                    raise Exception("Generation stopped by user")

                if line:
                    decoded_line = line.decode('utf-8')
                    if decoded_line.strip():
                        data = json.loads(decoded_line)
                        if 'message' in data and 'content' in data['message']:
                            full_response.append(data['message']['content'])

            return ''.join(full_response)
        except requests.exceptions.RequestException as e:
            raise Exception(f"Generation failed: {str(e)}")
```

Although I set `num_ctx=32768`, I am sending only a small prompt. This means the behavior on the GPU is different because it is not heavily loaded. I might try sending a longer prompt, but I will need to adjust my script for that.
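(A throwaway way to test a long prompt without modifying the script; the model name and repetition count here are arbitrary:)

```shell
# Pad a prompt to a few thousand words to force real context usage.
LONG_PROMPT=$(printf 'lorem ipsum %.0s' {1..2000})
curl -s http://localhost:11434/api/chat -d "{
  \"model\": \"qwen2.5-coder:32b\",
  \"messages\": [{\"role\": \"user\", \"content\": \"$LONG_PROMPT\"}],
  \"options\": {\"num_ctx\": 32768},
  \"stream\": false
}"
```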

@somera commented on GitHub (Apr 29, 2025):

I can't reproduce this with my Python script.

And today the same problem occurred with `phi4:14b` and a lower `ctx-size`:

```
$ ps wwho ppid,pid,pgid,lstart,cmd klstart p$(pidof ollama)
      1 2287965 2287965 Mon Apr 28 16:38:57 2025 /usr/local/bin/ollama serve
2287965 2424183 2287965 Tue Apr 29 08:40:16 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-fd7b6731c33c57f61767612f56517460ec2d1e2e5a3f0163e0eb3d8d8cb5df20 --ctx-size 8192 --batch-size 512 --n-gpu-layers 41 --verbose --threads 32 --parallel 1 --port 45425
```
@somera commented on GitHub (Apr 29, 2025):

@rick-github I also found this:

```
$ sudo journalctl -u ollama --since='2025-04-28 10:00:00' | grep "expired event with positive ref count, retrying" | wc -l
1220404
```

1,220,404 entries in 26.5 hours.

```
Apr 29 08:47:40 AI-DEV-VM ollama[2287965]: time=2025-04-29T08:47:40.506+02:00 level=DEBUG source=sched.go:365 msg="expired event with positive ref count, retrying" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-fd7b6731c33c57f61767612f56517460ec2d1e2e5a3f0163e0eb3d8d8cb5df20 refCount=1
Apr 29 08:47:40 AI-DEV-VM ollama[2287965]: time=2025-04-29T08:47:40.517+02:00 level=DEBUG source=sched.go:365 msg="expired event with positive ref count, retrying" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-fd7b6731c33c57f61767612f56517460ec2d1e2e5a3f0163e0eb3d8d8cb5df20 refCount=1
```
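(To see when the retry loop started, the same grep can be bucketed per hour; a sketch relying on journalctl's default short timestamp format:)

```shell
# Count the retry messages per hour of the day.
sudo journalctl -u ollama --since='2025-04-28 10:00:00' \
  | grep 'expired event with positive ref count' \
  | awk '{print $1, $2, substr($3,1,2) ":00"}' \
  | sort | uniq -c
```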
@somera commented on GitHub (Apr 29, 2025):

> It's not clear why the runner got an unload event. Do you set `keep_alive` in your requests?

No one sets `keep_alive`.

We are using Open WebUI, where it's possible to send the same prompt to more than one model. I couldn't reproduce the problem there.

I think the problem occurs when someone is using VS Code with the Continue plugin.

@somera commented on GitHub (Apr 29, 2025):

I now have a bash script which detects the problem:

Image

Here are some details:

```
=== Ollama Process Status ===
Active models (ollama ps)          : 1
Runner processes detected          : 4

=== Detailed Process Information ===
[ollama ps output]
NAME                              ID              SIZE     PROCESSOR    UNTIL
deepseek-coder-v2-fixed:latest    18245823b634    15 GB    100% CPU     24 minutes from now

[running processes]
      1 2446603 2446603 Tue Apr 29 13:00:07 2025 /usr/local/bin/ollama serve
2446603 2464527 2446603 Tue Apr 29 15:24:18 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 98304 --batch-size 512 --n-gpu-layers 28 --verbose --threads 32 --parallel 4 --port 44237
2446603 2465369 2446603 Tue Apr 29 15:25:07 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 24576 --batch-size 512 --n-gpu-layers 9 --verbose --threads 32 --parallel 1 --port 42327
2446603 2477310 2446603 Tue Apr 29 16:43:17 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 24576 --batch-size 512 --verbose --threads 32 --no-mmap --parallel 1 --port 36139
2446603 2477448 2446603 Tue Apr 29 16:43:38 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 24576 --batch-size 512 --verbose --threads 32 --no-mmap --parallel 1 --port 40271
```

I'm wondering about the 4 ollama runners.

I restarted ollama. Later I saw an ollama runner with `ctx-size=98304`:

```
=== Ollama Process Status ===
Active models (ollama ps)          : 1
Runner processes detected          : 1

=== Detailed Process Information ===
[ollama ps output]
NAME                              ID              SIZE     PROCESSOR    UNTIL
deepseek-coder-v2-fixed:latest    18245823b634    40 GB    100% GPU     26 minutes from now

[running processes]
      1 2584727 2584727 Tue Apr 29 17:22:33 2025 /usr/local/bin/ollama serve
2584727 2589143 2584727 Tue Apr 29 18:00:48 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 98304 --batch-size 512 --n-gpu-layers 28 --verbose --threads 32 --parallel 4 --port 39637

✓ System Normal
Active models match running processes
```

I made an Ollama API call with `ctx-size=8192`; Ollama used the runner with `--ctx-size 98304`.

I don't know how, but I think the initial problem occurs when someone is working with VS Code and the Continue plugin and is using code completion.

@dhiltgen commented on GitHub (May 3, 2025):

I fixed a race condition bug in the scheduler in 0.6.7 which changed some of the startup logic. I'm curious if the behavior is any better in 0.6.7?

@somera commented on GitHub (May 3, 2025):

@dhiltgen I'll install the new version on Monday and then I'll keep an eye on it.

@somera commented on GitHub (May 4, 2025):

@dhiltgen I see no changes.

```
$ ./ollama_detect_problems_v8.sh

=== Ollama Process Status ===
Active models (ollama ps)          : 1
Runner processes detected          : 2

=== Detailed Process Information ===
[ollama ps output]
NAME                              ID              SIZE     PROCESSOR    UNTIL
deepseek-coder-v2-fixed:latest    18245823b634    17 GB    100% GPU     29 minutes from now

[running processes]
      1    1043    1043 Sat May  3 12:59:00 2025 /usr/local/bin/ollama serve
   1043 2193215    1043 Sat May  3 19:07:37 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 49152 --batch-size 512 --n-gpu-layers 28 --threads 32 --parallel 2 --port 42361
   1043  251497    1043 Sun May  4 11:36:44 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 24576 --batch-size 512 --n-gpu-layers 28 --threads 32 --parallel 1 --port 37181

🛑 PROBLEM DETECTED
Found 2 runner processes but only 1 active models!
This indicates unloaded models still occupying resources.

Oldest runner process (zombie candidate):
   1043 2193215    1043 Sat May  3 19:07:37 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 49152 --batch-size 512 --n-gpu-layers 28 --threads 32 --parallel 2 --port 42361

Do you want to restart Ollama service to clean up? (y/N)
```

Image

@rick-github commented on GitHub (May 4, 2025):

Can you post your shell script?

@rick-github commented on GitHub (May 4, 2025):

Oh, wait, it just detects the problem but doesn't trigger it?

@somera commented on GitHub (May 4, 2025):

@rick-github yes, it's a bash script which detects the problem. It makes things easier for me.

```bash
#!/bin/bash

# Color codes for better output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color

# Check if ollama is installed
if ! command -v ollama &> /dev/null; then
    echo -e "${RED}✗ Error: Ollama is not installed or not in PATH${NC}"
    echo -e "Please install Ollama first: ${BLUE}https://ollama.ai${NC}"
    exit 1
fi

# Check if ollama is running
if ! pidof ollama > /dev/null; then
    echo -e "${BLUE}ℹ Ollama is not currently running${NC}"
    exit 0
fi

# Get status information
ollama_ps_output=$(ollama ps)
ollama_processes=$(ps wwho ppid,pid,pgid,lstart,cmd klstart p$(pidof ollama))
active_models=$(echo "$ollama_ps_output" | grep -v "^NAME" | grep -v "^$" | wc -l)
runner_processes=$(echo "$ollama_processes" | grep -c "ollama runner")

# Display status header
echo -e "\n=== ${BLUE}Ollama Process Status${NC} ==="
printf "%-35s: %d\n" "Active models (ollama ps)" "$active_models"
printf "%-35s: %d\n" "Runner processes detected" "$runner_processes"

# Only show details if not in normal state
if [ $runner_processes -ne $active_models ] || [ $active_models -gt 0 ]; then
    echo -e "\n=== ${YELLOW}Detailed Process Information${NC} ==="
    echo -e "[${YELLOW}ollama ps output${NC}]"
    echo "$ollama_ps_output"
    echo -e "\n[${YELLOW}running processes${NC}]"
    echo "$ollama_processes"
fi

# Evaluate the situation
if [ $runner_processes -gt $active_models ]; then
    echo -e "\n${RED}🛑 PROBLEM DETECTED${NC}"
    echo -e "Found ${RED}$runner_processes${NC} runner processes but only ${GREEN}$active_models${NC} active models!"
    echo -e "This indicates unloaded models still occupying resources."

    oldest_runner=$(echo "$ollama_processes" | grep "ollama runner" | head -n 1)
    echo -e "\n${YELLOW}Oldest runner process (zombie candidate):${NC}"
    echo "$oldest_runner"

    read -p $'\nDo you want to restart Ollama service to clean up? (y/N) ' -n 1 -r
    echo ""
    if [[ $REPLY =~ ^[Yy]$ ]]; then
        # Extract process start time and calculate log start time (30 mins earlier)
        proc_start=$(echo "$oldest_runner" | awk '{print $4" "$5" "$6" "$7" "$8}')
        proc_start_epoch=$(date -d "$proc_start" +%s)
        log_start_epoch=$((proc_start_epoch - 1800))
        log_start=$(date -d "@$log_start_epoch" "+%Y-%m-%d %H:%M:%S")

        LOG_FILE="ollama-$(date +'%Y-%m-%d_%H-%M-%S').log"
        echo -e "\n${BLUE}Restarting Ollama service...${NC}"

        sudo systemctl restart ollama

        if ! systemctl is-active --quiet ollama; then
            echo -e "${RED}✗ Error: Ollama service failed to restart!${NC}"
            exit 1
        fi

        echo -e "Capturing logs from ${YELLOW}$log_start${NC}..."
        sudo journalctl -u ollama --since="$log_start" > "$LOG_FILE"
        gzip "$LOG_FILE"
        COMPRESSED_FILE="${LOG_FILE}.gz"

        echo -e "\n${GREEN}✓ Service successfully restarted${NC}"
        echo -e "Logs saved to: ${YELLOW}${COMPRESSED_FILE}${NC}"

        read -p $'\nView logs now? (y/N) ' -n 1 -r
        echo ""
        [[ $REPLY =~ ^[Yy]$ ]] && zless "$COMPRESSED_FILE"

        read -p $'\nKeep log file? (Y/n) ' -n 1 -r  # Fixed -R to -r here
        echo ""
        [[ $REPLY =~ ^[Nn]$ ]] && rm "$COMPRESSED_FILE" && echo -e "${YELLOW}Log file deleted${NC}"

    else
        echo -e "${YELLOW}No action taken${NC}"
    fi
elif [ $runner_processes -eq 0 ] && [ $active_models -eq 0 ]; then
    echo -e "\n${GREEN}✓ System Normal${NC}"
    echo -e "${BLUE}Only main Ollama serve process running (expected state)${NC}"
elif [ $runner_processes -eq $active_models ]; then
    echo -e "\n${GREEN}✓ System Normal${NC}"
    echo -e "${BLUE}Active models match running processes${NC}"
fi

echo -e "\n${GREEN}Script completed${NC}"
```
<!-- gh-comment-id:2849120048 --> @somera commented on GitHub (May 4, 2025): @rick-github yes, this is a bash script, which detect the problem. Make it easier for me. ``` #!/bin/bash # Color codes for better output RED='\033[0;31m' GREEN='\033[0;32m' YELLOW='\033[1;33m' BLUE='\033[0;34m' NC='\033[0m' # No Color # Check if ollama is installed if ! command -v ollama &> /dev/null; then echo -e "${RED}✗ Error: Ollama is not installed or not in PATH${NC}" echo -e "Please install Ollama first: ${BLUE}https://ollama.ai${NC}" exit 1 fi # Check if ollama is running if ! pidof ollama > /dev/null; then echo -e "${BLUE}ℹ Ollama is not currently running${NC}" exit 0 fi # Get status information ollama_ps_output=$(ollama ps) ollama_processes=$(ps wwho ppid,pid,pgid,lstart,cmd klstart p$(pidof ollama)) active_models=$(echo "$ollama_ps_output" | grep -v "^NAME" | grep -v "^$" | wc -l) runner_processes=$(echo "$ollama_processes" | grep -c "ollama runner") # Display status header echo -e "\n=== ${BLUE}Ollama Process Status${NC} ===" printf "%-35s: %d\n" "Active models (ollama ps)" "$active_models" printf "%-35s: %d\n" "Runner processes detected" "$runner_processes" # Only show details if not in normal state if [ $runner_processes -ne $active_models ] || [ $active_models -gt 0 ]; then echo -e "\n=== ${YELLOW}Detailed Process Information${NC} ===" echo -e "[${YELLOW}ollama ps output${NC}]" echo "$ollama_ps_output" echo -e "\n[${YELLOW}running processes${NC}]" echo "$ollama_processes" fi # Evaluate the situation if [ $runner_processes -gt $active_models ]; then echo -e "\n${RED}🛑 PROBLEM DETECTED${NC}" echo -e "Found ${RED}$runner_processes${NC} runner processes but only ${GREEN}$active_models${NC} active models!" echo -e "This indicates unloaded models still occupying resources." oldest_runner=$(echo "$ollama_processes" | grep "ollama runner" | head -n 1) echo -e "\n${YELLOW}Oldest runner process (zombie candidate):${NC}" echo "$oldest_runner" read -p $'\nDo you want to restart Ollama service to clean up? (y/N) ' -n 1 -r echo "" if [[ $REPLY =~ ^[Yy]$ ]]; then # Extract process start time and calculate log start time (30 mins earlier) proc_start=$(echo "$oldest_runner" | awk '{print $4" "$5" "$6" "$7" "$8}') proc_start_epoch=$(date -d "$proc_start" +%s) log_start_epoch=$((proc_start_epoch - 1800)) log_start=$(date -d "@$log_start_epoch" "+%Y-%m-%d %H:%M:%S") LOG_FILE="ollama-$(date +'%Y-%m-%d_%H-%M-%S').log" echo -e "\n${BLUE}Restarting Ollama service...${NC}" sudo systemctl restart ollama if ! systemctl is-active --quiet ollama; then echo -e "${RED}✗ Error: Ollama service failed to restart!${NC}" exit 1 fi echo -e "Capturing logs from ${YELLOW}$log_start${NC}..." sudo journalctl -u ollama --since="$log_start" > "$LOG_FILE" gzip "$LOG_FILE" COMPRESSED_FILE="${LOG_FILE}.gz" echo -e "\n${GREEN}✓ Service successfully restarted${NC}" echo -e "Logs saved to: ${YELLOW}${COMPRESSED_FILE}${NC}" read -p $'\nView logs now? (y/N) ' -n 1 -r echo "" [[ $REPLY =~ ^[Yy]$ ]] && zless "$COMPRESSED_FILE" read -p $'\nKeep log file? 
(Y/n) ' -n 1 -r # Fixed -R to -r here echo "" [[ $REPLY =~ ^[Nn]$ ]] && rm "$COMPRESSED_FILE" && echo -e "${YELLOW}Log file deleted${NC}" else echo -e "${YELLOW}No action taken${NC}" fi elif [ $runner_processes -eq 0 ] && [ $active_models -eq 0 ]; then echo -e "\n${GREEN}✓ System Normal${NC}" echo -e "${BLUE}Only main Ollama serve process running (expected state)${NC}" elif [ $runner_processes -eq $active_models ]; then echo -e "\n${GREEN}✓ System Normal${NC}" echo -e "${BLUE}Active models match running processes${NC}" fi echo -e "\n${GREEN}Script completed${NC}" ```
@somera commented on GitHub (May 4, 2025):

The problem occurs when someone is working in Visual Studio Code with the Continue plugin and using all of the AI features the plugin offers.

@rick-github commented on GitHub (May 4, 2025):

Thanks. I misread your post and thought you had a way to trigger the problem, which would make it easier to find the root cause. I'll have another go at debugging this today.

@dhiltgen commented on GitHub (May 5, 2025):

Bummer. I've added some more logging for the next release to expose the PID of the runner during scheduling operations (when OLLAMA_DEBUG=1 is set), which should help narrow down the race that's leading to the orphaned runners. Combined with your ps output showing the PIDs of the runners left behind, we'll be able to see when those exact runners started, what the scheduler was doing with them, and when/if it tried to shut them down.

@somera commented on GitHub (May 5, 2025):

@dhiltgen sounds good. Does this mean it will be in the 0.6.8 release?

@somera commented on GitHub (May 5, 2025):

Are your changes in v0.6.8?

@rick-github commented on GitHub (May 5, 2025):

Log lines from 0.6.8 now contain information like 'runner.inference=cuda runner.devices=1 runner.size="20.9 GiB" runner.vram="11.5 GiB" runner.num_ctx=4096 runner.parallel=1 runner.pid=193'
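Assuming a systemd install, those fields make it possible to pull every runner PID the scheduler ever logged and compare them against the runners still alive; a minimal sketch:

```
# PIDs the scheduler has logged (the runner.pid field from 0.6.8 onwards)
journalctl -u ollama --since today | grep -o 'runner\.pid=[0-9]*' | sort -u

# PIDs actually running; anything here but not above is a leak candidate
ps -eo pid,cmd | grep '[o]llama runner'
```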

@somera commented on GitHub (May 5, 2025):

ok. I installed the new version. As soon as the error occurs, I will send the logs.

@somera commented on GitHub (May 6, 2025):

@rick-github @dhiltgen I got the error again.

$ ollama -v
ollama version is 0.6.8
$ ./ollama_detect_problems_v8.sh

=== Ollama Process Status ===
Active models (ollama ps)          : 1
Runner processes detected          : 2

=== Detailed Process Information ===
[ollama ps output]
NAME                 ID              SIZE     PROCESSOR    UNTIL
gemma3:27b-it-qat    29eb0b9aeda3    22 GB    100% GPU     4 minutes from now

[running processes]
      1 3198467 3198467 Mon May  5 19:10:41 2025 /usr/local/bin/ollama serve
3198467 3228400 3198467 Tue May  6 08:32:00 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-fd7b6731c33c57f61767612f56517460ec2d1e2e5a3f0163e0eb3d8d8cb5df20 --ctx-size 16384 --batch-size 512 --n-gpu-layers 41 --verbose --threads 32 --parallel 2 --port 37561
3198467 3234141 3198467 Tue May  6 10:56:24 2025 /usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 --ctx-size 8192 --batch-size 512 --n-gpu-layers 63 --verbose --threads 32 --parallel 2 --port 37107

🛑 PROBLEM DETECTED
Found 2 runner processes but only 1 active models!
This indicates unloaded models still occupying resources.

Oldest runner process (zombie candidate):
3198467 3228400 3198467 Tue May  6 08:32:00 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-fd7b6731c33c57f61767612f56517460ec2d1e2e5a3f0163e0eb3d8d8cb5df20 --ctx-size 16384 --batch-size 512 --n-gpu-layers 41 --verbose --threads 32 --parallel 2 --port 37561

Do you want to restart Ollama service to clean up? (y/N) y

Restarting Ollama service...
Capturing logs from 2025-05-06 08:02:00...

✓ Service successfully restarted
Logs saved to: ollama-2025-05-06_10-57-01.log.gz

View logs now? (y/N) n

Keep log file? (Y/n)


Script completed

See the attached logs.

ollama-2025-05-06_10-57-01.zip

Is this helpful?

@dhiltgen commented on GitHub (May 6, 2025):

Yes, thanks! The offending PID does not appear in the log at all, which indicates the leak happens during the very early startup of the runner, before it finishes initialization.
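
A hedged way to repeat that check, using the leaked PID from the `ps` output above (3228400) as an example:

```
# If the scheduler never logged this PID, the runner leaked before it
# finished initializing and registered with the scheduler
PID=3228400
journalctl -u ollama --since "2025-05-06 08:00" | grep -F "pid=$PID" \
  || echo "PID $PID never appears in the server log"
```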

@somera commented on GitHub (May 6, 2025):

I need to check my script, because with

log_start_epoch=$((proc_start_epoch - 1800))

the capture should have started at 08:02:00, yet the logs start at 08:31:55.
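
For what it's worth, the subtraction itself looks correct; a quick sanity check (a sketch, using the runner start time from the `ps` output above):

```
# Recompute the script's capture window by hand
proc_start="Tue May  6 08:32:00 2025"
date -d "@$(( $(date -d "$proc_start" +%s) - 1800 ))" '+%F %T'   # prints 2025-05-06 08:02:00

# If journald simply has no ollama entries before 08:31:55, the window
# was fine and the "late" log start is expected, not a script bug
journalctl -u ollama --since "2025-05-06 08:02:00" | head -n 1
```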

@somera commented on GitHub (May 7, 2025):

@dhiltgen sounds good.

I'm curious whether there's a reason I only noticed this recently (I first saw it back in March). Or is there no explanation?

@dhiltgen commented on GitHub (May 7, 2025):

Based on your logs, it seems the race condition is triggered by many concurrent requests for a model whose context size varies, so the model is reloaded frequently, combined with clients giving up and aborting the request before the model finishes loading.
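
A hypothetical reproduction sketch based on that theory; the model name and context sizes are placeholders, and `timeout` stands in for a client that gives up mid-load:

```
#!/bin/bash
# Fire concurrent requests that force reloads (a different num_ctx each time)
# and abort them before the model finishes loading
MODEL="deepseek-coder-v2:16b"
for ctx in 4096 24576 49152; do
  timeout 2 curl -s http://localhost:11434/api/generate \
    -d "{\"model\":\"$MODEL\",\"prompt\":\"hi\",\"options\":{\"num_ctx\":$ctx}}" &
done
wait

# Afterwards, compare the scheduler's view with the live processes
ollama ps
ps -eo pid,cmd | grep '[o]llama runner'
```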

@somera commented on GitHub (May 12, 2025):

@dhiltgen when will you release the fix?

@dhiltgen commented on GitHub (May 12, 2025):

@somera the next release should be out within a few days.

@somera commented on GitHub (May 19, 2025):

@dhiltgen I installed v0.7.0 yesterday, and I still see the problem:

$ ./ollama_detect_problems_v13.sh --debug
ℹ Debug mode enabled
ℹ Dry-run mode: false
ℹ Checking sudo credentials...
ℹ Ollama version: ollama version is 0.7.0

=== Ollama Process Status ===
Ollama version                     : ollama version is 0.7.0
Active models (ollama ps)          : 1
Runner processes detected          : 3

=== Detailed Process Information ===
[ollama ps output]
NAME               ID              SIZE     PROCESSOR          UNTIL
deepseek-r1:32b    38056bbcbb2d    22 GB    75%/25% CPU/GPU    4 minutes from now

[running processes]
   1051  181171    1051 Mon May 19 14:17:44 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 49152 --batch-size 512 --n-gpu-layers 28 --threads 32 --parallel 2 --port 35375
   1051  182420    1051 Mon May 19 14:27:14 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 24576 --batch-size 512 --n-gpu-layers 28 --threads 32 --parallel 1 --port 33451
   1051  643757    1051 Mon May 19 15:09:37 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-6150cb382311b69f09cc0f9a1b69fc029cbd742b66bb8ec531aa5ecf5c613e93 --ctx-size 4096 --batch-size 512 --n-gpu-layers 11 --threads 32 --parallel 1 --port 32955
      1    1051    1051 Sun May 18 10:57:12 2025 /usr/local/bin/ollama serve

🛑 PROBLEM DETECTED
Found 3 runner processes but only 1 active models!
This indicates unloaded models still occupying resources.

Oldest runner process (zombie candidate):
   1051  181171    1051 Mon May 19 14:17:44 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 49152 --batch-size 512 --n-gpu-layers 28 --threads 32 --parallel 2 --port 35375

Do you want to restart Ollama service to clean up? (y/N) N
No action taken

Script completed

And once the active model has been unloaded, the zombie processes remain:

$ ./ollama_detect_problems_v13.sh --debug
ℹ Debug mode enabled
ℹ Dry-run mode: false
ℹ Checking sudo credentials...
ℹ Ollama version: ollama version is 0.7.0

=== Ollama Process Status ===
Ollama version                     : ollama version is 0.7.0
Active models (ollama ps)          : 0
Runner processes detected          : 2

=== Detailed Process Information ===
[ollama ps output]
NAME    ID    SIZE    PROCESSOR    UNTIL

[running processes]
   1051  181171    1051 Mon May 19 14:17:44 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 49152 --batch-size 512 --n-gpu-layers 28 --threads 32 --parallel 2 --port 35375
   1051  182420    1051 Mon May 19 14:27:14 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 24576 --batch-size 512 --n-gpu-layers 28 --threads 32 --parallel 1 --port 33451
      1    1051    1051 Sun May 18 10:57:12 2025 /usr/local/bin/ollama serve

🛑 PROBLEM DETECTED
Found 2 runner processes but only 0 active models!
This indicates unloaded models still occupying resources.

Oldest runner process (zombie candidate):
   1051  181171    1051 Mon May 19 14:17:44 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 49152 --batch-size 512 --n-gpu-layers 28 --threads 32 --parallel 2 --port 35375

Do you want to restart Ollama service to clean up? (y/N)

No action taken

Script completed

@dhiltgen commented on GitHub (May 20, 2025):

Sorry to hear that. @somera please share a server log that overlaps with the time of these requests so I can see references to these PIDs and try to find why we're still leaking runners.

@somera commented on GitHub (May 20, 2025):

@dhiltgen here is the log for yesterday.

ollama-2025-05-19_15-52-50.zip

And here for today:

$ ./ollama_detect_problems_v13.sh
ℹ Checking sudo credentials...
[sudo] password for xxx:

=== Ollama Process Status ===
Ollama version                     : ollama version is 0.7.0
Active models (ollama ps)          : 0
Runner processes detected          : 1

=== Detailed Process Information ===
[ollama ps output]
NAME    ID    SIZE    PROCESSOR    UNTIL

[running processes]
      1  828276  828276 Mon May 19 15:52:50 2025 /usr/local/bin/ollama serve
 828276  896339  828276 Tue May 20 12:46:56 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 49152 --batch-size 512 --n-gpu-layers 28 --threads 32 --parallel 2 --port 37027

🛑 PROBLEM DETECTED
Found 1 runner processes but only 0 active models!
This indicates unloaded models still occupying resources.

Oldest runner process (zombie candidate):
 828276  896339  828276 Tue May 20 12:46:56 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 49152 --batch-size 512 --n-gpu-layers 28 --threads 32 --parallel 2 --port 37027

Do you want to restart Ollama service to clean up? (y/N) y
Calculated log start time (-30min): 2025-05-20 12:16:56

Restarting Ollama service...

Capturing logs from 2025-05-20 12:16:56...

✓ Service successfully restarted
Logs saved to: ollama-2025-05-20_14-28-24.log.gz

View logs now? (y/N)


Keep log file? (Y/n)


Script completed

ollama-2025-05-20_14-28-24.zip

@somera commented on GitHub (May 20, 2025):

@dhiltgen I hope this is helpful; I'm not running Ollama with OLLAMA_DEBUG=1 at the moment because the output was too much.

@dhiltgen commented on GitHub (May 20, 2025):

Unfortunately it doesn't look like those PIDs are showing up in the non-debug logs. If you're still seeing it on a somewhat regular basis, please try running with debug enabled for a bit so we can try to capture the failure case details.
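
For a systemd install, enabling debug logging is a drop-in override (this follows the troubleshooting doc linked at the top of the thread):

```
# Add OLLAMA_DEBUG=1 to the service environment and restart
sudo systemctl edit ollama
#   [Service]
#   Environment="OLLAMA_DEBUG=1"
sudo systemctl restart ollama

# Verify debug lines are being emitted (Ctrl-C if nothing appears)
journalctl -u ollama -f | grep -m1 'level=DEBUG'
```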

@somera commented on GitHub (May 20, 2025):

OK, so I need OLLAMA_DEBUG=1 then? I will set it tomorrow.

@somera commented on GitHub (May 21, 2025):

@dhiltgen next try. Tell me when this is enough, because then I will disable OLLAMA_DEBUG.

$ ./ollama_detect_problems_v13.sh
ℹ Checking sudo credentials...

=== Ollama Process Status ===
Ollama version                     : ollama version is 0.7.0
Active models (ollama ps)          : 1
Runner processes detected          : 3

=== Detailed Process Information ===
[ollama ps output]
NAME                              ID              SIZE     PROCESSOR          UNTIL
deepseek-coder-v2-fixed:latest    18245823b634    17 GB    68%/32% CPU/GPU    29 minutes from now

[running processes]
      1 1016422 1016422 Wed May 21 08:34:36 2025 /usr/local/bin/ollama serve
1016422 1027376 1016422 Wed May 21 13:37:53 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 49152 --batch-size 512 --n-gpu-layers 28 --threads 32 --parallel 2 --port 34493
1016422 1029209 1016422 Wed May 21 14:14:21 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 24576 --batch-size 512 --n-gpu-layers 28 --threads 32 --parallel 1 --port 37881
1016422 1162059 1016422 Wed May 21 14:20:31 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 24576 --batch-size 512 --n-gpu-layers 6 --threads 32 --parallel 1 --port 33863

🛑 PROBLEM DETECTED
Found 3 runner processes but only 1 active models!
This indicates unloaded models still occupying resources.

Oldest runner process (zombie candidate):
1016422 1027376 1016422 Wed May 21 13:37:53 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 49152 --batch-size 512 --n-gpu-layers 28 --threads 32 --parallel 2 --port 34493

Do you want to restart Ollama service to clean up? (y/N) y
Calculated log start time (-30min): 2025-05-21 13:07:53

Restarting Ollama service...

Capturing logs from 2025-05-21 13:07:53...

✓ Service successfully restarted
Logs saved to: ollama-2025-05-21_14-21-47.log.gz

View logs now? (y/N)


Keep log file? (Y/n)


Script completed

And here the log.

ollama-2025-05-21_14-21-47.zip

@dhiltgen commented on GitHub (May 21, 2025):

Yes, the PIDs are in there - go ahead and remove debug logging while I analyze the log.

@somera commented on GitHub (May 21, 2025):

@dhiltgen I have a new one, if ...

$ ./ollama_detect_problems_v13.sh --debug
ℹ Debug mode enabled
ℹ Dry-run mode: false
ℹ Checking sudo credentials...
ℹ Ollama version: ollama version is 0.7.0

=== Ollama Process Status ===
Ollama version                     : ollama version is 0.7.0
Active models (ollama ps)          : 1
Runner processes detected          : 2

=== Detailed Process Information ===
[ollama ps output]
NAME                              ID              SIZE     PROCESSOR    UNTIL
deepseek-coder-v2-fixed:latest    18245823b634    17 GB    100% GPU     15 minutes from now

[running processes]
      1 1176728 1176728 Wed May 21 14:21:48 2025 /usr/local/bin/ollama serve
1176728 1177075 1176728 Wed May 21 14:23:34 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 49152 --batch-size 512 --n-gpu-layers 28 --threads 32 --parallel 2 --port 34231
1176728 1193547 1176728 Wed May 21 16:52:40 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 24576 --batch-size 512 --n-gpu-layers 28 --threads 32 --parallel 1 --port 42989

🛑 PROBLEM DETECTED
Found 2 runner processes but only 1 active models!
This indicates unloaded models still occupying resources.

Oldest runner process (zombie candidate):
1176728 1177075 1176728 Wed May 21 14:23:34 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 49152 --batch-size 512 --n-gpu-layers 28 --threads 32 --parallel 2 --port 34231

Do you want to restart Ollama service to clean up? (y/N) y
ℹ Raw date extracted: 'Wed May 21 14:23:34 2025'
ℹ Calculated exact log start time: 2025-05-21 13:53:34
Calculated log start time (-30min): 2025-05-21 13:53:34

Restarting Ollama service...
ℹ Actually performing service restart

Capturing logs from 2025-05-21 13:53:34...

✓ Service successfully restarted
Logs saved to: ollama-2025-05-21_17-07-39.log.gz

View logs now? (y/N)


Keep log file? (Y/n)


Script completed

And the logs.

ollama-2025-05-21_17-07-39.zip

@somera commented on GitHub (May 22, 2025):

@dhiltgen will the fix be a part of v0.7.1 or v0.7.2?

@dhiltgen commented on GitHub (May 23, 2025):

The fix will be in v0.7.1

@somera commented on GitHub (May 27, 2025):

@dhiltgen v0.7.1 has now been running for ~2 days and it looks good. Thx.

@somera commented on GitHub (Jun 5, 2025):

I haven't seen the problem since the update. Thanks.

Reference: github-starred/ollama#53370