[GH-ISSUE #10433] Ollama 0.6.6 memory leak with different models #53370

Closed
opened 2026-04-29 02:44:25 -05:00 by GiteaMirror · 58 comments

Originally created by @somera on GitHub (Apr 28, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10433

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

Over the last few weeks I've noticed that after an LLM has been used, the VRAM stays allocated. Last week I found that the problem is `deepseek-coder-v2:16b`. And yes, I have a Modelfile for it:

```
FROM deepseek-coder-v2:16b
PARAMETER num_ctx 24576
PARAMETER num_predict 8192
```
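(A minimal sketch of how such a derived model would be built with the standard CLI; the name `deepseek-coder-v2-fixed`, which shows up later in this thread, is assumed here:)

```shell
# Create a new model from the Modelfile in the current directory.
ollama create deepseek-coder-v2-fixed -f Modelfile
```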

I see the problem with both `deepseek-coder-v2:16b` and the model generated from this Modelfile.

Image

```
:/usr/share/ollama/.ollama/models# grep 5ff0ab */*
blobs/sha256-34488e453cfe3232810bac05c55d94a471228086fcac9e6b00ef3a671e21fa66:{"model_format":"gguf","model_family":"deepseek2","model_families":["deepseek2"],"model_type":"15.7B","file_type":"Q4_0","architecture":"amd64","os":"linux","rootfs":{"type":"layers","diff_ids":["sha256:5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046","sha256:b321cd7de6c7494351e6f0f6b4588378af4bf9cb6d2e0bba022ad81e72d9a776","sha256:4bb71764481f96d4161efc810c6185a0d0eb5a50ab7a0dedbdd283670cbcc2b5","sha256:1c8f573e830ca9b3ebfeb7ace1823146e22b66f99ee223840e7637c9e745e1c7","sha256:19f2fb9e8bc65a143f47903ec07dce010fd2873f994b900ea735a4b5022e968d"]}}
```

and `ollama ps` shows nothing.

I've been seeing this for weeks now. I would say this problem was present in 0.6.5, 0.6.4, etc. too.

Relevant log output


OS

Linux

GPU

Nvidia

CPU

AMD

Ollama version

0.6.6

GiteaMirror added the bug label 2026-04-29 02:44:25 -05:00

@rick-github commented on GitHub (Apr 28, 2025):

These are runner processes. The most likely explanation is that the server was killed or crashed, orphaning the runners. Server logs may aid in debugging.

@moonflash commented on GitHub (Apr 28, 2025):

I'm experiencing the same with gemma3:27b-it-qat. Running:

- ollama inside Docker
- Linux host
- one dedicated RTX 3090 per instance

At the beginning I have 21 GB of VRAM and less than 1 GB of RAM in use; after an hour the VRAM usage is the same, but all RAM (30 GB) is used, plus up to 40 GB of swap.
@rick-github commented on GitHub (Apr 28, 2025):

Server logs may aid in debugging.

@moonflash commented on GitHub (Apr 28, 2025):

```
time=2025-04-28T09:12:21.845Z level=DEBUG source=process_text_spm.go:184 msg="adding bos token to prompt" id=2
time=2025-04-28T09:12:22.047Z level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1034 prompt=926 used=0 remaining=926
[GIN] 2025/04/28 - 09:12:26 | 200 |  4.424865014s |      172.19.0.7 | POST     "/api/generate"
time=2025-04-28T09:12:26.163Z level=DEBUG source=sched.go:409 msg="context for request finished"
time=2025-04-28T09:12:26.163Z level=DEBUG source=sched.go:341 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/root/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 duration=2562047h47m16.854775807s
time=2025-04-28T09:12:26.163Z level=DEBUG source=sched.go:359 msg="after processing request finished event" modelPath=/root/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 refCount=0
time=2025-04-28T09:12:26.943Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-04-28T09:12:26.948Z level=DEBUG source=sched.go:577 msg="evaluating already loaded" model=/root/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87
time=2025-04-28T09:12:26.949Z level=DEBUG source=routes.go:297 msg="generate request" images=1 prompt="<start_of_turn>user\nYou act as a web-shop expert reaponsible ...Please return response in proper JSON format [text end]<end_of_turn>\n<start_of_turn>model\n\n"
```

No other logs or errors 🤷‍♂
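(A sketch of one way to confirm where the host RAM is going: sample the resident memory of the ollama processes over time. The interval and columns here are arbitrary.)

```shell
# Print PID, resident and virtual memory of all ollama processes every 60 s.
# pidof returns space-separated PIDs; ps -p expects a comma-separated list.
watch -n 60 'ps -o pid,rss,vsz,cmd -p "$(pidof ollama | tr " " ,)"'
```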

@somera commented on GitHub (Apr 28, 2025):

> These are runner processes. The most likely explanation is that the server was killed or crashed, orphaning the runners. Server logs may aid in debugging.

I extended my ollama.service with `OLLAMA_DEBUG` to get more details.
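(For reference, a minimal sketch of enabling debug logging for the systemd service, following the standard troubleshooting docs:)

```shell
# Add the environment variable via a systemd override, then restart.
sudo systemctl edit ollama
#   -> in the editor, add:
#   [Service]
#   Environment="OLLAMA_DEBUG=1"
sudo systemctl restart ollama
# Follow the debug output:
journalctl -u ollama -f
```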

@rick-github commented on GitHub (Apr 28, 2025):

@moonflash Your problem looks different from the OP's; open a new ticket and attach full logs.

@somera commented on GitHub (Apr 28, 2025):

I got the same problem today with `qwen2.5-coder:32b`. You can see it here:

Image

It is the first line in the screenshot.

@rick-github here is the log for the `qwen2.5-coder:32b` issue.

ollama.zip

@rick-github commented on GitHub (Apr 28, 2025):

What's the output of:

```
ps wwho ppid,pid,pgid,lstart,cmd klstart p$(pidof ollama)
```
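(The same command, annotated for readers following along:)

```shell
# ww       : unlimited output width (don't truncate the runner command lines)
# h        : suppress the header row
# o <cols> : user-defined output columns
# klstart  : sort by process start time
# p<pids>  : only the ollama PIDs; server and runners share the same binary
ps wwho ppid,pid,pgid,lstart,cmd klstart p$(pidof ollama)
```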
@somera commented on GitHub (Apr 28, 2025):

After I saw the problem, I restarted the ollama service.

Now I see:

```
# ps wwho ppid,pid,pgid,lstart,cmd klstart p$(pidof ollama)
      1 2268027 2268027 Mon Apr 28 13:42:51 2025 /usr/local/bin/ollama serve
2268027 2275563 2268027 Mon Apr 28 14:53:24 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-fd7b6731c33c57f61767612f56517460ec2d1e2e5a3f0163e0eb3d8d8cb5df20 --ctx-size 32768 --batch-size 512 --n-gpu-layers 41 --verbose --threads 32 --parallel 4 --port 46385
2268027 2278642 2268027 Mon Apr 28 15:21:08 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-ac3d1ba8aa77755dab3806d9024e9c385ea0d5b412d6bdf9157f8a4a7e9fc0d9 --ctx-size 8192 --batch-size 512 --n-gpu-layers 65 --verbose --threads 32 --parallel 1 --port 37655
```

and I'm wondering, because I see

Image

and

Image

The model is ~23 GB, but the GPU says 38 GB are used.

I loaded `mistral-small3.1:24b` and ...

```
# ps wwho ppid,pid,pgid,lstart,cmd klstart p$(pidof ollama)
      1 2268027 2268027 Mon Apr 28 13:42:51 2025 /usr/local/bin/ollama serve
2268027 2275563 2268027 Mon Apr 28 14:53:24 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-fd7b6731c33c57f61767612f56517460ec2d1e2e5a3f0163e0eb3d8d8cb5df20 --ctx-size 32768 --batch-size 512 --n-gpu-layers 41 --verbose --threads 32 --parallel 4 --port 46385
2268027 2280145 2268027 Mon Apr 28 15:35:15 2025 /usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-1fa8532d986d729117d6b5ac2c884824d0717c9468094554fd1d36412c740cfc --ctx-size 4096 --batch-size 512 --n-gpu-layers 41 --verbose --threads 32 --parallel 1 --port 36971
```

The runner with `ctx-size 32768` is the unused `phi4:14b` model:

```
manifests/registry.ollama.ai/library/phi4/latest:{"schemaVersion":2,"mediaType":"application/vnd.docker.distribution.manifest.v2+json","config":{"mediaType":"application/vnd.docker.container.image.v1+json","digest":"sha256:f5d6f49c64775df1536e9d747c6b6b4c101f6a8658108fbd18a15d046575c68b","size":486},"layers":[{"mediaType":"application/vnd.ollama.image.model","digest":"sha256:fd7b6731c33c57f61767612f56517460ec2d1e2e5a3f0163e0eb3d8d8cb5df20","size":9053114464},{"mediaType":"application/vnd.ollama.image.template","digest":"sha256:32695b892af87ef8fca6e13a1a31c67c1441d7398be037e366e2fc763857c06a","size":275},{"mediaType":"application/vnd.ollama.image.license","digest":"sha256:fa8235e5b48faca34e3ca98cf4f694ef08bd216d28b58071a1f85b1d50cb814d","size":1084},{"mediaType":"application/vnd.ollama.image.params","digest":"sha256:45a1c652dddc9efdcefa977ab81cfbe26b6e52bc8e78f2f4c698538783e0ac80","size":82}]}
```

It looks like the problem occurs when the ctx-size is high.
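(A quick way to spot the mismatch described in this thread is to compare what the scheduler reports against the runner processes that are actually alive; a sketch:)

```shell
# Models ollama believes are loaded:
ollama ps
# Runner processes that actually exist:
pgrep -af 'ollama runner'
```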

@somera commented on GitHub (Apr 28, 2025):

Now `ollama ps` shows nothing. But:

```
# ps wwho ppid,pid,pgid,lstart,cmd klstart p$(pidof ollama)
      1 2268027 2268027 Mon Apr 28 13:42:51 2025 /usr/local/bin/ollama serve
2268027 2275563 2268027 Mon Apr 28 14:53:24 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-fd7b6731c33c57f61767612f56517460ec2d1e2e5a3f0163e0eb3d8d8cb5df20 --ctx-size 32768 --batch-size 512 --n-gpu-layers 41 --verbose --threads 32 --parallel 4 --port 46385
```

and VRAM is used:

Image

There is a memory leak which occurs with different models on our setup.

@rick-github commented on GitHub (Apr 28, 2025):

What's in the log now?

@somera commented on GitHub (Apr 28, 2025):

Here are the logs from 13:00:00 onward:

ollama2.zip

@somera commented on GitHub (Apr 28, 2025):

I haven't restarted the ollama service yet, in case you need more input.

@somera commented on GitHub (Apr 28, 2025):

So far I have seen the memory leak with:

- deepseek-coder-v2:16b (the original model and the one created from the Modelfile)
- qwen2.5-coder:32b
- phi4:14b
@rick-github commented on GitHub (Apr 28, 2025):

```
Apr 28 14:53:24 AI-DEV-VM ollama[2268027]: time=2025-04-28T14:53:24.544+02:00 level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-fd7b6731c33c57f61767612f56517460ec2d1e2e5a3f0163e0eb3d8d8cb5df20 --ctx-size 32768 --batch-size 512 --n-gpu-layers 41 --verbose --threads 32 --parallel 4 --port 46385"
```

ollama started a runner for phi4:14b at 14:53:24.

```
Apr 28 14:53:24 AI-DEV-VM ollama[2268027]: time=2025-04-28T14:53:24.639+02:00 level=INFO source=runner.go:913 msg="Server listening on 127.0.0.1:46385"
```

Runner is up and ready to load the model.

```
Apr 28 14:53:24 AI-DEV-VM ollama[2268027]: load_tensors: layer  40 assigned to device CUDA0, is_swa = 0
```

Runner finished layer assignment and started VRAM writes.

```
Apr 28 14:53:25 AI-DEV-VM ollama[2268027]: time=2025-04-28T14:53:25.656+02:00 level=DEBUG source=sched.go:386 msg="sending an unloaded event" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-fd7b6731c33c57f61767612f56517460ec2d1e2e5a3f0163e0eb3d8d8cb5df20
```

Runner told to unload the model.

```
Apr 28 14:53:25 AI-DEV-VM ollama[2268027]: time=2025-04-28T14:53:25.803+02:00 level=DEBUG source=server.go:625 msg="model load progress 0.28"
```

Runner continues to load the model.

```
Apr 28 14:53:26 AI-DEV-VM ollama[2268027]: time=2025-04-28T14:53:26.809+02:00 level=INFO source=server.go:619 msg="llama runner started in 2.26 seconds"
```

Model loaded.

```
Apr 28 14:53:26 AI-DEV-VM ollama[2268027]: time=2025-04-28T14:53:26.809+02:00 level=DEBUG source=routes.go:297 msg="generate request" images=0 prompt="<fim_prefix>// Path: AI_testsuite\n//
```

Runner processes the request.

```
Apr 28 14:53:26 AI-DEV-VM ollama[2268027]: time=2025-04-28T14:53:26.905+02:00 level=ERROR source=sched.go:327 msg="finished request signal received after model unloaded" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-fd7b6731c33c57f61767612f56517460ec2d1e2e5a3f0163e0eb3d8d8cb5df20
```

ollama notices that an inflight request was finished even though the model was unloaded.

So the model was unloaded in the middle of processing a request. The runner then basically forgot it was supposed to stop, while the ollama server, having told the model to stop, discarded all state that it had. So the runner is orphaned, waiting around until something kills it.

It's not clear why the runner got an unload event. Do you set keep_alive in your requests?
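(For reference, `keep_alive` is a per-request option in the HTTP API; a minimal example, model name arbitrary:)

```shell
# keep_alive accepts a duration such as "5m", 0 to unload immediately,
# or -1 to keep the model loaded indefinitely.
curl http://localhost:11434/api/generate -d '{
  "model": "phi4:14b",
  "prompt": "hello",
  "keep_alive": "5m"
}'
```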

@rick-github commented on GitHub (Apr 28, 2025):

Hmm, earlier we see:

```
Apr 28 14:53:22 AI-DEV-VM ollama[2268027]: time=2025-04-28T14:53:22.559+02:00 level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-fd7b6731c33c57f61767612f56517460ec2d1e2e5a3f0163e0eb3d8d8cb5df20 --ctx-size 32768 --batch-size 512 --n-gpu-layers 41 --verbose --threads 32 --parallel 4 --port 38573"
Apr 28 14:53:22 AI-DEV-VM ollama[2268027]: time=2025-04-28T14:53:22.652+02:00 level=DEBUG source=server.go:569 msg="server unhealthy" error="health resp: Get \"http://127.0.0.1:38573/health\": dial tcp 127.0.0.1:38573: connect: connection refused"
Apr 28 14:53:22 AI-DEV-VM ollama[2268027]: time=2025-04-28T14:53:22.652+02:00 level=DEBUG source=sched.go:285 msg="resetting model to expire immediately to make room" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-fd7b6731c33c57f61767612f56517460ec2d1e2e5a3f0163e0eb3d8d8cb5df20 refCount=0
Apr 28 14:53:22 AI-DEV-VM ollama[2268027]: time=2025-04-28T14:53:22.683+02:00 level=INFO source=runner.go:913 msg="Server listening on 127.0.0.1:38573"
```

So what I think is happening is that a runner is started, fails a health check but is ready 31 ms later, and is marked for unload. The runner continues loading the model and processing the request, but the ollama server has committed to unloading it, so we end up in a state where the runner is running and waiting to do completions, but the ollama server has forgotten about it.

@somera commented on GitHub (Apr 28, 2025):

> It's not clear why the runner got an unload event. Do you set `keep_alive` in your requests?

I'm not the only one using it. But as far as I know, `keep_alive` is not used.

But we use nginx as a reverse proxy for ollama:

```
server {
    listen 7777;
    server_name ollama.internal-domain.de;

    location / {
        # Proxy the request to the ollama service
        proxy_pass http://xxx.xxx.xxx.xxx:11434;
        proxy_set_header Host $host;

        proxy_connect_timeout       600;
        proxy_send_timeout          600;
        proxy_read_timeout          600;
        send_timeout                600;

        proxy_hide_header Access-Control-Allow-Origin;
        proxy_hide_header Access-Control-Allow-Methods;
        proxy_hide_header Access-Control-Allow-Headers;
        proxy_hide_header Access-Control-Expose-Headers;

        # (Optional) Disable proxy buffering for better streaming response from models
        proxy_buffering off;
    }
}
```
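(A quick smoke test through this proxy, using the hostname and port from the config above:)

```shell
# Verify that requests through nginx reach the ollama server.
curl -s http://ollama.internal-domain.de:7777/api/version
```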
@somera commented on GitHub (Apr 28, 2025):

I've been noticing the VRAM usage issue since March, but I haven't had time for a deep analysis and was convinced the problem was Gemma 3.

@somera commented on GitHub (Apr 28, 2025):

> So what I think is happening is that a runner is started, fails a health check but is ready 31 ms later, and is marked for unload. The runner continues loading the model and processing the request, but the ollama server has committed to unloading it, so we end up in a state where the runner is running and waiting to do completions, but the ollama server has forgotten about it.

Is this a feature or bug? ;)

Normally, all models available in our setup are preloaded into RAM for better performance.
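(A minimal sketch of preloading, assuming the standard API: a generate request without a prompt loads the model, and a negative `keep_alive` keeps it resident.)

```shell
# Load the model and keep it in memory until explicitly unloaded.
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-coder-v2:16b",
  "keep_alive": -1
}'
```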

@rick-github commented on GitHub (Apr 28, 2025):

> convinced the problem was Gemma 3.

gemma3 did have a memory leak problem, resolved in 0.6.6. This issue is not a classical memory leak.

> Is this a feature or bug? ;)

This is a bug. It looks like a race condition; I'm still trying to replicate the problem on my setup. I think multiple queued requests are somehow triggering `needsReload()` before the model has finished loading, causing the simultaneous load and unload actions.

@somera commented on GitHub (Apr 28, 2025):

The error occurs when others access Ollama from VS Code using Continue. I am currently trying to reproduce it with a Python script that performs API calls.

```python
    def generate_code(self, prompt: str, params: GenerationParams) -> str:
        """Generate code using Ollama API with streaming and cancellation support.

        Args:
            prompt: Input prompt for code generation.
            params: Generation parameters.

        Returns:
            str: Generated code.

        Raises:
            ValueError: If parameters are invalid.
            Exception: If generation fails or is stopped.
        """
        is_valid, message = params.validate()
        if not is_valid:
            raise ValueError(message)

        options = {
            'temperature': params.temperature,
            'top_k': params.top_k,
            'top_p': params.top_p,
            'repeat_penalty': params.repeat_penalty,
            'num_predict': params.num_predict,
            'num_ctx': 32768,
        }
        if params.seed is not None:
            options['seed'] = params.seed

        url = f"{self.host}/api/chat"
        payload = {
            'model': params.model,
            'messages': [{'role': 'user', 'content': prompt}],
            'options': options,
            'stream': True  # We need streaming to implement cancellation
        }

        try:
            response = self._session.post(
                url,
                json=payload,
                stream=True,
                timeout=120
            )
            response.raise_for_status()

            full_response = []
            for line in response.iter_lines():
                if self._stop_flag:
                    raise Exception("Generation stopped by user")

                if line:
                    decoded_line = line.decode('utf-8')
                    if decoded_line.strip():
                        data = json.loads(decoded_line)
                        if 'message' in data and 'content' in data['message']:
                            full_response.append(data['message']['content'])

            return ''.join(full_response)
        except requests.exceptions.RequestException as e:
            raise Exception(f"Generation failed: {str(e)}")
```

Although I set `num_ctx=32768`, I am sending only a small prompt. This means the behavior on the GPU is different because it is not heavily loaded. I might try sending a longer prompt, but I will need to adjust my script for that.
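(A throwaway way to test a long prompt without modifying the script; the model name and repetition count here are arbitrary:)

```shell
# Pad a prompt to a few thousand words to force real context usage.
LONG_PROMPT=$(printf 'lorem ipsum %.0s' {1..2000})
curl -s http://localhost:11434/api/chat -d "{
  \"model\": \"qwen2.5-coder:32b\",
  \"messages\": [{\"role\": \"user\", \"content\": \"$LONG_PROMPT\"}],
  \"options\": {\"num_ctx\": 32768},
  \"stream\": false
}"
```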

@somera commented on GitHub (Apr 29, 2025):

I can't reproduce this with my Python script.

And today the same problem occurred with `phi4:14b` and a lower `ctx-size`:

```
$ ps wwho ppid,pid,pgid,lstart,cmd klstart p$(pidof ollama)
      1 2287965 2287965 Mon Apr 28 16:38:57 2025 /usr/local/bin/ollama serve
2287965 2424183 2287965 Tue Apr 29 08:40:16 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-fd7b6731c33c57f61767612f56517460ec2d1e2e5a3f0163e0eb3d8d8cb5df20 --ctx-size 8192 --batch-size 512 --n-gpu-layers 41 --verbose --threads 32 --parallel 1 --port 45425
```
@somera commented on GitHub (Apr 29, 2025):

@rick-github I also found this:

```
$ sudo journalctl -u ollama --since='2025-04-28 10:00:00' | grep "expired event with positive ref count, retrying" | wc -l
1220404
```

1,220,404 entries in 26.5 hours.

```
Apr 29 08:47:40 AI-DEV-VM ollama[2287965]: time=2025-04-29T08:47:40.506+02:00 level=DEBUG source=sched.go:365 msg="expired event with positive ref count, retrying" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-fd7b6731c33c57f61767612f56517460ec2d1e2e5a3f0163e0eb3d8d8cb5df20 refCount=1
Apr 29 08:47:40 AI-DEV-VM ollama[2287965]: time=2025-04-29T08:47:40.517+02:00 level=DEBUG source=sched.go:365 msg="expired event with positive ref count, retrying" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-fd7b6731c33c57f61767612f56517460ec2d1e2e5a3f0163e0eb3d8d8cb5df20 refCount=1
```
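(To see when the retry loop started, the same grep can be bucketed per hour; a sketch relying on journalctl's default short timestamp format:)

```shell
# Count the retry messages per hour of the day.
sudo journalctl -u ollama --since='2025-04-28 10:00:00' \
  | grep 'expired event with positive ref count' \
  | awk '{print $1, $2, substr($3,1,2) ":00"}' \
  | sort | uniq -c
```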
@somera commented on GitHub (Apr 29, 2025):

> It's not clear why the runner got an unload event. Do you set `keep_alive` in your requests?

No one sets `keep_alive`.

We are using Open WebUI, where it's possible to send the same prompt to more than one model. I couldn't reproduce the problem there.

I think the problem occurs when someone is using VS Code with the Continue plugin.

@somera commented on GitHub (Apr 29, 2025):

I now have a bash script which detects the problem:

Image

Here are some details:

```
=== Ollama Process Status ===
Active models (ollama ps)          : 1
Runner processes detected          : 4

=== Detailed Process Information ===
[ollama ps output]
NAME                              ID              SIZE     PROCESSOR    UNTIL
deepseek-coder-v2-fixed:latest    18245823b634    15 GB    100% CPU     24 minutes from now

[running processes]
      1 2446603 2446603 Tue Apr 29 13:00:07 2025 /usr/local/bin/ollama serve
2446603 2464527 2446603 Tue Apr 29 15:24:18 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 98304 --batch-size 512 --n-gpu-layers 28 --verbose --threads 32 --parallel 4 --port 44237
2446603 2465369 2446603 Tue Apr 29 15:25:07 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 24576 --batch-size 512 --n-gpu-layers 9 --verbose --threads 32 --parallel 1 --port 42327
2446603 2477310 2446603 Tue Apr 29 16:43:17 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 24576 --batch-size 512 --verbose --threads 32 --no-mmap --parallel 1 --port 36139
2446603 2477448 2446603 Tue Apr 29 16:43:38 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 24576 --batch-size 512 --verbose --threads 32 --no-mmap --parallel 1 --port 40271
```

I'm wondering about the 4 ollama runners.

I restarted ollama. Later I saw an ollama runner with `ctx-size=98304`:

```
=== Ollama Process Status ===
Active models (ollama ps)          : 1
Runner processes detected          : 1

=== Detailed Process Information ===
[ollama ps output]
NAME                              ID              SIZE     PROCESSOR    UNTIL
deepseek-coder-v2-fixed:latest    18245823b634    40 GB    100% GPU     26 minutes from now

[running processes]
      1 2584727 2584727 Tue Apr 29 17:22:33 2025 /usr/local/bin/ollama serve
2584727 2589143 2584727 Tue Apr 29 18:00:48 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 98304 --batch-size 512 --n-gpu-layers 28 --verbose --threads 32 --parallel 4 --port 39637

✓ System Normal
Active models match running processes
```

I made an Ollama API call with `ctx-size=8192`; Ollama used the runner with `--ctx-size 98304`.

I don't know how, but I think the initial problem occurs when someone is working with VS Code and the Continue plugin and is using code completion.

@dhiltgen commented on GitHub (May 3, 2025):

I fixed a race condition bug in the scheduler in 0.6.7 which changed some of the startup logic. I'm curious if the behavior is any better in 0.6.7?

@somera commented on GitHub (May 3, 2025):

@dhiltgen I'll install the new version on Monday and then I'll keep an eye on it.

@somera commented on GitHub (May 4, 2025):

@dhiltgen I see no changes.

```
$ ./ollama_detect_problems_v8.sh

=== Ollama Process Status ===
Active models (ollama ps)          : 1
Runner processes detected          : 2

=== Detailed Process Information ===
[ollama ps output]
NAME                              ID              SIZE     PROCESSOR    UNTIL
deepseek-coder-v2-fixed:latest    18245823b634    17 GB    100% GPU     29 minutes from now

[running processes]
      1    1043    1043 Sat May  3 12:59:00 2025 /usr/local/bin/ollama serve
   1043 2193215    1043 Sat May  3 19:07:37 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 49152 --batch-size 512 --n-gpu-layers 28 --threads 32 --parallel 2 --port 42361
   1043  251497    1043 Sun May  4 11:36:44 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 24576 --batch-size 512 --n-gpu-layers 28 --threads 32 --parallel 1 --port 37181

🛑 PROBLEM DETECTED
Found 2 runner processes but only 1 active models!
This indicates unloaded models still occupying resources.

Oldest runner process (zombie candidate):
   1043 2193215    1043 Sat May  3 19:07:37 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 49152 --batch-size 512 --n-gpu-layers 28 --threads 32 --parallel 2 --port 42361

Do you want to restart Ollama service to clean up? (y/N)
```

Image

@rick-github commented on GitHub (May 4, 2025):

Can you post your shell script?

@rick-github commented on GitHub (May 4, 2025):

Oh, wait, it just detects the problem but doesn't trigger it?

@somera commented on GitHub (May 4, 2025):

@rick-github yes, it's a bash script which detects the problem. It makes things easier for me.

```bash
#!/bin/bash

# Color codes for better output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color

# Check if ollama is installed
if ! command -v ollama &> /dev/null; then
    echo -e "${RED}✗ Error: Ollama is not installed or not in PATH${NC}"
    echo -e "Please install Ollama first: ${BLUE}https://ollama.ai${NC}"
    exit 1
fi

# Check if ollama is running
if ! pidof ollama > /dev/null; then
    echo -e "${BLUE}ℹ Ollama is not currently running${NC}"
    exit 0
fi

# Get status information
ollama_ps_output=$(ollama ps)
ollama_processes=$(ps wwho ppid,pid,pgid,lstart,cmd klstart p$(pidof ollama))
active_models=$(echo "$ollama_ps_output" | grep -v "^NAME" | grep -v "^$" | wc -l)
runner_processes=$(echo "$ollama_processes" | grep -c "ollama runner")

# Display status header
echo -e "\n=== ${BLUE}Ollama Process Status${NC} ==="
printf "%-35s: %d\n" "Active models (ollama ps)" "$active_models"
printf "%-35s: %d\n" "Runner processes detected" "$runner_processes"

# Only show details if not in normal state
if [ $runner_processes -ne $active_models ] || [ $active_models -gt 0 ]; then
    echo -e "\n=== ${YELLOW}Detailed Process Information${NC} ==="
    echo -e "[${YELLOW}ollama ps output${NC}]"
    echo "$ollama_ps_output"
    echo -e "\n[${YELLOW}running processes${NC}]"
    echo "$ollama_processes"
fi

# Evaluate the situation
if [ $runner_processes -gt $active_models ]; then
    echo -e "\n${RED}🛑 PROBLEM DETECTED${NC}"
    echo -e "Found ${RED}$runner_processes${NC} runner processes but only ${GREEN}$active_models${NC} active models!"
    echo -e "This indicates unloaded models still occupying resources."

    oldest_runner=$(echo "$ollama_processes" | grep "ollama runner" | head -n 1)
    echo -e "\n${YELLOW}Oldest runner process (zombie candidate):${NC}"
    echo "$oldest_runner"

    read -p $'\nDo you want to restart Ollama service to clean up? (y/N) ' -n 1 -r
    echo ""
    if [[ $REPLY =~ ^[Yy]$ ]]; then
        # Extract process start time and calculate log start time (30 mins earlier)
        proc_start=$(echo "$oldest_runner" | awk '{print $4" "$5" "$6" "$7" "$8}')
        proc_start_epoch=$(date -d "$proc_start" +%s)
        log_start_epoch=$((proc_start_epoch - 1800))
        log_start=$(date -d "@$log_start_epoch" "+%Y-%m-%d %H:%M:%S")

        LOG_FILE="ollama-$(date +'%Y-%m-%d_%H-%M-%S').log"
        echo -e "\n${BLUE}Restarting Ollama service...${NC}"

        sudo systemctl restart ollama

        if ! systemctl is-active --quiet ollama; then
            echo -e "${RED}✗ Error: Ollama service failed to restart!${NC}"
            exit 1
        fi

        echo -e "Capturing logs from ${YELLOW}$log_start${NC}..."
        sudo journalctl -u ollama --since="$log_start" > "$LOG_FILE"
        gzip "$LOG_FILE"
        COMPRESSED_FILE="${LOG_FILE}.gz"

        echo -e "\n${GREEN}✓ Service successfully restarted${NC}"
        echo -e "Logs saved to: ${YELLOW}${COMPRESSED_FILE}${NC}"

        read -p $'\nView logs now? (y/N) ' -n 1 -r
        echo ""
        [[ $REPLY =~ ^[Yy]$ ]] && zless "$COMPRESSED_FILE"

        read -p $'\nKeep log file? (Y/n) ' -n 1 -r  # Fixed -R to -r here
        echo ""
        [[ $REPLY =~ ^[Nn]$ ]] && rm "$COMPRESSED_FILE" && echo -e "${YELLOW}Log file deleted${NC}"

    else
        echo -e "${YELLOW}No action taken${NC}"
    fi
elif [ $runner_processes -eq 0 ] && [ $active_models -eq 0 ]; then
    echo -e "\n${GREEN}✓ System Normal${NC}"
    echo -e "${BLUE}Only main Ollama serve process running (expected state)${NC}"
elif [ $runner_processes -eq $active_models ]; then
    echo -e "\n${GREEN}✓ System Normal${NC}"
    echo -e "${BLUE}Active models match running processes${NC}"
fi

echo -e "\n${GREEN}Script completed${NC}"
```
<!-- gh-comment-id:2849120048 --> @somera commented on GitHub (May 4, 2025): @rick-github yes, this is a bash script, which detect the problem. Make it easier for me. ``` #!/bin/bash # Color codes for better output RED='\033[0;31m' GREEN='\033[0;32m' YELLOW='\033[1;33m' BLUE='\033[0;34m' NC='\033[0m' # No Color # Check if ollama is installed if ! command -v ollama &> /dev/null; then echo -e "${RED}✗ Error: Ollama is not installed or not in PATH${NC}" echo -e "Please install Ollama first: ${BLUE}https://ollama.ai${NC}" exit 1 fi # Check if ollama is running if ! pidof ollama > /dev/null; then echo -e "${BLUE}ℹ Ollama is not currently running${NC}" exit 0 fi # Get status information ollama_ps_output=$(ollama ps) ollama_processes=$(ps wwho ppid,pid,pgid,lstart,cmd klstart p$(pidof ollama)) active_models=$(echo "$ollama_ps_output" | grep -v "^NAME" | grep -v "^$" | wc -l) runner_processes=$(echo "$ollama_processes" | grep -c "ollama runner") # Display status header echo -e "\n=== ${BLUE}Ollama Process Status${NC} ===" printf "%-35s: %d\n" "Active models (ollama ps)" "$active_models" printf "%-35s: %d\n" "Runner processes detected" "$runner_processes" # Only show details if not in normal state if [ $runner_processes -ne $active_models ] || [ $active_models -gt 0 ]; then echo -e "\n=== ${YELLOW}Detailed Process Information${NC} ===" echo -e "[${YELLOW}ollama ps output${NC}]" echo "$ollama_ps_output" echo -e "\n[${YELLOW}running processes${NC}]" echo "$ollama_processes" fi # Evaluate the situation if [ $runner_processes -gt $active_models ]; then echo -e "\n${RED}🛑 PROBLEM DETECTED${NC}" echo -e "Found ${RED}$runner_processes${NC} runner processes but only ${GREEN}$active_models${NC} active models!" echo -e "This indicates unloaded models still occupying resources." oldest_runner=$(echo "$ollama_processes" | grep "ollama runner" | head -n 1) echo -e "\n${YELLOW}Oldest runner process (zombie candidate):${NC}" echo "$oldest_runner" read -p $'\nDo you want to restart Ollama service to clean up? (y/N) ' -n 1 -r echo "" if [[ $REPLY =~ ^[Yy]$ ]]; then # Extract process start time and calculate log start time (30 mins earlier) proc_start=$(echo "$oldest_runner" | awk '{print $4" "$5" "$6" "$7" "$8}') proc_start_epoch=$(date -d "$proc_start" +%s) log_start_epoch=$((proc_start_epoch - 1800)) log_start=$(date -d "@$log_start_epoch" "+%Y-%m-%d %H:%M:%S") LOG_FILE="ollama-$(date +'%Y-%m-%d_%H-%M-%S').log" echo -e "\n${BLUE}Restarting Ollama service...${NC}" sudo systemctl restart ollama if ! systemctl is-active --quiet ollama; then echo -e "${RED}✗ Error: Ollama service failed to restart!${NC}" exit 1 fi echo -e "Capturing logs from ${YELLOW}$log_start${NC}..." sudo journalctl -u ollama --since="$log_start" > "$LOG_FILE" gzip "$LOG_FILE" COMPRESSED_FILE="${LOG_FILE}.gz" echo -e "\n${GREEN}✓ Service successfully restarted${NC}" echo -e "Logs saved to: ${YELLOW}${COMPRESSED_FILE}${NC}" read -p $'\nView logs now? (y/N) ' -n 1 -r echo "" [[ $REPLY =~ ^[Yy]$ ]] && zless "$COMPRESSED_FILE" read -p $'\nKeep log file? 
(Y/n) ' -n 1 -r # Fixed -R to -r here echo "" [[ $REPLY =~ ^[Nn]$ ]] && rm "$COMPRESSED_FILE" && echo -e "${YELLOW}Log file deleted${NC}" else echo -e "${YELLOW}No action taken${NC}" fi elif [ $runner_processes -eq 0 ] && [ $active_models -eq 0 ]; then echo -e "\n${GREEN}✓ System Normal${NC}" echo -e "${BLUE}Only main Ollama serve process running (expected state)${NC}" elif [ $runner_processes -eq $active_models ]; then echo -e "\n${GREEN}✓ System Normal${NC}" echo -e "${BLUE}Active models match running processes${NC}" fi echo -e "\n${GREEN}Script completed${NC}" ```
@somera commented on GitHub (May 4, 2025):

The problem occurs when someone is working in Visual Studio Code with the Continue plugin and using all of the AI features the plugin offers.

@rick-github commented on GitHub (May 4, 2025):

Thanks. I misread your post and thought you had a way to trigger the problem, which would make it easier to find the root cause. I'll have another go at debugging this today.

@dhiltgen commented on GitHub (May 5, 2025):

Bummer. I've added some more logging for the next release to expose the PID of the runner during scheduling operations (when OLLAMA_DEBUG=1 is set), which should help narrow down the race that's leading to the orphaned runners. Combined with your ps output showing the PIDs of the runners left behind, we'll be able to see when those exact runners started, what the scheduler was doing with them, and when/if it tried to shut them down.

@somera commented on GitHub (May 5, 2025):

@dhiltgen sounds good. Does this mean it will be in the 0.6.8 release?

@somera commented on GitHub (May 5, 2025):

Are your changes in v0.6.8?

@rick-github commented on GitHub (May 5, 2025):

Log lines from 0.6.8 now contain information like 'runner.inference=cuda runner.devices=1 runner.size="20.9 GiB" runner.vram="11.5 GiB" runner.num_ctx=4096 runner.parallel=1 runner.pid=193'
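Assuming a systemd install, those fields make it possible to pull every runner PID the scheduler ever logged and compare them against the runners still alive; a minimal sketch:

```
# PIDs the scheduler has logged (the runner.pid field from 0.6.8 onwards)
journalctl -u ollama --since today | grep -o 'runner\.pid=[0-9]*' | sort -u

# PIDs actually running; anything here but not above is a leak candidate
ps -eo pid,cmd | grep '[o]llama runner'
```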

@somera commented on GitHub (May 5, 2025):

ok. I installed the new version. As soon as the error occurs, I will send the logs.

@somera commented on GitHub (May 6, 2025):

@rick-github @dhiltgen I got the error again.

$ ollama -v
ollama version is 0.6.8
$ ./ollama_detect_problems_v8.sh

=== Ollama Process Status ===
Active models (ollama ps)          : 1
Runner processes detected          : 2

=== Detailed Process Information ===
[ollama ps output]
NAME                 ID              SIZE     PROCESSOR    UNTIL
gemma3:27b-it-qat    29eb0b9aeda3    22 GB    100% GPU     4 minutes from now

[running processes]
      1 3198467 3198467 Mon May  5 19:10:41 2025 /usr/local/bin/ollama serve
3198467 3228400 3198467 Tue May  6 08:32:00 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-fd7b6731c33c57f61767612f56517460ec2d1e2e5a3f0163e0eb3d8d8cb5df20 --ctx-size 16384 --batch-size 512 --n-gpu-layers 41 --verbose --threads 32 --parallel 2 --port 37561
3198467 3234141 3198467 Tue May  6 10:56:24 2025 /usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 --ctx-size 8192 --batch-size 512 --n-gpu-layers 63 --verbose --threads 32 --parallel 2 --port 37107

🛑 PROBLEM DETECTED
Found 2 runner processes but only 1 active models!
This indicates unloaded models still occupying resources.

Oldest runner process (zombie candidate):
3198467 3228400 3198467 Tue May  6 08:32:00 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-fd7b6731c33c57f61767612f56517460ec2d1e2e5a3f0163e0eb3d8d8cb5df20 --ctx-size 16384 --batch-size 512 --n-gpu-layers 41 --verbose --threads 32 --parallel 2 --port 37561

Do you want to restart Ollama service to clean up? (y/N) y

Restarting Ollama service...
Capturing logs from 2025-05-06 08:02:00...

✓ Service successfully restarted
Logs saved to: ollama-2025-05-06_10-57-01.log.gz

View logs now? (y/N) n

Keep log file? (Y/n)


Script completed

See the attached logs.

ollama-2025-05-06_10-57-01.zip

Is this helpful?

@dhiltgen commented on GitHub (May 6, 2025):

Yes, thanks! The offending PID does not appear in the log at all, which indicates the leak happens during the very early startup of the runner, before it finishes initialization.
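
A hedged way to repeat that check, using the leaked PID from the `ps` output above (3228400) as an example:

```
# If the scheduler never logged this PID, the runner leaked before it
# finished initializing and registered with the scheduler
PID=3228400
journalctl -u ollama --since "2025-05-06 08:00" | grep -F "pid=$PID" \
  || echo "PID $PID never appears in the server log"
```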

@somera commented on GitHub (May 6, 2025):

I need to check my script, because with

log_start_epoch=$((proc_start_epoch - 1800))

the capture should have started at 08:02:00, yet the logs start at 08:31:55.
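
For what it's worth, the subtraction itself looks correct; a quick sanity check (a sketch, using the runner start time from the `ps` output above):

```
# Recompute the script's capture window by hand
proc_start="Tue May  6 08:32:00 2025"
date -d "@$(( $(date -d "$proc_start" +%s) - 1800 ))" '+%F %T'   # prints 2025-05-06 08:02:00

# If journald simply has no ollama entries before 08:31:55, the window
# was fine and the "late" log start is expected, not a script bug
journalctl -u ollama --since "2025-05-06 08:02:00" | head -n 1
```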

@somera commented on GitHub (May 7, 2025):

@dhiltgen sounds good.

I'm curious whether there's a reason I only noticed this recently (I first saw it back in March). Or is there no explanation?

@dhiltgen commented on GitHub (May 7, 2025):

Based on your logs, it seems the race condition is triggered by many concurrent requests for a model whose context size varies, so the model is reloaded frequently, combined with clients giving up and aborting the request before the model finishes loading.
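
A hypothetical reproduction sketch based on that theory; the model name and context sizes are placeholders, and `timeout` stands in for a client that gives up mid-load:

```
#!/bin/bash
# Fire concurrent requests that force reloads (a different num_ctx each time)
# and abort them before the model finishes loading
MODEL="deepseek-coder-v2:16b"
for ctx in 4096 24576 49152; do
  timeout 2 curl -s http://localhost:11434/api/generate \
    -d "{\"model\":\"$MODEL\",\"prompt\":\"hi\",\"options\":{\"num_ctx\":$ctx}}" &
done
wait

# Afterwards, compare the scheduler's view with the live processes
ollama ps
ps -eo pid,cmd | grep '[o]llama runner'
```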

@somera commented on GitHub (May 12, 2025):

@dhiltgen when will you release the fix?

@dhiltgen commented on GitHub (May 12, 2025):

@somera the next release should be out within a few days.

@somera commented on GitHub (May 19, 2025):

@dhiltgen I installed v0.7.0 yesterday, and I still see the problem:

$ ./ollama_detect_problems_v13.sh --debug
ℹ Debug mode enabled
ℹ Dry-run mode: false
ℹ Checking sudo credentials...
ℹ Ollama version: ollama version is 0.7.0

=== Ollama Process Status ===
Ollama version                     : ollama version is 0.7.0
Active models (ollama ps)          : 1
Runner processes detected          : 3

=== Detailed Process Information ===
[ollama ps output]
NAME               ID              SIZE     PROCESSOR          UNTIL
deepseek-r1:32b    38056bbcbb2d    22 GB    75%/25% CPU/GPU    4 minutes from now

[running processes]
   1051  181171    1051 Mon May 19 14:17:44 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 49152 --batch-size 512 --n-gpu-layers 28 --threads 32 --parallel 2 --port 35375
   1051  182420    1051 Mon May 19 14:27:14 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 24576 --batch-size 512 --n-gpu-layers 28 --threads 32 --parallel 1 --port 33451
   1051  643757    1051 Mon May 19 15:09:37 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-6150cb382311b69f09cc0f9a1b69fc029cbd742b66bb8ec531aa5ecf5c613e93 --ctx-size 4096 --batch-size 512 --n-gpu-layers 11 --threads 32 --parallel 1 --port 32955
      1    1051    1051 Sun May 18 10:57:12 2025 /usr/local/bin/ollama serve

🛑 PROBLEM DETECTED
Found 3 runner processes but only 1 active models!
This indicates unloaded models still occupying resources.

Oldest runner process (zombie candidate):
   1051  181171    1051 Mon May 19 14:17:44 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 49152 --batch-size 512 --n-gpu-layers 28 --threads 32 --parallel 2 --port 35375

Do you want to restart Ollama service to clean up? (y/N) N
No action taken

Script completed

And once the active model has been unloaded, the zombie processes remain:

$ ./ollama_detect_problems_v13.sh --debug
ℹ Debug mode enabled
ℹ Dry-run mode: false
ℹ Checking sudo credentials...
ℹ Ollama version: ollama version is 0.7.0

=== Ollama Process Status ===
Ollama version                     : ollama version is 0.7.0
Active models (ollama ps)          : 0
Runner processes detected          : 2

=== Detailed Process Information ===
[ollama ps output]
NAME    ID    SIZE    PROCESSOR    UNTIL

[running processes]
   1051  181171    1051 Mon May 19 14:17:44 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 49152 --batch-size 512 --n-gpu-layers 28 --threads 32 --parallel 2 --port 35375
   1051  182420    1051 Mon May 19 14:27:14 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 24576 --batch-size 512 --n-gpu-layers 28 --threads 32 --parallel 1 --port 33451
      1    1051    1051 Sun May 18 10:57:12 2025 /usr/local/bin/ollama serve

🛑 PROBLEM DETECTED
Found 2 runner processes but only 0 active models!
This indicates unloaded models still occupying resources.

Oldest runner process (zombie candidate):
   1051  181171    1051 Mon May 19 14:17:44 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 49152 --batch-size 512 --n-gpu-layers 28 --threads 32 --parallel 2 --port 35375

Do you want to restart Ollama service to clean up? (y/N)

No action taken

Script completed

@dhiltgen commented on GitHub (May 20, 2025):

Sorry to hear that. @somera please share a server log that overlaps with the time of these requests so I can see references to these PIDs and try to find why we're still leaking runners.

@somera commented on GitHub (May 20, 2025):

@dhiltgen here is the log for yesterday.

ollama-2025-05-19_15-52-50.zip

And here for today:

$ ./ollama_detect_problems_v13.sh
ℹ Checking sudo credentials...
[sudo] password for xxx:

=== Ollama Process Status ===
Ollama version                     : ollama version is 0.7.0
Active models (ollama ps)          : 0
Runner processes detected          : 1

=== Detailed Process Information ===
[ollama ps output]
NAME    ID    SIZE    PROCESSOR    UNTIL

[running processes]
      1  828276  828276 Mon May 19 15:52:50 2025 /usr/local/bin/ollama serve
 828276  896339  828276 Tue May 20 12:46:56 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 49152 --batch-size 512 --n-gpu-layers 28 --threads 32 --parallel 2 --port 37027

🛑 PROBLEM DETECTED
Found 1 runner processes but only 0 active models!
This indicates unloaded models still occupying resources.

Oldest runner process (zombie candidate):
 828276  896339  828276 Tue May 20 12:46:56 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 49152 --batch-size 512 --n-gpu-layers 28 --threads 32 --parallel 2 --port 37027

Do you want to restart Ollama service to clean up? (y/N) y
Calculated log start time (-30min): 2025-05-20 12:16:56

Restarting Ollama service...

Capturing logs from 2025-05-20 12:16:56...

✓ Service successfully restarted
Logs saved to: ollama-2025-05-20_14-28-24.log.gz

View logs now? (y/N)


Keep log file? (Y/n)


Script completed

ollama-2025-05-20_14-28-24.zip

@somera commented on GitHub (May 20, 2025):

@dhiltgen I hope this is helpful; I'm not running Ollama with OLLAMA_DEBUG=1 at the moment because the output was too much.

@dhiltgen commented on GitHub (May 20, 2025):

Unfortunately it doesn't look like those PIDs are showing up in the non-debug logs. If you're still seeing it on a somewhat regular basis, please try running with debug enabled for a bit so we can try to capture the failure case details.
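
For a systemd install, enabling debug logging is a drop-in override (this follows the troubleshooting doc linked at the top of the thread):

```
# Add OLLAMA_DEBUG=1 to the service environment and restart
sudo systemctl edit ollama
#   [Service]
#   Environment="OLLAMA_DEBUG=1"
sudo systemctl restart ollama

# Verify debug lines are being emitted (Ctrl-C if nothing appears)
journalctl -u ollama -f | grep -m1 'level=DEBUG'
```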

@somera commented on GitHub (May 20, 2025):

OK, so I need OLLAMA_DEBUG=1 then? I will set it tomorrow.

@somera commented on GitHub (May 21, 2025):

@dhiltgen next try. Tell me when this is enough, because then I will disable OLLAMA_DEBUG.

$ ./ollama_detect_problems_v13.sh
ℹ Checking sudo credentials...

=== Ollama Process Status ===
Ollama version                     : ollama version is 0.7.0
Active models (ollama ps)          : 1
Runner processes detected          : 3

=== Detailed Process Information ===
[ollama ps output]
NAME                              ID              SIZE     PROCESSOR          UNTIL
deepseek-coder-v2-fixed:latest    18245823b634    17 GB    68%/32% CPU/GPU    29 minutes from now

[running processes]
      1 1016422 1016422 Wed May 21 08:34:36 2025 /usr/local/bin/ollama serve
1016422 1027376 1016422 Wed May 21 13:37:53 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 49152 --batch-size 512 --n-gpu-layers 28 --threads 32 --parallel 2 --port 34493
1016422 1029209 1016422 Wed May 21 14:14:21 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 24576 --batch-size 512 --n-gpu-layers 28 --threads 32 --parallel 1 --port 37881
1016422 1162059 1016422 Wed May 21 14:20:31 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 24576 --batch-size 512 --n-gpu-layers 6 --threads 32 --parallel 1 --port 33863

🛑 PROBLEM DETECTED
Found 3 runner processes but only 1 active models!
This indicates unloaded models still occupying resources.

Oldest runner process (zombie candidate):
1016422 1027376 1016422 Wed May 21 13:37:53 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 49152 --batch-size 512 --n-gpu-layers 28 --threads 32 --parallel 2 --port 34493

Do you want to restart Ollama service to clean up? (y/N) y
Calculated log start time (-30min): 2025-05-21 13:07:53

Restarting Ollama service...

Capturing logs from 2025-05-21 13:07:53...

✓ Service successfully restarted
Logs saved to: ollama-2025-05-21_14-21-47.log.gz

View logs now? (y/N)


Keep log file? (Y/n)


Script completed

And here the log.

ollama-2025-05-21_14-21-47.zip

@dhiltgen commented on GitHub (May 21, 2025):

Yes, the PIDs are in there - go ahead and remove debug logging while I analyze the log.

@somera commented on GitHub (May 21, 2025):

@dhiltgen I have a new one, if ...

$ ./ollama_detect_problems_v13.sh --debug
ℹ Debug mode enabled
ℹ Dry-run mode: false
ℹ Checking sudo credentials...
ℹ Ollama version: ollama version is 0.7.0

=== Ollama Process Status ===
Ollama version                     : ollama version is 0.7.0
Active models (ollama ps)          : 1
Runner processes detected          : 2

=== Detailed Process Information ===
[ollama ps output]
NAME                              ID              SIZE     PROCESSOR    UNTIL
deepseek-coder-v2-fixed:latest    18245823b634    17 GB    100% GPU     15 minutes from now

[running processes]
      1 1176728 1176728 Wed May 21 14:21:48 2025 /usr/local/bin/ollama serve
1176728 1177075 1176728 Wed May 21 14:23:34 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 49152 --batch-size 512 --n-gpu-layers 28 --threads 32 --parallel 2 --port 34231
1176728 1193547 1176728 Wed May 21 16:52:40 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 24576 --batch-size 512 --n-gpu-layers 28 --threads 32 --parallel 1 --port 42989

🛑 PROBLEM DETECTED
Found 2 runner processes but only 1 active models!
This indicates unloaded models still occupying resources.

Oldest runner process (zombie candidate):
1176728 1177075 1176728 Wed May 21 14:23:34 2025 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 49152 --batch-size 512 --n-gpu-layers 28 --threads 32 --parallel 2 --port 34231

Do you want to restart Ollama service to clean up? (y/N) y
ℹ Raw date extracted: 'Wed May 21 14:23:34 2025'
ℹ Calculated exact log start time: 2025-05-21 13:53:34
Calculated log start time (-30min): 2025-05-21 13:53:34

Restarting Ollama service...
ℹ Actually performing service restart

Capturing logs from 2025-05-21 13:53:34...

✓ Service successfully restarted
Logs saved to: ollama-2025-05-21_17-07-39.log.gz

View logs now? (y/N)


Keep log file? (Y/n)


Script completed

And the logs.

ollama-2025-05-21_17-07-39.zip

@somera commented on GitHub (May 22, 2025):

@dhiltgen will the fix be a part of v0.7.1 or v0.7.2?

@dhiltgen commented on GitHub (May 23, 2025):

The fix will be in v0.7.1

@somera commented on GitHub (May 27, 2025):

@dhiltgen v0.7.1 has now been running for ~2 days and it looks good. Thx.

@somera commented on GitHub (Jun 5, 2025):

I haven't seen the problem since the update. Thanks.

Reference: github-starred/ollama#53370