[GH-ISSUE #9617] Stopping misbehaving model after some amount of time #32034

Closed
opened 2026-04-22 12:54:31 -05:00 by GiteaMirror · 4 comments

Originally created by @ckuethe on GitHub (Mar 10, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9617

Combination bug and feature request, I think.

On multiple occasions a model (usually starcoder2) seems to get stuck, even with simple tasks like `list all the numbers from 1 to 250, then exit.` - this exact instruction has caused starcoderv2 to spin on my machine for the last 16 hours. Using `ollama ps` I see that it's trying to stop (`starcoder2:latest 9f4ae0aff61e 3.0 GB 100% GPU Stopping...`) but this too has been waiting for the last 16 hours.

I'd like to see a maximum runtime option - either per model or global - added so that I can stop a model after some amount of time when it's clear that my request is not going to complete, as well as a flag for `ollama stop` to just stop the misbehaving model right now, like `pkill -9 -f "ollama runner"` would do.
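As an interim, client-side workaround, here is a minimal sketch (not part of ollama itself) of enforcing a maximum runtime by cancelling the request after a deadline, using the Go client in `github.com/ollama/ollama/api`. The model name and prompt come from this issue; the two-minute limit is an arbitrary illustration. Note the deadline only abandons the client's request - whether the runner actually stops and unloads is up to the server, which is exactly the gap being reported here.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/ollama/ollama/api"
)

func main() {
	// Talks to the local ollama server (OLLAMA_HOST or the default address).
	client, err := api.ClientFromEnvironment()
	if err != nil {
		log.Fatal(err)
	}

	// Client-side "maximum runtime": give up on the request after 2 minutes.
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
	defer cancel()

	req := &api.GenerateRequest{
		Model:  "starcoder2",
		Prompt: "list all the numbers from 1 to 250, then exit.",
	}

	// Stream tokens as they arrive; the callback runs once per response chunk.
	err = client.Generate(ctx, req, func(resp api.GenerateResponse) error {
		fmt.Print(resp.Response)
		return nil
	})
	if err != nil {
		// On a runaway generation this will report the context deadline being exceeded.
		log.Printf("generate aborted: %v", err)
	}
}
```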

Right now it looks like `stop` sends an empty generate request with a timeout of 0, which seems to allow the model to gracefully unload. We're past that now - I want ollama to immediately terminate that runner.
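For reference, a sketch of that graceful-unload request expressed against the same Go client as in the sketch above - an empty generate call with `keep_alive` set to 0. This mirrors the behaviour described in this issue rather than the CLI source, and the `unloadModel` helper name is hypothetical; as noted, if the runner is wedged it can still sit in "Stopping..." indefinitely.

```go
// Ask the server to unload a model by sending an empty generate request with
// keep_alive set to 0 (roughly what `ollama stop` is described as doing above).
// This is still a graceful unload, not a hard kill of the runner process.
func unloadModel(ctx context.Context, client *api.Client, model string) error {
	req := &api.GenerateRequest{
		Model:     model,
		KeepAlive: &api.Duration{Duration: 0}, // 0 = unload now instead of after the idle timer
	}
	// No prompt, so nothing is generated; the streaming callback is effectively unused.
	return client.Generate(ctx, req, func(api.GenerateResponse) error { return nil })
}
```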

  • ollama 0.5.12 (w/ ROCm)
  • Ubuntu 24.04.2 LTS
  • Kernel 6.8.0-54-generic x86_64
  • AMD Ryzen 7 2700X Eight-Core Processor
  • ROCm 6.3.3.60303-74~24.04
  • Radeon RX 7900 XTX, gfx1100
```
Mar 02 14:25:34 ryzen ollama[1815]: time=2025-03-02T14:25:34.971-08:00 level=INFO source=amd_linux.go:386 msg="amdgpu is supported" gpu=GPU-d8607413b0e3a90b gpu_type=gfx1100
Mar 02 14:25:34 ryzen ollama[1815]: time=2025-03-02T14:25:34.994-08:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-d8607413b0e3a90b library=rocm variant="" compute=gfx1100 driver=6.10 name=1002:744c total="20.0 GiB" available="19.9 GiB"
```
GiteaMirror added the needs more info label 2026-04-22 12:54:31 -05:00

@rick-github commented on GitHub (Mar 10, 2025):

[`num_predict`](https://github.com/ollama/ollama/blob/main/docs/modelfile.md#valid-parameters-and-values:~:text=stop%20%22AI%20assistant%3A%22-,num_predict,-Maximum%20number%20of)

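For illustration, a minimal sketch of applying that suggestion per request with the Go client used in the sketches above (it can equally go in a Modelfile as `PARAMETER num_predict 512`); the value 512 is an arbitrary example, not a recommendation.

```go
// Cap generation at a fixed number of tokens so a model that never emits an
// end-of-sequence token still terminates - note this limits tokens, not wall-clock time.
req := &api.GenerateRequest{
	Model:   "starcoder2",
	Prompt:  "list all the numbers from 1 to 250, then exit.",
	Options: map[string]any{"num_predict": 512},
}
```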

@ckuethe commented on GitHub (Mar 10, 2025):

Thanks @rick-github - that's close, but I'm asking about a limit on time rather than tokens.

I'm having a lovely chat with deepseek-r1 right now, swapping big essays, it's being responsive. There's no need to kill that runner. It's the models that get themselves stuck and unresponsive that I'd like to be able to force quit.

I'll see if I can't get some logs of WTF is going on while starcoder seems stuck.


@rick-github commented on GitHub (Mar 10, 2025):

starcoder is likely losing coherence and generating tokens without hitting an end-of-sequence token. If you enable debugging with `OLLAMA_DEBUG=1` in the server environment you will likely see a bunch of `shifting` log lines.


@dhiltgen commented on GitHub (Apr 9, 2025):

There seems to be a race somewhere in the scheduler under heavy load, possibly related to clients closing connections prematurely. If people are still seeing models get stuck in a "Stopping..." state in the `ollama ps` output and the model never actually unloads, please try running the server with `OLLAMA_DEBUG=1` and share the logs, including the model load and the eventual stuck state.
