[GH-ISSUE #2792] Subsequent generation requests hang after successful generation request with num_predict: 0 #48198

Closed
opened 2026-04-28 07:08:14 -05:00 by GiteaMirror · 1 comment

Originally created by @stanier on GitHub (Feb 27, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2792

If you submit a generation request with `num_predict: 0`, the request will be handled successfully, but all subsequent generation requests will hang indefinitely regardless of their `num_predict` values.

Below is an example of how to reproduce this behavior:

```
curl http://localhost:11434/api/generate -d '{
    "model": "vicuna:13b-16k",
    "template":"Hello world!",
    "stream":false,
    "num_predict":0
}'
```

Any requests following this will not return until the daemon has been restarted.
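To confirm that the daemon is hung rather than just slow, a follow-up request with a client-side timeout can be used. Below is a minimal sketch of that check; it assumes a local Ollama daemon on the default port, mirrors the request shape from the curl example above (using `prompt` instead of `template` for the follow-up request), and uses an illustrative model name:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

def build_payload(model, prompt, num_predict):
    """Build the JSON body for a non-streaming /api/generate request."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "num_predict": num_predict,
    }

def generate(payload, timeout=30):
    """Send a generate request; a timeout here suggests the daemon is hung."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)

# Reproduction outline (requires a running daemon, so not executed here):
# 1. generate(build_payload("vicuna:13b-16k", "Hello world!", 0))  -> succeeds
# 2. generate(build_payload("vicuna:13b-16k", "Hello world!", 32)) -> times out
```

Note that in the documented Ollama API, `num_predict` is normally passed inside an `"options"` object rather than at the top level; the top-level placement here follows the original report.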

Also worth noting: the daemon may be left in an unresponsive state after this request. I've had to `kill` it each time it's happened to me so far, but I haven't compared this against the daemon's normal behavior for signals sent mid-generation or otherwise.

I think `raw` mode is affected as well, and neither `num_keep` nor `num_ctx` appears to be related.


@jmorganca commented on GitHub (May 10, 2024):

This should be fixed now. However, setting `template` may mean the model isn't using its correct prompt template, which can lead to effectively unbounded generations (these currently stop automatically after ~10k tokens, with a lower limit planned).


Reference: github-starred/ollama#48198