[GH-ISSUE #11636] Retries with same text hang continually without modification #33449

Closed
opened 2026-04-22 16:08:06 -05:00 by GiteaMirror · 2 comments

Originally created by @leonletto on GitHub (Aug 1, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11636

What is the issue?

I'm not sure what category of bug this is, but I have been fighting it for a long time (months) and finally worked around it today.

Steps to recreate (a minimal sketch follows the list):
1. You send an /api/generate POST call and the model hangs during inference, causing a timeout (for whatever reason).
2. You send the exact same call as a retry, but inference hangs again.
3. Repeat...
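
A minimal sketch of the failing pattern in Python with requests; the model name, prompt, and timeout value are assumptions, not taken from the original report:

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama port; adjust as needed
payload = {
    "model": "llama3",               # hypothetical model name
    "prompt": "...a large prompt...",
    "stream": False,
}

for attempt in range(3):
    try:
        # If inference hangs server-side, the read side of this timeout fires.
        resp = requests.post(OLLAMA_URL, json=payload, timeout=120)
        resp.raise_for_status()
        print(resp.json()["response"])
        break
    except requests.Timeout:
        # Retrying the byte-identical payload hangs again at the same point.
        print(f"attempt {attempt + 1} timed out; retrying unchanged payload")
```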

I think it's something about caching in the internal state causing it to repeat and hang in the same place, but that's just me thinking out loud.

In the past I was restarting my whole chain, and sometimes Ollama itself, to fix this, but I got it working today.

If you send an /api/generate POST call and the model hangs during inference causing a timeout, then before sending the retry you need to modify the original request in some small way so that Ollama and/or the model treat it as a brand-new call. Then it will recompute everything and you are good to go.
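
A sketch of that workaround, again in Python with requests; the specific perturbation (an extra trailing newline per retry) is my own guess at a change small enough not to affect the answer, since the report doesn't specify one:

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed endpoint

def generate_with_retry(payload, retries=3):
    """POST to /api/generate, perturbing the prompt before each retry so the
    server treats it as a brand-new request and recomputes everything."""
    for attempt in range(retries):
        try:
            resp = requests.post(OLLAMA_URL, json=payload, timeout=120)
            resp.raise_for_status()
            return resp.json()
        except requests.Timeout:
            # Any small modification to the prompt bytes should do; here we
            # append one more trailing newline on each retry.
            payload = dict(payload, prompt=payload["prompt"] + "\n")
    raise RuntimeError("all retries timed out")
```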

I guess this is a bug report and a workaround in one :)

Leon

Relevant log output


OS

No response

GPU

No response

CPU

No response

Ollama version

No response

GiteaMirror added the bug label 2026-04-22 16:08:06 -05:00

@rick-github commented on GitHub (Aug 2, 2025):

Example text?


@leonletto commented on GitHub (Aug 2, 2025):

I did a whole bunch of testing today to recreate the bug, and I was able to do that. It turns out that if you reuse sessions over many calls (as you would think would be efficient), and because I am sending large prompts, I am getting HTTP session state corruption.

So my solution fixed it even though I did not know why at the time. Anyway, it looks like this is not a bug in Ollama but in plain old HTTP.

The end result: if you are in production and sending many prompts, don't reuse sessions. The time to set up a new HTTP session is so small compared to inference time that it's not worth the risk of errors.
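
A sketch of the per-request-session pattern, assuming Python's requests library (which the word "session" suggests, though the comment doesn't say):

```python
import requests

def generate(payload):
    # Open a fresh session (and connection pool) for every request instead of
    # reusing one long-lived requests.Session across many large POSTs.
    with requests.Session() as session:
        resp = session.post("http://localhost:11434/api/generate",
                            json=payload, timeout=120)
        resp.raise_for_status()
        return resp.json()
```

Note that the module-level requests.post() already behaves this way, constructing a throwaway session per call, so the simplest form of this fix is to avoid a shared requests.Session entirely.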
