[GH-ISSUE #1901] "api/generate" stalls after some queries #63130

Closed
opened 2026-05-03 12:15:42 -05:00 by GiteaMirror · 8 comments
Owner

Originally created by @oderwat on GitHub (Jan 10, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/1901

I have a strange phenomenon and can't get rid of it without a workaround:

When I call "api/generate" with the same model at regular intervals (every 5-15 seconds), the API suddenly stops responding after 15-20 calls (the exact number seems to depend on the model size?).

This is reproducible with different models and on both a WSL2-based server and my iMac-based server (I could try it with an M1 Air too, but haven't so far). When I run it on the iMac, CPU consumption stays high while the API call does not return. See this CPU display (it shows some of the last working queries until it freezes and stops replying):

![Snipaste_2024-01-10_13-51-59](https://github.com/jmorganca/ollama/assets/719156/f43bdac7-b162-446b-bbb1-77a757c2ec5a)

When I switch models between generations, or just create an embedding (using the endpoint) with a tiny model and an empty prompt in between, it works endlessly with the same prompts and code.
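The workaround described above can be sketched roughly as follows. This is a hypothetical sketch, not the reporter's actual code: the base URL and the tiny model name `all-minilm` are illustrative, and it assumes the `/api/embeddings` endpoint as it existed in Ollama at the time.

```javascript
// Sketch of the workaround: touch a tiny model via /api/embeddings between
// /api/generate calls, which forces a model switch and avoids the hang.
// (All names and URLs here are illustrative assumptions.)

function buildEmbeddingsRequest(baseUrl, model) {
  // An empty prompt keeps the call as cheap as possible; the point is only
  // to make the server load a different model between generations.
  return {
    url: `${baseUrl}/api/embeddings`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, prompt: "" }),
    },
  };
}

async function generateWithWorkaround(baseUrl, genModel, tinyModel, prompt) {
  // Step 1: cheap embeddings call against a tiny model (the workaround).
  const emb = buildEmbeddingsRequest(baseUrl, tinyModel);
  await fetch(emb.url, emb.options);

  // Step 2: the actual generation, non-streaming for simplicity.
  const res = await fetch(`${baseUrl}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: genModel, prompt, stream: false }),
  });
  return res.json();
}

// Example (would need a running server):
// generateWithWorkaround("http://127.0.0.1:11434", "llama2", "all-minilm", "Say hi.");
```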

I am using current main and also tried to go back some commits, but it seems that this also happens with older commits.

Is there anything I can do to get more information to find out what the problem may be?

Particulars: I use `OLLAMA_HOST=0.0.0.0:11434 OLLAMA_ORIGINS="*"` on the server and call the API from JavaScript (actually WASM) using the fetch API. I have not tried another type of HTTP client yet (and can't for this particular application's use case).
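For reference, the failing call pattern looks roughly like this from a fetch-based client. This is a minimal reproduction sketch, not the reporter's code: it assumes a Node-style environment with a global `fetch`, and the model name, interval, and call count are illustrative.

```javascript
// Minimal reproduction sketch: send the same /api/generate request every
// few seconds. On affected versions the fetch reportedly stops returning
// after roughly 15-20 calls. (Server URL and model name are assumptions.)
const OLLAMA_URL = "http://127.0.0.1:11434";

function buildGenerateRequest(model, prompt) {
  // Non-streaming request body for /api/generate.
  return {
    url: `${OLLAMA_URL}/api/generate`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, prompt, stream: false }),
    },
  };
}

async function pollGenerate(model, prompt, intervalMs, maxCalls) {
  for (let i = 0; i < maxCalls; i++) {
    const { url, options } = buildGenerateRequest(model, prompt);
    const t0 = Date.now();
    const res = await fetch(url, options); // hangs here on affected versions
    await res.json();
    console.log(`call ${i + 1} took ${Date.now() - t0} ms`);
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Example (would need a running server):
// pollGenerate("llama2", "Say hi.", 10_000, 30);
```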

GiteaMirror added the performance, bug labels 2026-05-03 12:15:42 -05:00

@igorschlum commented on GitHub (Jan 10, 2024):

Hi @oderwat, could you tell me if you are using 0.1.19?
Thanks


@oderwat commented on GitHub (Jan 10, 2024):

@igorschlum I am a Go developer and use the current main branch (34344d801c). I am out of the office soon, but I can verify the behavior with a release version later tonight.

Edit: This is the v0.1.19 release commit. But I will check with a binary later to make sure it is the same with that too.


@IAMBUDE commented on GitHub (Jan 10, 2024):

Might be related to #1863


@oderwat commented on GitHub (Jan 10, 2024):

@IAMBUDE Yes

I can confirm that installing v0.1.17 gets rid of my problem with hanging queries. The generations also seem faster on my WSL2 machine with an RTX 3090 (0.8s-1.5s vs. 1.5s-3.5s). I need to double-check that, though.


@pdevine commented on GitHub (Mar 13, 2024):

Going to go ahead and close the issue.


@igorschlum commented on GitHub (Mar 14, 2024):

@oderwat it would be appreciated if you could confirm whether the issue has been resolved with the current build. If not, please reopen the issue and provide more details to help us replicate it.
Best,
Igor


@oderwat commented on GitHub (Mar 14, 2024):

@igorschlum I have not run into this with current versions anymore.


@igorschlum commented on GitHub (Mar 14, 2024):

OK, thanks.

Reference: github-starred/ollama#63130