[GH-ISSUE #3225] Setting up HTTP Server Timeouts / Connection Management #64024

Closed
opened 2026-05-03 15:53:47 -05:00 by GiteaMirror · 2 comments

Originally created by @maxwell-bland on GitHub (Mar 18, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3225

Hi, ollama is great, but spawning multiple curl requests with a lot of context, e.g. using ollama's server for nvim autocomplete, which starts and kills many small curl requests like this one:

```shell
curl http://localhost:11434/api/generate -d '{
  "model": "codellama:7b-code",
  "prompt": "Why is the sky blue?"
}'
```

Such a workload bogs down the server and eventually kills it (either that, or https://github.com/tzachar/cmp-ai is sending bad characters, or it should honestly be rewritten to use some sort of direct pipe-based `ollama run` command). When the prompt is larger, on a slow machine the server becomes totally unusable.
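For illustration (this is not ollama's actual handler, and the token loop below is faked), here is the kind of per-request context check that would let the server abandon work when a curl client is killed:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// Sketch only: a streaming handler that checks the request context
// between "tokens". r.Context() is canceled when the client
// disconnects, so a killed curl process frees the server instead of
// leaving the generation running to completion in the background.
func generateHandler(w http.ResponseWriter, r *http.Request) {
	ctx := r.Context()
	flusher, _ := w.(http.Flusher)

	for i := 0; i < 100; i++ { // stand-in for the real token loop
		select {
		case <-ctx.Done():
			return // client went away; stop burning CPU on this request
		case <-time.After(50 * time.Millisecond): // fake token latency
			fmt.Fprintf(w, "token %d\n", i)
			if flusher != nil {
				flusher.Flush()
			}
		}
	}
}

func main() {
	http.HandleFunc("/api/generate", generateHandler)
	http.ListenAndServe("127.0.0.1:11434", nil)
}
```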

https://github.com/ollama/ollama/blob/6ad414f31e5f35b823b51cb08ce4fddb9ff07aac/server/routes.go#L1120

That code might need to be reworked so that the server has optional timeouts, keepalive limits, something along those lines. What I'm getting at is: because the API/webserver route is offered, we have to expect some devs to use it instead of classical UNIX I/O )-: , perhaps because it is more portable, so there should be a couple of flags for managing/killing requests to the HTTP server.
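Concretely, I'm imagining something like the stock net/http knobs. A minimal sketch with made-up values (ollama exposes no flags for any of these today, so treat the numbers as hypothetical):

```go
package main

import (
	"log"
	"net/http"
	"time"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/api/generate", func(w http.ResponseWriter, r *http.Request) {
		// ... generation handler ...
	})

	srv := &http.Server{
		Addr:              "127.0.0.1:11434",
		Handler:           mux,
		ReadHeaderTimeout: 10 * time.Second, // bound slow or stuck clients
		IdleTimeout:       30 * time.Second, // reap idle keep-alive connections
		MaxHeaderBytes:    1 << 20,          // cap oversized request headers
		// ReadTimeout/WriteTimeout are awkward for long streaming
		// generations; a per-request deadline would probably fit better.
	}
	log.Fatal(srv.ListenAndServe())
}
```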

I don't have a ton of time to work on this right now, but would be interested in thoughts or help adding flags so that projects like cmp-ai can trudge onward.

GiteaMirror added the feature request and needs more info labels 2026-05-03 15:53:58 -05:00

@dhiltgen commented on GitHub (Nov 6, 2024):

With the updates we've made to the scheduling algorithm, is this still problematic on the latest version?


@pdevine commented on GitHub (Jan 12, 2025):

I'll close the issue, but we can reopen if it's still a problem.
