[GH-ISSUE #6904] Option to know number of running request in ollama #30128

Closed
opened 2026-04-22 09:36:06 -05:00 by GiteaMirror · 4 comments
Owner

Originally created by @Jegatheesh001 on GitHub (Sep 21, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6904

Option to know number of running request in ollama

GiteaMirror added the feature request label 2026-04-22 09:36:06 -05:00

@byarbrough commented on GitHub (Sep 21, 2024):

@Jegatheesh001 please provide more detail on why you want this information. What problem are you trying to solve and how do you envision accessing "the number of requests running?"


@Gomez12 commented on GitHub (Sep 22, 2024):

I have wondered about this as well.
I have Ollama set up so that it should run 8 requests in parallel, and a client program set up to issue 8 parallel requests.

But the client also does things with the responses, which takes time. Although the LLM takes most of the time, I am still left wondering whether Ollama is really answering 8 requests, or only 6, because there are always 2 client threads busy doing other things...

With 1 client I could collect this info from the client, but that becomes more complicated if you have multiple clients, etc.

Basically, I would like to know how efficiently I am using the LLM.
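The question above — are all 8 parallel slots really busy, or only 6? — can be answered from the client side with a small in-flight gauge around each request. A minimal sketch (purely illustrative names; Ollama itself exposes no such counter, which is what this issue asks for):

```python
# Client-side sketch: track how many requests are in flight at once
# and the peak concurrency actually reached. This is NOT an Ollama
# API -- it wraps whatever HTTP call the client already makes.
import threading

class InFlightCounter:
    """Thread-safe gauge of concurrent requests, remembering the peak."""
    def __init__(self):
        self._lock = threading.Lock()
        self.current = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self.current += 1
            self.peak = max(self.peak, self.current)
        return self

    def __exit__(self, *exc):
        with self._lock:
            self.current -= 1

counter = InFlightCounter()

def worker(prompt):
    with counter:
        pass  # the real client would POST to /api/generate here

threads = [threading.Thread(target=worker, args=(f"q{i}",)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.peak)  # how many requests were truly concurrent
```

If `peak` stays below 8, the client threads (not Ollama) are the bottleneck — though with multiple clients this still has to be aggregated somewhere, as noted above.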


@jessegross commented on GitHub (Sep 25, 2024):

I believe that this is the same idea as in #2004.

Assuming so, let's track it there and add any additional information.


@Quadav commented on GitHub (Sep 16, 2025):

> I believe that this is the same idea as in #2004
>
> Assuming so, let's track it there and add any additional information.

I believe these are 2 different things: the number of requests currently running in parallel, and the number of requests waiting in the queue.
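The distinction above can be sketched with a bounded worker pool: "running" means a request holds one of the parallel slots, "queued" means it is still waiting for a slot. Illustrative only — `PARALLEL` stands in for `OLLAMA_NUM_PARALLEL`, and Ollama exposes neither gauge today:

```python
# Two separate gauges: requests being answered vs. requests waiting.
import threading

PARALLEL = 2                        # stands in for OLLAMA_NUM_PARALLEL
slots = threading.Semaphore(PARALLEL)
lock = threading.Lock()
running = 0                         # currently holding a parallel slot
queued = 0                          # waiting for a slot to free up

def handle(request_id):
    global running, queued
    with lock:
        queued += 1
    slots.acquire()                 # blocks while all slots are busy
    with lock:
        queued -= 1
        running += 1
    try:
        pass                        # the model would generate here
    finally:
        with lock:
            running -= 1
        slots.release()

threads = [threading.Thread(target=handle, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(running, queued)  # prints "0 0" once the burst drains
```

Exposing both numbers (e.g. via an endpoint or Prometheus-style metrics) would answer both this issue and #2004 at once.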

Reference: github-starred/ollama#30128