[GH-ISSUE #6670] expose slots data through API #66234

Closed
opened 2026-05-04 01:06:10 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @commit4ever on GitHub (Sep 6, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6670

Hi,

Can the information that is visible in the logs be exposed through a /slots API per server/port? We need this to manage queuing in our load balancer. llama.cpp already exposes this: https://github.com/ggerganov/llama.cpp/tree/master/examples/server#get-slots-returns-the-current-slots-processing-state

DEBUG [update_slots] all slots are idle and system prompt is empty, clear the KV cache | tid="140031137009664" timestamp=1725608401
DEBUG [process_single_task] slot data | n_idle_slots=3 n_processing_slots=0 task_id=0 tid="140031137009664" timestamp=1725608401
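For context, a load balancer consuming such an endpoint would mostly need the idle/busy counts shown in the log line above. Below is a minimal sketch, assuming a JSON response shaped like llama.cpp's `GET /slots` payload (the `id` and `is_processing` field names follow the llama.cpp server docs, but treat the exact schema as an assumption and verify against your server version; in practice the payload would come from an HTTP GET rather than a literal):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Slot mirrors the per-slot fields reported by llama.cpp's GET /slots
// endpoint. Field names here are assumptions based on the llama.cpp
// server documentation.
type Slot struct {
	ID           int  `json:"id"`
	IsProcessing bool `json:"is_processing"`
}

// countSlots parses a /slots-style JSON payload and returns the number
// of idle and processing slots.
func countSlots(payload []byte) (idle, busy int, err error) {
	var slots []Slot
	if err = json.Unmarshal(payload, &slots); err != nil {
		return 0, 0, err
	}
	for _, s := range slots {
		if s.IsProcessing {
			busy++
		} else {
			idle++
		}
	}
	return idle, busy, nil
}

func main() {
	// Illustrative payload; a real client would fetch this from the
	// server, e.g. http.Get("http://host:port/slots").
	payload := []byte(`[
		{"id": 0, "is_processing": false},
		{"id": 1, "is_processing": true},
		{"id": 2, "is_processing": false}
	]`)

	idle, busy, err := countSlots(payload)
	if err != nil {
		panic(err)
	}
	// A load balancer could route new requests to backends with idle > 0
	// and queue otherwise.
	fmt.Printf("n_idle_slots=%d n_processing_slots=%d\n", idle, busy)
}
```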

Many thanks!

GiteaMirror added the feature request label 2026-05-04 01:06:11 -05:00

@dhiltgen commented on GitHub (Sep 6, 2024):

We're aiming to switch to a [new server implementation written in Go](https://github.com/ollama/ollama/pull/5034), which won't use the same slot construct as server.cpp. That said, this feels like a variation on #2004, which we should be able to start looking at once we've completed the transition.


Reference: github-starred/ollama#66234