[GH-ISSUE #4448] Streaming Chat Completion via OpenAI API should support stream option to include Usage #64816

Closed
opened 2026-05-03 18:53:05 -05:00 by GiteaMirror · 4 comments

Originally created by @odrobnik on GitHub (May 15, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4448

In streaming mode, the OpenAI chat completion API has a new parameter to include usage information after the chunks. You just add `"stream_options": { "include_usage": true }` to the request.

Then the final chunks will look like this:

```
...
data: {"id":"chatcmpl-9P4UJf7DEdyXVro2VOMRMT9qKR0bC","object":"chat.completion.chunk","created":1715762479,"model":"gpt-3.5-turbo-0125","system_fingerprint":null,"choices":[{"index":1,"delta":{},"logprobs":null,"finish_reason":"stop"}],"usage":null}
data: {"id":"chatcmpl-9P4UJf7DEdyXVro2VOMRMT9qKR0bC","object":"chat.completion.chunk","created":1715762479,"model":"gpt-3.5-turbo-0125","system_fingerprint":null,"choices":[{"index":2,"delta":{},"logprobs":null,"finish_reason":"stop"}],"usage":null}
data: {"id":"chatcmpl-9P4UJf7DEdyXVro2VOMRMT9qKR0bC","object":"chat.completion.chunk","created":1715762479,"model":"gpt-3.5-turbo-0125","system_fingerprint":null,"choices":[],"usage":{"prompt_tokens":24,"completion_tokens":58,"total_tokens":82}}
data: [DONE]
```

The final chunk contains no choices, but a `usage`:

```
"usage":{"prompt_tokens":24,"completion_tokens":58,"total_tokens":82}
```

This usage covers all the generations produced by the stream.
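
For reference, here is a minimal sketch of requesting and reading the usage chunk against Ollama's OpenAI-compatible endpoint with the official `openai` Python client. The base URL, placeholder API key, and model name (`llama3`) are assumptions; adjust them for your setup.

```python
# Minimal sketch: stream a chat completion and read the trailing usage-only chunk.
# Assumes a local Ollama server exposing the OpenAI-compatible API at
# http://localhost:11434/v1 and a pulled model named "llama3".
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

stream = client.chat.completions.create(
    model="llama3",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    stream=True,
    stream_options={"include_usage": True},  # request the final usage chunk
)

for chunk in stream:
    if chunk.choices:
        # Regular content chunks carry deltas and a null usage.
        print(chunk.choices[0].delta.content or "", end="", flush=True)
    if chunk.usage:
        # The last chunk has an empty choices list and carries the totals.
        print(f"\nprompt_tokens={chunk.usage.prompt_tokens}, "
              f"completion_tokens={chunk.usage.completion_tokens}, "
              f"total_tokens={chunk.usage.total_tokens}")
```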

GiteaMirror added the feature request label 2026-05-03 18:53:05 -05:00

@jeremychone commented on GitHub (Jul 17, 2024):

I second this one. This feature is sorely missing.

Not sure if we should use the native Ollama API rather than the OpenAI compatibility layer, as it seems to include `prompt_eval_count` (input tokens) and `eval_count` (output tokens) in the final response.

I am okay with creating a custom adapter for Ollama with its native API, but not sure if that aligns with Ollama's focus or direction.
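
For comparison, here is a sketch of reading those counters from Ollama's native `/api/chat` endpoint, which reports `prompt_eval_count` and `eval_count` in its final response. The host, port, and model name are assumptions.

```python
# Sketch: read token counts from Ollama's native /api/chat endpoint.
# Assumes Ollama is listening on localhost:11434 and the model "llama3" is pulled.
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        # With stream=False the server returns a single JSON object; when
        # streaming, the final object carries the same counters.
        "stream": False,
    },
)
resp.raise_for_status()
data = resp.json()

# prompt_eval_count ~ input tokens, eval_count ~ output tokens
print("input_tokens:", data.get("prompt_eval_count"))
print("output_tokens:", data.get("eval_count"))
```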


@liamwh commented on GitHub (Sep 3, 2024):

Can this issue now be closed since this has been merged? https://github.com/lobehub/lobe-chat/issues/3179


@jmtatsch commented on GitHub (May 5, 2025):

@odrobnik I can now see the token statistics for an Ollama-served model in open-webui, so it must have been fixed in the meantime.


@unkhz commented on GitHub (Jun 28, 2025):

Can confirm that usage is reported as long as `stream_options: { include_usage: true }` is set.
