[GH-ISSUE #5200] Add support for stream_options #65304

Closed
opened 2026-05-03 20:26:04 -05:00 by GiteaMirror · 2 comments

Originally created by @igo on GitHub (Jun 21, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5200

OpenAI added support for token usage stats in streamed responses. It would be great to have a similar feature in Ollama.

https://community.openai.com/t/usage-stats-now-available-when-using-streaming-with-the-chat-completions-api-or-completions-api/738156

https://platform.openai.com/docs/api-reference/chat/create#chat-create-stream_options
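For context: on OpenAI's side the feature is opt-in per request. Passing `stream_options={"include_usage": True}` makes the server send one extra final chunk whose `choices` list is empty and whose `usage` field carries the token counts. A minimal consumer sketch (the base URL and model name below are placeholders, not part of the original report):

```python
from openai import OpenAI

# Placeholder endpoint/model; adjust for a real server.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

stream = client.chat.completions.create(
    model="llama3.1:8b-instruct-q8_0",
    messages=[{"role": "user", "content": "Say this is a test"}],
    stream=True,
    # Opt in to usage stats on the stream (the feature this issue requests).
    stream_options={"include_usage": True},
)

for chunk in stream:
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")
    elif chunk.usage is not None:
        # Per OpenAI's docs, the final chunk has no choices and
        # carries the prompt/completion/total token counts.
        print(f"\nprompt={chunk.usage.prompt_tokens}, "
              f"completion={chunk.usage.completion_tokens}, "
              f"total={chunk.usage.total_tokens}")
```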

GiteaMirror added the feature request and api labels 2026-05-03 20:26:10 -05:00

@yawn commented on GitHub (Aug 17, 2024):

Agreed, that would be fantastic! To reproduce:

```python
from openai import OpenAI


def main():
    # Point the OpenAI client at the local Ollama server's
    # OpenAI-compatible endpoint; the API key is not validated.
    client = OpenAI(
        api_key="null",
        base_url="http://localhost:11434/v1",
    )

    model = "llama3.1:8b-instruct-q8_0"

    # Non-streaming request: usage is reported on the response itself.
    res = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": "Say this is a test",
            }
        ],
        model=model,
        stream=False,
    )

    assert res is not None, "(no stream) result is present"
    assert res.usage is not None, "(no stream) result contains usage information"
    assert res.usage.prompt_tokens > 0, "(no stream) result usage information token count is present"

    # Streaming request: with include_usage, the final chunk should
    # carry the usage stats.
    stream = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": "Say this is a test",
            }
        ],
        model=model,
        stream=True,
        stream_options={"include_usage": True},
    )

    *_, last = stream

    assert last is not None, "(stream) last chunk is present"
    assert last.usage is not None, "(stream) last chunk contains usage information"
    assert last.usage.prompt_tokens > 0, "(stream) last chunk usage information token count is present"


if __name__ == "__main__":
    main()
```

To run this against OpenAI, change the `model` to something valid (e.g. `gpt-3.5-turbo`), remove the `base_url`, and either add a valid API key or remove the property and export a valid `OPENAI_API_KEY`.

To enable HTTP logs, export `OPENAI_LOG=debug`.
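For reference, a sketch of those two changes (the model name is just an example):

```python
# OpenAI variant: no base_url; the key is read from OPENAI_API_KEY.
client = OpenAI()
model = "gpt-3.5-turbo"
```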

With ollama version `0.3.6`, the second-to-last streaming assertion currently fails: `AssertionError: (stream) last chunk contains usage information`.


@codefromthecrypt commented on GitHub (Sep 13, 2024):

Not sure whether this PR to vLLM helps progress this or not, but this is a big deal for statistics: https://github.com/vllm-project/vllm/pull/5135

Reference: github-starred/ollama#65304