[GH-ISSUE #11022] "prompt_eval_count" in response does not include json_schema() #7269

Closed
opened 2026-04-12 19:19:12 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @P2T10N on GitHub (Jun 9, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11022

What is the issue?

The prompt_eval_count in the Ollama response does not appear to include the json_schema() part of the input. This makes it difficult to calculate the input token count accurately, especially when using structured outputs defined by Pydantic models.

Without the json_schema() being accounted for in prompt_eval_count, it becomes challenging to set the model's context length (num_ctx) effectively, especially as the overall context (including the schema) grows. This can lead to unexpected truncation or inefficient use of the model's context window.

Is there a recommended method to calculate the total input token count, including the json_schema(), when using Ollama?
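
For context, here is a rough sketch of how large the serialized schema is on its own (the four-characters-per-token figure is only a crude heuristic, not the model's actual tokenizer):

import json
from pydantic import BaseModel, Field

class ModerationOutput(BaseModel):
    is_safe: bool = Field(..., description="Whether the content is safe or not.")
    reason: str = Field(..., description="Reason for the safety assessment.")

schema_text = json.dumps(ModerationOutput.model_json_schema())
# ~4 characters per token is only a rule of thumb, not the Gemma tokenizer.
print(f"schema: {len(schema_text)} chars, roughly {len(schema_text) // 4} tokens")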

Here's a code snippet demonstrating the issue:

from ollama import chat
from pydantic import BaseModel, Field

# Assuming ModerationOutput is a Pydantic BaseModel as an example
class ModerationOutput(BaseModel):
    is_safe: bool = Field(..., description="Whether the content is safe or not.")
    reason: str = Field(..., description="Reason for the safety assessment.")

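# Note: the Country model below is not referenced in the chat() call.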
class Country(BaseModel):
    name: str = Field(..., description="a b c d e f g h i j k l m n o p q r s t u v w x y z")
    capital: str = Field(..., description="a b c d e f g h i j k l m n o p q r s t u v w x y z")
    languages: list[str] = Field(..., description="a b c d e f g h i j k l m n o p q r s t u v w x y z")

response = chat(
    messages=[
        {
            'role': 'user',
            'content': 'Tell me about Canada.',
        }
    ],
    model='gemma3:4b',
    format=ModerationOutput.model_json_schema(), # Here, ModerationOutput.model_json_schema() is passed
    options={
        "num_ctx": 2048
    },
)

print(response.prompt_eval_count)

Relevant log output

14

OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.9.0

GiteaMirror added the bug label 2026-04-12 19:19:12 -05:00
Author
Owner

@rick-github commented on GitHub (Jun 9, 2025):

The schema is not included in the prompt token count because it's not part of the prompt. It's used to create a GBNF that controls the generated tokens. It also sounds like you want to dynamically adjust num_ctx. Be aware that every time num_ctx changes, the model is reloaded.
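
A minimal sketch of the practical upshot (the num_ctx value here is only an illustrative budget, and the model/schema are the ones from the issue): keep num_ctx fixed so the model stays loaded, then compare the reported prompt_eval_count and eval_count against that budget after each call.

from ollama import chat
from pydantic import BaseModel, Field

class ModerationOutput(BaseModel):
    is_safe: bool = Field(..., description="Whether the content is safe or not.")
    reason: str = Field(..., description="Reason for the safety assessment.")

NUM_CTX = 8192  # fixed once for the whole workload, so the model is not reloaded

response = chat(
    model='gemma3:4b',
    messages=[{'role': 'user', 'content': 'Tell me about Canada.'}],
    format=ModerationOutput.model_json_schema(),
    options={'num_ctx': NUM_CTX},
)

# prompt_eval_count covers the prompt tokens; eval_count covers the generated
# (schema-constrained) tokens. Together they show how much of the window was used.
used = (response.prompt_eval_count or 0) + (response.eval_count or 0)
print(f'{used} of {NUM_CTX} context tokens used')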
