[GH-ISSUE #6672] Inconsistent prompt_eval_count for Large Prompts in Ollama Python Library #4199

Closed
opened 2026-04-12 15:07:53 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @surajyadav91 on GitHub (Sep 6, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6672

What is the issue?

Inconsistent prompt_eval_count for Large Prompts in Ollama Python Library

For larger prompts, when using the Ollama Python library with the llama3.1:8b-instruct-fp16 model, prompt_eval_count remains constant at a fixed value (1026 tokens), even when the input prompt size varies significantly. This behavior is observed when using the ollama.chat() method.

import ollama

# prompt_template and model (here 'llama3.1:8b-instruct-fp16') are defined
# elsewhere in the script.
def classify_incident(row):
    full_prompt = prompt_template + row['user_message']

    response = ollama.chat(
        model=model,
        options={'temperature': 0.01},
        messages=[{'role': 'user', 'content': full_prompt}],
    )

    # (prompt tokens, output tokens, total) as reported by Ollama
    total_token = (response['prompt_eval_count'], response['eval_count'],
                   response['prompt_eval_count'] + response['eval_count'])

    print(f'Tokens: {total_token}\n'
          f'Total_prompt_length: {len(full_prompt)}\n'
          f'{"=" * 50}\n')

Sample output:

Tokens: (1026, 15, 1041)
Total_prompt_length: 57788

Tokens: (1026, 20, 1046)
Total_prompt_length: 57172

Tokens: (1026, 18, 1044)
Total_prompt_length: 57744

Current Behavior

  • prompt_eval_count consistently returns the same value (1026), regardless of the actual prompt length (see the diagnostic sketch after this list).
  • eval_count (output tokens) varies as expected, though it might also plateau at a fixed value once longer text is generated.
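
One factor that may be worth ruling out (an assumption on my part, not confirmed): Ollama's default context window (num_ctx, 2048 by default) truncates prompts that do not fit, which could pin prompt_eval_count at a constant. A rough probe, with the model name taken from above and the filler prompt purely illustrative:

import ollama

# Hypothetical probe: does enlarging the context window change the reported
# prompt_eval_count?  'num_ctx' is a standard Ollama model option.
model = 'llama3.1:8b-instruct-fp16'
long_prompt = 'incident description ' * 5000  # far longer than 2048 tokens

for num_ctx in (None, 8192):
    options = {'temperature': 0.01}
    if num_ctx is not None:
        options['num_ctx'] = num_ctx
    response = ollama.chat(
        model=model,
        options=options,
        messages=[{'role': 'user', 'content': long_prompt}],
    )
    print(f"num_ctx={num_ctx}: prompt_eval_count={response['prompt_eval_count']}")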

Expected Behavior

  • prompt_eval_count should accurately reflect the number of tokens in the input prompt.
  • The value should change dynamically based on the input size and content.

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.3.9

GiteaMirror added the bug label 2026-04-12 15:07:53 -05:00
Author
Owner

@surajyadav91 commented on GitHub (Sep 6, 2024):

Raising this issue in ollama-python as well: https://github.com/ollama/ollama-python/issues/271

Reference: github-starred/ollama#4199