[GH-ISSUE #3427] prompt_eval_count in api is broken #27871

Closed
opened 2026-04-22 05:30:46 -05:00 by GiteaMirror · 5 comments

Originally created by @drazdra on GitHub (Mar 31, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3427

What is the issue?

The prompt_eval_count parameter is absent on some calls; on other calls it returns incorrect information.

  1. I tried /api/chat with "stablelm2", no system prompt, prompt="hi".

In the result, the "prompt_eval_count" field is missing most of the time. Sometimes it appears, seemingly at random, but rarely. (A minimal check is sketched below.)

  2. When num_ctx is small and the supplied prompt (the content of all messages in /api/chat) exceeds num_ctx, prompt_eval_count may either be absent or report an incorrect value.

I believe it returns the number of tokens that fit into the context window, rather than the token count of the whole prompt that was sent.

thanks :).
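
For reference, the first case can be checked with a minimal non-streaming request. This is only a sketch: it assumes a local Ollama server on the default port (http://localhost:11434), that the "stablelm2" model has been pulled, and that the Python requests package is installed.

```python
import requests

# Case 1: a trivial prompt; check whether the final reply carries prompt_eval_count.
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "stablelm2",
        "messages": [{"role": "user", "content": "hi"}],
        "stream": False,  # single JSON reply with done == true and the stats
    },
).json()

# Expected: the field is always present; observed: it is often missing.
print("prompt_eval_count" in resp, resp.get("prompt_eval_count"))
```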

What did you expect to see?

Expected behavior:

  1. prompt_eval_count should always be present.
  2. It should report the count of submitted tokens, not the processed ones.
  3. Optionally, one more parameter could be returned showing the number of processed tokens.

Steps to reproduce

Use /api/chat to send an array of messages, limit num_ctx, send more content than fits into the context window defined by num_ctx, and check the values returned in the final JSON reply with done == true.
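
A reproduction sketch for the steps above, under the same assumptions as the earlier snippet (local server, pulled model, requests installed); the num_ctx value and message length are arbitrary, the point is only that the messages exceed the context window:

```python
import requests

# Case 2: force the prompt to exceed a deliberately small context window.
long_text = "word " * 2000  # far more tokens than the num_ctx below

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "stablelm2",
        "messages": [{"role": "user", "content": long_text}],
        "stream": False,
        "options": {"num_ctx": 256},
    },
).json()

# In the final (done == true) reply, prompt_eval_count should reflect the
# submitted prompt, but here it is either missing or capped near num_ctx.
print(resp.get("done"), resp.get("prompt_eval_count"))
```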

Are there any recent changes that introduced the issue?

No response

OS

Linux

Architecture

amd64

Platform

No response

Ollama version

0.1.30

GPU

Other

GPU info

cpu only

CPU

AMD

Other software

No response

GiteaMirror added the bug label 2026-04-22 05:30:46 -05:00

@tarbard commented on GitHub (Mar 31, 2024):

I've had this problem too. I think it happens when the prompt is cached. I've also seen it happen with eval_count, though that was much rarer, so it might be unrelated.

@jmorganca commented on GitHub (Mar 31, 2024):

Hi there, yes it will be 0 if the prompt is cached. Will most likely revisit this. However it's definitely a bug that the field disappears altogether - thanks for flagging!
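
The caching behavior described above can be observed by sending the same request twice in a row. This is only a sketch under the same assumptions as the snippets in the issue body (local server, pulled model, requests installed):

```python
import requests

def chat_once():
    # Identical request both times, so the second call should hit the prompt cache.
    return requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "stablelm2",
            "messages": [{"role": "user", "content": "hi"}],
            "stream": False,
        },
    ).json()

first, second = chat_once(), chat_once()
# First call: prompt_eval_count reflects the evaluated prompt tokens.
# Second call: the prompt is cached, so the count comes back as 0
# (or, per this issue, the field may be missing entirely).
print(first.get("prompt_eval_count"), second.get("prompt_eval_count"))
```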

@drazdra commented on GitHub (Mar 31, 2024):

> Hi there, yes it will be 0 if the prompt is cached. Will most likely revisit this. However it's definitely a bug that the field disappears altogether - thanks for flagging!

The issue is that I wanted to use it to identify the parts of the prompt that do not fit into the context window, but that's impossible with this behavior. It would be good if the cache could remember the number of tokens cached and return the proper values :). Thanks for the reply.
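
As a rough workaround under the current behavior, one could compare prompt_eval_count against a crude local estimate of the submitted token count. This is only a sketch: the ~4 characters per token heuristic is an approximation, not the model's real tokenizer, so the comparison is approximate at best.

```python
import requests

messages = [{"role": "user", "content": "word " * 2000}]

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "stablelm2",
        "messages": messages,
        "stream": False,
        "options": {"num_ctx": 256},
    },
).json()

# Crude estimate: roughly 4 characters per token. A heuristic only.
estimated_submitted = sum(len(m["content"]) for m in messages) // 4
processed = resp.get("prompt_eval_count", 0)

if processed and processed < estimated_submitted:
    print(f"likely truncated: ~{estimated_submitted - processed} tokens did not fit")
```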

@gregnwosu commented on GitHub (Apr 8, 2024):

How can I disable prompt caching?

@jmorganca commented on GitHub (Jun 4, 2024):

Merging with #2068


Reference: github-starred/ollama#27871