completion statistics completely wrong #3117

Closed
opened 2025-11-11 15:22:59 -06:00 by GiteaMirror · 4 comments
Owner

Originally created by @Master-Pr0grammer on GitHub (Dec 27, 2024).

Response Statistics Reporting Inaccuracies

Description

The response statistics displayed in the UI appear to be incorrectly scaled and inconsistent. Token generation speeds are reported roughly 100x higher than the actual observed speeds, and duration measurements are wildly off.

Version Information

  • Open WebUI Version: v0.5.1
  • Ollama Version: 0.5.4

Observed Behavior

Example statistics from a response:

response_token/s: 4432.13
prompt_token/s: 14666.67
total_duration: 858553.61
load_duration: 814776.33
prompt_eval_count: 11
prompt_eval_duration: 7500
eval_count: 16
eval_duration: 36100
approximate_total: "0h0m8s"

Issues:

  1. Token generation speed is reported as ~4,400 tokens/s, but the actual observed speed is ~40 tokens/s
  2. Duration measurements are wildly wrong and inconsistent with the reported "approximate_total" time, which is accurate
  3. All speed metrics appear to be inflated by roughly two orders of magnitude (~100x)
  4. Duration units seem inconsistent (some values might be in ms while others are in μs); see the conversion sketch after this list
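
For reference, the Ollama API documents all *_duration fields as nanoseconds. A minimal sketch of the expected conversion (the helper below is illustrative, not Open WebUI's actual code):

```python
# Illustrative sketch, not Open WebUI's actual code: Ollama's API
# reports every *_duration field in nanoseconds, so a tokens/s figure
# should divide by 1e9 exactly once.
NS_PER_S = 1_000_000_000

def tokens_per_second(token_count: int, duration_ns: int) -> float:
    """Token count over a nanosecond duration, in tokens per second."""
    return token_count / (duration_ns / NS_PER_S)

# At the visually observed ~40 tokens/s, the 16-token response above
# would correspond to an eval duration of roughly 0.4 s = 4e8 ns:
print(tokens_per_second(16, 400_000_000))  # 40.0
```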

Expected Behavior

  • Token generation speeds should match visually observed speeds (~40 tokens/s in this case)
  • Duration measurements should be consistent in their units
  • The approximate_total time should align with other duration measurements

Additional Context

  • This worked correctly in the Open WebUI version immediately preceding v0.5.1
  • The issue is consistent across different prompts and model sizes
  • Hardware: NVIDIA GeForce GTX 1080 Ti

Steps to Reproduce

  1. Start Open WebUI (pip package) and Ollama
  2. Load any model
  3. Send a prompt that generates a moderate-length response
  4. Observe the reported statistics in the UI
  5. Compare the reported token/s with the visually observed generation speed (the sketch below shows one way to capture Ollama's ground-truth numbers)
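
One way to capture ground-truth numbers for step 5 is to query Ollama's API directly and convert its nanosecond durations by hand (a sketch; the model name is a placeholder):

```python
import json
import urllib.request

# Query Ollama directly for ground-truth stats. The model name is a
# placeholder; substitute whatever model reproduces the issue.
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({"model": "llama3.2", "prompt": "Hello", "stream": False}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    stats = json.load(resp)

# The Ollama API documents all *_duration fields as nanoseconds.
print(stats["eval_count"] / (stats["eval_duration"] / 1e9), "tokens/s")
```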

Impact

This makes it difficult to accurately benchmark model performance and could lead to confusion when trying to optimize model settings.

Author
Owner

@Master-Pr0grammer commented on GitHub (Dec 27, 2024):

For additional context, this is the usage reported by ollama from the same model with the same prompt:

total duration:       600.813205ms
load duration:        22.364032ms
prompt eval count:    11 token(s)
prompt eval duration: 96ms
prompt eval rate:     114.58 tokens/s
eval count:           22 token(s)
eval duration:        481ms
eval rate:            45.74 tokens/s
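
Those printed rates check out as simple count/duration divisions (a quick cross-check of the values above):

```python
# Cross-check of the Ollama CLI figures above: rate = count / duration.
prompt_eval_count, prompt_eval_duration_s = 11, 0.096  # 96 ms
eval_count, eval_duration_s = 22, 0.481                # 481 ms

print(prompt_eval_count / prompt_eval_duration_s)  # ~114.58 tokens/s (matches "prompt eval rate")
print(eval_count / eval_duration_s)                # ~45.74 tokens/s (matches "eval rate")
```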
Author
Owner

@tjbck commented on GitHub (Dec 27, 2024):

Fixed on dev!

Author
Owner

@Master-Pr0grammer commented on GitHub (Dec 27, 2024):

@tjbck oh ok, I saw that an update, v0.5.2, was just released whose changelog claims to fix the usage stats; however, I am still getting incorrect stats, and they are still completely wrong.

It seems the calculation did change, since some of the numbers are now at the other end of the scale (tokens/s is now severely underestimated), but all of the numbers are still way off.

I posted the stats I get from Ollama above; this is what I now get out of Open WebUI:

response_token/s: 4.33
prompt_token/s: 21.57
total_duration: 569911289
load_duration: 31760960
prompt_eval_count: 11
prompt_eval_duration: 51000000
eval_count: 21
eval_duration: 485000000
approximate_total: "0h0m0s"
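
For what it's worth, treating those duration fields as nanoseconds (the unit the Ollama API documents) gives rates exactly 10x what the UI shows; a quick sanity check:

```python
NS_PER_S = 1_000_000_000

# Raw fields from the stats block above (the Ollama API documents all
# *_duration values as nanoseconds):
eval_count, eval_duration_ns = 21, 485_000_000
prompt_eval_count, prompt_eval_duration_ns = 11, 51_000_000

print(eval_count / (eval_duration_ns / NS_PER_S))                # ~43.3; the UI shows 4.33
print(prompt_eval_count / (prompt_eval_duration_ns / NS_PER_S))  # ~215.7; the UI shows 21.57
```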

Is this the dev version you were talking about? Or does the dev version you are referring to still need to be merged and pushed as a new release/update?

Author
Owner

@tjbck commented on GitHub (Dec 30, 2024):

Please try the latest `:dev` and let me know if that resolves the issue for you!

Reference: github-starred/open-webui#3117