[PR #11159] Add model eval metrics to /metrics #13457

Open
opened 2026-04-13 00:27:51 -05:00 by GiteaMirror · 0 comments
Owner

Original Pull Request: https://github.com/ollama/ollama/pull/11159

State: open
Merged: No


Building upon #6537, I added the following metrics:

```
# HELP ollama_eval_duration_total The prompt evaluation duration in seconds.
# TYPE ollama_eval_duration_total counter
ollama_eval_duration_total{model="qwen2.5-coder:1.5b-base",reason="stop"} 1.383350906
# HELP ollama_eval_total The number of token evaluated.
# TYPE ollama_eval_total counter
ollama_eval_total{model="qwen2.5-coder:1.5b-base",reason="stop"} 103
# HELP ollama_load_duration_total The request load duration in seconds.
# TYPE ollama_load_duration_total counter
ollama_load_duration_total{model="qwen2.5-coder:1.5b-base",reason="stop"} 1.324180685
# HELP ollama_prompt_eval_duration_total The prompt evaluation duration in seconds.
# TYPE ollama_prompt_eval_duration_total counter
ollama_prompt_eval_duration_total{model="qwen2.5-coder:1.5b-base",reason="stop"} 0.046077471
# HELP ollama_prompt_eval_total The number of prompt token evaluated.
# TYPE ollama_prompt_eval_total counter
ollama_prompt_eval_total{model="qwen2.5-coder:1.5b-base",reason="stop"} 7
# HELP ollama_total_duration_total The request total duration in seconds.
# TYPE ollama_total_duration_total counter
ollama_total_duration_total{model="qwen2.5-coder:1.5b-base",reason="stop"} 2.754225864
```
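For reference, each sample line in the dump follows the Prometheus text exposition format: metric name, a `{label="value",…}` set, and the sample value. A minimal Go sketch of how one such line is shaped (illustrative only; the PR itself presumably emits these via the OpenTelemetry SDK, given the `otel_scope_*` labels, rather than hand-formatting text):

```go
package main

import "fmt"

// counterLine renders one Prometheus exposition-format sample for a counter
// with the model/reason labels used in the dump above.
func counterLine(name, model, reason string, value float64) string {
	// %q quotes the label values; %g prints the sample without trailing zeros.
	return fmt.Sprintf("%s{model=%q,reason=%q} %g", name, model, reason, value)
}

func main() {
	fmt.Println(counterLine("ollama_eval_total", "qwen2.5-coder:1.5b-base", "stop", 103))
}
```

Because these are counters, a Prometheus server can derive throughput over a window, e.g. `rate(ollama_eval_total[5m]) / rate(ollama_eval_duration_total[5m])` gives tokens per second of generation.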

These are the same numbers I got from `./ollama run qwen2.5-coder:1.5b-base --verbose 'How much is 2+3'`:

```
total duration:       2.754225864s
load duration:        1.324180685s
prompt eval count:    7 token(s)
prompt eval duration: 46.077471ms
prompt eval rate:     151.92 tokens/s
eval count:           103 token(s)
eval duration:        1.383350906s
eval rate:            74.46 tokens/s
```

(All metrics also carry the labels `otel_scope_name="ollama",otel_scope_version="0.55.0"`, but I removed them from this dump for brevity.)

GiteaMirror added the pull-request label 2026-04-13 00:27:51 -05:00

Reference: github-starred/ollama#13457