[PR #11159] Add model eval metrics to /metrics #23997

Open
opened 2026-04-19 17:19:34 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/11159
Author: @lapo-luchini
Created: 6/22/2025
Status: 🔄 Open

Base: main ← Head: add-otel-metrics


📝 Commits (10+)

  • c9c0e24 metrics: Add metrics endpoint and basic request metrics
  • 01250de Merge branch 'main' into add-otel-metrics
  • ce78530 Add all ollama run --verbose metrics to OTel.
  • cf9abf5 Add build version (and start date).
  • e99128a Add same metrics in ChatHandler as in GenerateHandler.
  • 8c7702d Add some of the metrics in EmbedHandler too.
  • 4e18d92 Merge tag 'v0.11.3' into add-otel-metrics
  • 0ee5ff7 Merge tag 'v0.12.10' into add-otel-metrics
  • 7d47554 Merge tag 'v0.12.11' into add-otel-metrics
  • fa8a7ce Merge tag 'v0.13.5' into add-otel-metrics

📊 Changes

6 files changed (+417 additions, -6 deletions)


📝 go.mod (+15 -1)
📝 go.sum (+35 -5)
📝 server/routes.go (+95 -0)
📝 server/routes_test.go (+39 -0)
➕ telemetry/metrics.go (+154 -0)
➕ telemetry/metrics_test.go (+79 -0)

📄 Description

Building upon #6537, I added the following metrics:

# HELP ollama_eval_duration_total The prompt evaluation duration in seconds.
# TYPE ollama_eval_duration_total counter
ollama_eval_duration_total{model="qwen2.5-coder:1.5b-base",reason="stop"} 1.383350906
# HELP ollama_eval_total The number of token evaluated.
# TYPE ollama_eval_total counter
ollama_eval_total{model="qwen2.5-coder:1.5b-base",reason="stop"} 103
# HELP ollama_load_duration_total The request load duration in seconds.
# TYPE ollama_load_duration_total counter
ollama_load_duration_total{model="qwen2.5-coder:1.5b-base",reason="stop"} 1.324180685
# HELP ollama_prompt_eval_duration_total The prompt evaluation duration in seconds.
# TYPE ollama_prompt_eval_duration_total counter
ollama_prompt_eval_duration_total{model="qwen2.5-coder:1.5b-base",reason="stop"} 0.046077471
# HELP ollama_prompt_eval_total The number of prompt token evaluated.
# TYPE ollama_prompt_eval_total counter
ollama_prompt_eval_total{model="qwen2.5-coder:1.5b-base",reason="stop"} 7
# HELP ollama_total_duration_total The request total duration in seconds.
# TYPE ollama_total_duration_total counter
ollama_total_duration_total{model="qwen2.5-coder:1.5b-base",reason="stop"} 2.754225864

These are the same values I got from ./ollama run qwen2.5-coder:1.5b-base --verbose 'How much is 2+3':

total duration:       2.754225864s
load duration:        1.324180685s
prompt eval count:    7 token(s)
prompt eval duration: 46.077471ms
prompt eval rate:     151.92 tokens/s
eval count:           103 token(s)
eval duration:        1.383350906s
eval rate:            74.46 tokens/s

(All metrics also carry the labels otel_scope_name="ollama",otel_scope_version="0.55.0", but I removed them from this dump for brevity.)


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-19 17:19:35 -05:00
Reference: github-starred/ollama#23997