[PR #12525] [MERGED] runner: update metrics #13858

Closed
opened 2026-04-13 00:38:44 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/12525
Author: @mxyng
Created: 10/7/2025
Status: Merged
Merged: 10/9/2025
Merged by: @mxyng

Base: main ← Head: mxyng/metrics


📝 Commits (2)

  • 7d26b07 llamarunner: update metrics
  • 4386671 ollamarunner: measure only active time

📊 Changes

2 files changed (+69 additions, -52 deletions)

View changed files

📝 runner/llamarunner/runner.go (+25 -25)
📝 runner/ollamarunner/runner.go (+44 -27)

📄 Description

this change updates how metrics are collected. until now, performance metrics, specifically the initial input processing and subsequent generation durations, were collected by taking timestamps at three points: when a new sequence is created, when the first token is generated, and when generation completes. the processing duration is taken as the first-token timestamp minus the sequence-creation timestamp, while the generation duration is taken as the completion timestamp minus the first-token timestamp.
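
The previous end-to-end calculation described above can be sketched roughly as follows. This is a minimal illustration, not the runner's actual code: the `seqTimestamps` type, its field names, and the helper functions are hypothetical and were chosen only for this example.

```go
package main

import (
	"fmt"
	"time"
)

// seqTimestamps is a hypothetical record of the three timestamps the
// description mentions. It is not the runner's actual struct.
type seqTimestamps struct {
	created    time.Time // when the sequence was created
	firstToken time.Time // when the first token was generated
	done       time.Time // when generation completed
}

// endToEnd returns processing and generation durations as plain wall-clock
// differences, so any idle or waiting time between the timestamps is included.
func (s seqTimestamps) endToEnd() (processing, generation time.Duration) {
	return s.firstToken.Sub(s.created), s.done.Sub(s.firstToken)
}

// exampleTimestamps builds fixed timestamps so the example is deterministic.
func exampleTimestamps() seqTimestamps {
	start := time.Date(2025, 10, 7, 0, 0, 0, 0, time.UTC)
	return seqTimestamps{
		created:    start,
		firstToken: start.Add(120 * time.Millisecond),
		done:       start.Add(920 * time.Millisecond),
	}
}

func main() {
	p, g := exampleTimestamps().endToEnd()
	fmt.Println("processing:", p, "generation:", g)
}
```

Because both durations are differences of wall-clock timestamps, they include everything that happens between those points, not just the decode work.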

while this approach is an accurate end-to-end metric of processing and generation, it's not comparable to other tools which only measure the active, i.e. decode, duration.

this change updates the metrics to only capture decode duration so it can be more directly compared to other tools.
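
One way to picture decode-only measurement is to time each decode call and accumulate the result, so that idle time between calls is not counted. The sketch below assumes a hypothetical `decodeMetrics` accumulator and a fake clock for determinism. None of these names come from the PR, and the real runners may structure this differently.

```go
package main

import (
	"fmt"
	"time"
)

// decodeMetrics is a hypothetical accumulator of active (decode) time.
type decodeMetrics struct {
	processing time.Duration // decode time spent on the prompt
	generation time.Duration // decode time spent generating tokens
}

// timeDecode runs decode, measures only the time spent inside it using the
// supplied clock, and adds that time to the matching total.
func (m *decodeMetrics) timeDecode(now func() time.Time, prompt bool, decode func()) {
	t := now()
	decode()
	d := now().Sub(t)
	if prompt {
		m.processing += d
	} else {
		m.generation += d
	}
}

// fakeClock is a manually advanced clock so the simulation is deterministic.
type fakeClock struct{ t time.Time }

func (c *fakeClock) now() time.Time          { return c.t }
func (c *fakeClock) advance(d time.Duration) { c.t = c.t.Add(d) }

// simulate models one prompt decode and three generation decodes, with idle
// gaps between them that must not show up in the totals.
func simulate() decodeMetrics {
	clock := &fakeClock{t: time.Unix(0, 0)}
	var m decodeMetrics
	m.timeDecode(clock.now, true, func() { clock.advance(100 * time.Millisecond) })
	clock.advance(50 * time.Millisecond) // idle gap, not counted
	for i := 0; i < 3; i++ {
		m.timeDecode(clock.now, false, func() { clock.advance(20 * time.Millisecond) })
		clock.advance(5 * time.Millisecond) // idle gap, not counted
	}
	return m
}

func main() {
	m := simulate()
	fmt.Println("processing:", m.processing, "generation:", m.generation)
}
```

In this simulation the wall-clock span is 225ms, but only the 160ms spent inside decode calls is attributed to the two metrics.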

note: the ollamarunner changes don't have exactly the same behaviour as the llamarunner changes. since ollama builds and executes the graph in separate goroutines, the duration is estimated by taking timestamps at various points and subtracting any duration that llamarunner does not account for, such as sampling.
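
The estimate described in the note can be illustrated as elapsed time minus excluded time. This is only a sketch under assumptions: `activeDuration` and `exampleActive` are hypothetical helpers, and where the actual timestamps are taken in ollamarunner is not specified here.

```go
package main

import (
	"fmt"
	"time"
)

// activeDuration estimates active time for one step as the elapsed wall-clock
// time minus any durations that should not be counted (for example sampling).
// The result is clamped at zero so an over-estimate of excluded time cannot
// produce a negative duration.
func activeDuration(start, end time.Time, excluded ...time.Duration) time.Duration {
	d := end.Sub(start)
	for _, e := range excluded {
		d -= e
	}
	if d < 0 {
		return 0
	}
	return d
}

// exampleActive uses fixed values: 30ms elapsed, of which 4ms was sampling.
func exampleActive() time.Duration {
	start := time.Unix(0, 0)
	end := start.Add(30 * time.Millisecond)
	sampling := 4 * time.Millisecond
	return activeDuration(start, end, sampling)
}

func main() {
	fmt.Println("estimated active time:", exampleActive())
}
```

Subtracting the excluded time keeps the estimate closer to what a runner that times only the decode call would report, at the cost of being an approximation rather than a direct measurement.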


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-13 00:38:44 -05:00
Reference: github-starred/ollama#13858