[PR #12742] [MERGED] llamarunner: Record the time for all batches during prompt processing #60633

Closed
opened 2026-04-29 15:42:47 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/12742
Author: @jessegross
Created: 10/22/2025
Status: Merged
Merged: 10/22/2025
Merged by: @jessegross

Base: main ← Head: jessegross/llama_time


📝 Commits (1)

  • f9282bd llamarunner: Record the time for all batches during prompt processing

📊 Changes

1 file changed (+12 additions, -2 deletions)

View changed files

📝 runner/llamarunner/runner.go (+12 -2)

📄 Description

Currently, we only record the time for the last batch when processing the prompt, even though a long prompt is processed in several batches. This results in unrealistically high prompt eval rates for the old llama runner.

Before:
total duration: 31.273112939s
load duration: 4.97054657s
prompt eval count: 32768 token(s)
prompt eval duration: 235.137439ms
prompt eval rate: 139356.80 tokens/s
eval count: 1873 token(s)
eval duration: 18.173182374s
eval rate: 103.06 tokens/s

After:
total duration: 30.024798033s
load duration: 4.758588663s
prompt eval count: 32768 token(s)
prompt eval duration: 7.779621548s
prompt eval rate: 4212.03 tokens/s
eval count: 1769 token(s)
eval duration: 17.148014223s
eval rate: 103.16 tokens/s


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-29 15:42:47 -05:00

Reference: github-starred/ollama#60633