[PR #8830] [CLOSED] benchmark: compare backend graph computation times #44035

opened 2026-04-24 23:34:53 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/8830
Author: @BruceMacD
Created: 2/5/2025
Status: Closed

Base: main ← Head: brucemacd/new_runner_graph_bench


📝 Commits (1)

  • 057cc54 benchmark: compare backend graph computation times

📊 Changes

5 files changed (+225 additions, -1 deletions)

➕ benchmark/ggml_backend_benchmark_test.go (+86 -0)
📝 envconfig/config.go (+2 -0)
📝 kvcache/causal_test.go (+4 -0)
📝 ml/backend.go (+26 -1)
📝 ml/backend/ggml/ggml.go (+107 -0)

📄 Description

Depends on #8301

This PR adds bindings to track execution time of individual operations during LLM forward passes, helping identify performance bottlenecks and optimization opportunities in our computation graphs.

Run from the benchmark directory with:

 go test -bench=. -m $MODEL_NAME ./...
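`-m` is not one of `go test`'s own flags, so the benchmark package presumably defines it and the test binary receives it. Below is a minimal sketch of that pattern. To make it runnable with `go run`, it is written as a standalone program that drives the benchmark through `testing.Benchmark`. The flag name comes from the command above. `forward`, `BenchmarkForward`, the help text, and the fallback model name are hypothetical placeholders and are not code from this PR.

```go
package main

import (
	"flag"
	"fmt"
	"testing"
)

// modelName stands in for the benchmark's -m flag. Only the flag name comes
// from the command above; the default value and help text are hypothetical.
var modelName = flag.String("m", "", "name of the model to benchmark")

// sink keeps the compiler from optimizing away the benchmarked work.
var sink int

// forward is a hypothetical placeholder for one model forward pass.
func forward(model string) int {
	sum := 0
	for i := 0; i < 1000; i++ {
		sum += i * len(model)
	}
	return sum
}

// BenchmarkForward has the usual go test benchmark shape: the testing
// package chooses b.N and the loop repeats the work that many times.
func BenchmarkForward(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = forward(*modelName)
	}
}

func main() {
	flag.Parse()
	if *modelName == "" {
		*modelName = "example-model" // hypothetical fallback for the demo
	}
	// testing.Benchmark runs a benchmark function without the go test driver.
	res := testing.Benchmark(BenchmarkForward)
	fmt.Printf("%s: %s\n", *modelName, res)
}
```

Run it as `go run . -m some-model`. In a real `_test.go` file, `main` goes away and `go test -bench=.` calls `BenchmarkForward` itself.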

What changed

  • Added cgo bindings to the native graph runtime's timing instrumentation
  • Created Go structs to categorize and track different operation types (compute nodes, views, reshapes, etc.)
  • Added methods to collect timing data for the full operation sequence
  • Added an OLLAMA_BENCHMARK environment variable that enables collection of graph compute timing details.

Why this matters

When optimizing LLM inference, it is crucial to understand where time is spent. During development, this profiling helps find bottlenecks and compare performance across backends.

Example findings

I built a logging utility (not included in this PR) that produces detailed timing breakdowns like the one below. I've kept it separate because it adds some complexity. Let me know if you'd like it brought into this PR to provide standardized profiling output.

Operation Sequence Analysis
==========================
Step Type         Operation                                Duration
----------------------------------------------------------------------
0    Compute      Computation node_0                          0.435 ms
1    Compute      Computation node_1                          0.211 ms
2    Compute      Computation node_2                          0.206 ms
3    Compute      Computation node_3                          0.218 ms
4    Reshape      Reshape                                     0.046 ms
...
639  Compute      Computation node_639                        0.350 ms
640  Compute      Computation node_640                        0.400 ms
641  Compute      Computation node_641                        0.183 ms
642  Compute      Computation node_642                        0.211 ms
643  Compute      Computation node_643                        1.518 ms
644  Compute      Computation node_644                        3.664 ms

Operation Type Summary:
----------------------------------------------------------------------
Type         Count    Total (ms)   Avg (ms)     Max (ms)
----------------------------------------------------------------------
Compute      341      98.600       0.289        3.664
View         176      25.206       0.143        0.539
Permute      64       10.335       0.161        0.583
Reshape      64       2.782        0.043        0.063

Total execution time: 136.923 ms

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

Reference: github-starred/ollama#44035