[PR #9374] [MERGED] sample: improve ollama engine sampler performance #75226

Closed
opened 2026-05-05 07:39:44 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/9374
Author: @ParthSareen
Created: 2/26/2025
Status: Merged
Merged: 3/7/2025
Merged by: @ParthSareen

Base: main ← Head: parth/topk-transform-optimization


📝 Commits (10+)

📊 Changes

7 files changed (+548 additions, -307 deletions)


📝 go.mod (+1 -1)
📝 runner/ollamarunner/runner.go (+9 -1)
📝 sample/samplers.go (+91 -65)
➕ sample/samplers_benchmark_test.go (+104 -0)
📝 sample/samplers_test.go (+35 -120)
📝 sample/transforms.go (+157 -74)
📝 sample/transforms_test.go (+151 -46)

📄 Description

This change brings in various interface cleanups and greatly improves the performance of the sampler.

Tested with llama3.2 on local machine.
Improves throughput from ~70 tokens/s to 135 tokens/s with topK(40) enabled.
Without topK, throughput is ~110 tokens/s.
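
For readers unfamiliar with why top-K selection can be made cheap, here is a minimal sketch of one common approach: keep a min-heap bounded to k entries so that a full sort of the vocabulary is avoided. The commit messages mention a partial sort and minimizing the heap size, but this is not the code from the PR; the `token` type, the `minHeap` type, and the `topK` function are hypothetical names chosen for illustration only.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// token pairs a vocabulary id with its logit.
// Hypothetical type for this sketch, not taken from the ollama source.
type token struct {
	id    int
	value float32
}

// minHeap keeps the smallest value at the root so it can be evicted cheaply.
type minHeap []token

func (h minHeap) Len() int           { return len(h) }
func (h minHeap) Less(i, j int) bool { return h[i].value < h[j].value }
func (h minHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x any)        { *h = append(*h, x.(token)) }
func (h *minHeap) Pop() any {
	old := *h
	n := len(old)
	t := old[n-1]
	*h = old[:n-1]
	return t
}

// topK returns the k highest-valued tokens in descending order.
// It runs in O(n log k) instead of the O(n log n) of a full sort.
func topK(logits []float32, k int) []token {
	if k <= 0 {
		return nil
	}
	if k > len(logits) {
		k = len(logits)
	}
	h := make(minHeap, 0, k)
	for id, v := range logits {
		if len(h) < k {
			// Heap not yet full: always admit the token.
			heap.Push(&h, token{id: id, value: v})
			continue
		}
		if v > h[0].value {
			// Replace the current minimum and restore the heap property.
			h[0] = token{id: id, value: v}
			heap.Fix(&h, 0)
		}
	}
	// Only the k survivors are sorted, which is cheap for small k.
	out := []token(h)
	sort.Slice(out, func(i, j int) bool { return out[i].value > out[j].value })
	return out
}

func main() {
	logits := []float32{0.1, 2.5, -1.0, 3.2, 0.7}
	for _, t := range topK(logits, 2) {
		fmt.Println(t.id, t.value)
	}
}
```

With k = 40 and a vocabulary of over a hundred thousand entries, the heap stays tiny and most tokens are rejected after a single comparison against the root. That is one plausible reason a top-K path can be faster than sampling over the full distribution, though the measurements above come from the PR author and were not reproduced here.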


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

### 📝 Commits (10+)

- [`a76cfb8`](https://github.com/ollama/ollama/commit/a76cfb865e8a96b38f452502ee6fb5dd25bcb9cf) performance improvements! somewhat
- [`35f0cf4`](https://github.com/ollama/ollama/commit/35f0cf42b3c8813134ba33242f9df49263eb6f24) abstract tokens and sort for top p
- [`435fa30`](https://github.com/ollama/ollama/commit/435fa300787177b3f850eb1fbdcbbc9c4b3fed66) partial sort try
- [`3d66d77`](https://github.com/ollama/ollama/commit/3d66d771f4b61089639067dbcc6ac38c8845a58c) minimize size of heap - 104 tps
- [`8083ddd`](https://github.com/ollama/ollama/commit/8083ddd8b3594f68544dd806ea5faba4f896d741) add benchmarks - bruce
- [`47a81dc`](https://github.com/ollama/ollama/commit/47a81dc3f7b32dba1cfbb94f1e3358bbeecc5809) 125 tps
- [`3e35fb5`](https://github.com/ollama/ollama/commit/3e35fb57543f1f481194233580a3f00fa244095d) 136
- [`afddc0a`](https://github.com/ollama/ollama/commit/afddc0ae330b7d16ff325c6f14ebd0d8b3dd0faa) non topk improvements
- [`3347374`](https://github.com/ollama/ollama/commit/33473741737fe192d0961dbd7c184e198fdce07a) update benchmark
- [`e1fd75a`](https://github.com/ollama/ollama/commit/e1fd75a73a83740df75fa7ff05ea797379c2d159) update tests
GiteaMirror added the pull-request label 2026-05-05 07:39:44 -05:00
Reference: github-starred/ollama#75226