[PR #8410] [MERGED] sample: add sampling package for new engine #12702

Closed
opened 2026-04-13 00:07:27 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/8410
Author: @ParthSareen
Created: 1/14/2025
Status: Merged
Merged: 2/25/2025
Merged by: @ParthSareen

Base: mainHead: parth/next-sampling


📝 Commits (5)

  • 7790297 sample: update to use unified sampler interface
  • 226af79 sample: update test with want/got
  • f668e25 sample: abstract creating sampler to sampling package
  • 39ea815 name change for cumulative sum
  • e2e700a Add todos referencing github issues

📊 Changes

7 files changed (+599 additions, -126 deletions)

View changed files

📝 runner/ollamarunner/runner.go (+22 -39)
sample/greedy.go (+0 -13)
sample/sample.go (+0 -74)
sample/samplers.go (+139 -0)
sample/samplers_test.go (+238 -0)
sample/transforms.go (+120 -0)
sample/transforms_test.go (+80 -0)

📄 Description

This package introduces a first pass at the sampler for the new engine.

Usage

sampler, err := sample.NewSampler(
		req.Temperature,
		req.TopK,
		req.TopP,
		req.MinP,
		req.Seed,
	)
token, err := seq.sampler.Sample(logits[seq.iBatch*vocabSize : (seq.iBatch+1)*vocabSize])
if err != nil {
	return fmt.Errorf("failed to sample token: %w", err)
}

Known optimizations to be considered (as follow-up):

  • Possibly caching the softmax call -> if this is done multiple times for a token then the result should just be cached
  • TopK does a simple sort - could be using a min heap with k nodes
  • TopK should also be a performance improving sampling technique - we should be trimming the amount of vocab the sampler has to go through. This could be tracked for a token through tracking valid indices at each step as a set or an ordered map
  • Fused transformations - to reduce passes over logits common samplers can be fused together for performance

References:


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

## 📋 Pull Request Information **Original PR:** https://github.com/ollama/ollama/pull/8410 **Author:** [@ParthSareen](https://github.com/ParthSareen) **Created:** 1/14/2025 **Status:** ✅ Merged **Merged:** 2/25/2025 **Merged by:** [@ParthSareen](https://github.com/ParthSareen) **Base:** `main` ← **Head:** `parth/next-sampling` --- ### 📝 Commits (5) - [`7790297`](https://github.com/ollama/ollama/commit/77902978f6f8ca14cdb5407f36d1ca2f5a269f41) sample: update to use unified sampler interface - [`226af79`](https://github.com/ollama/ollama/commit/226af792e58f9288b2e7ad4b8ca734e13a94e07a) sample: update test with want/got - [`f668e25`](https://github.com/ollama/ollama/commit/f668e25be71e4e62b1591ab9e995ba7a98b650d3) sample: abstract creating sampler to sampling package - [`39ea815`](https://github.com/ollama/ollama/commit/39ea815553e770537b54d8d02a35e923c5cf7699) name change for cumulative sum - [`e2e700a`](https://github.com/ollama/ollama/commit/e2e700a963b82a55b800a1ea1b3ea2dbac7124db) Add todos referencing github issues ### 📊 Changes **7 files changed** (+599 additions, -126 deletions) <details> <summary>View changed files</summary> 📝 `runner/ollamarunner/runner.go` (+22 -39) ➖ `sample/greedy.go` (+0 -13) ➖ `sample/sample.go` (+0 -74) ➕ `sample/samplers.go` (+139 -0) ➕ `sample/samplers_test.go` (+238 -0) ➕ `sample/transforms.go` (+120 -0) ➕ `sample/transforms_test.go` (+80 -0) </details> ### 📄 Description This package introduces a first pass at the sampler for the new engine. ## Usage ```go sampler, err := sample.NewSampler( req.Temperature, req.TopK, req.TopP, req.MinP, req.Seed, ) token, err := seq.sampler.Sample(logits[seq.iBatch*vocabSize : (seq.iBatch+1)*vocabSize]) if err != nil { return fmt.Errorf("failed to sample token: %w", err) } ``` Known optimizations to be considered (as follow-up): - [ ] Possibly caching the softmax call -> if this is done multiple times for a token then the result should just be cached - [x] TopK does a simple sort - could be using a min heap with k nodes - [ ] TopK should also be a performance improving sampling technique - we should be trimming the amount of vocab the sampler has to go through. This could be tracked for a token through tracking valid indices at each step as a set or an ordered map - [ ] Fused transformations - to reduce passes over logits common samplers can be fused together for performance References: - Common sampling methods: https://huyenchip.com/2024/01/16/sampling.html#top_k - Min p sampling: https://arxiv.org/pdf/2407.01082 --- <sub>🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.</sub>
GiteaMirror added the pull-request label 2026-04-13 00:07:27 -05:00
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: github-starred/ollama#12702