[PR #15199] mlxrunner: tokenize prompts in request handler goroutines #25610

Open
opened 2026-04-19 18:18:30 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/15199
Author: @jessegross
Created: 4/1/2026
Status: 🔄 Open

Base: main ← Head: jessegross/tokenize


📝 Commits (2)

  • 845b7b2 mlx: improve thread safety of array management
  • 22e8abe mlxrunner: tokenize prompts in request handler goroutines

📊 Changes

7 files changed (+82 additions, -57 deletions)


📝 x/mlxrunner/mlx/array.go (+32 -19)
📝 x/mlxrunner/mlx/compile.go (+1 -1)
📝 x/mlxrunner/mlx/fast.go (+4 -4)
📝 x/mlxrunner/mlx/nn.go (+5 -5)
📝 x/mlxrunner/pipeline.go (+27 -24)
📝 x/mlxrunner/runner.go (+8 -4)
📝 x/mlxrunner/server.go (+5 -0)

📄 Description

Move tokenization out of the single GPU processing goroutine and into each request's HTTP handler goroutine. This allows the next request's prompt to be tokenized on the CPU while the current request is executing on the GPU.
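The pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `request`, `tokenize`, `gpuLoop`, `handle`, and `serve` are hypothetical names, the "tokenizer" just maps words to their lengths, and the "GPU" step just sums token IDs. It assumes a single worker goroutine that owns the accelerator and receives already-tokenized requests over a channel, so CPU-side tokenization of one prompt can overlap with the worker processing another.

```go
package main

import (
	"strings"
	"sync"
)

// request is a hypothetical unit of work handed to the GPU goroutine.
// It already carries token IDs, so the worker never has to tokenize.
type request struct {
	tokens []int
	result chan int
}

// tokenize is a stand-in for a real tokenizer (assumption): it maps each
// whitespace-separated word to its length. It is pure CPU work.
func tokenize(prompt string) []int {
	words := strings.Fields(prompt)
	ids := make([]int, len(words))
	for i, w := range words {
		ids[i] = len(w)
	}
	return ids
}

// gpuLoop models the single processing goroutine. It handles one request
// at a time; here the "GPU work" is just summing the token IDs.
func gpuLoop(queue <-chan request) {
	for req := range queue {
		sum := 0
		for _, t := range req.tokens {
			sum += t
		}
		req.result <- sum
	}
}

// handle models one request handler goroutine: it tokenizes in its own
// goroutine first, and only then enqueues the request for the worker.
func handle(queue chan<- request, prompt string) int {
	req := request{tokens: tokenize(prompt), result: make(chan int, 1)}
	queue <- req
	return <-req.result
}

// serve runs one handler goroutine per prompt against a single worker and
// returns the results in prompt order.
func serve(prompts []string) []int {
	queue := make(chan request)
	go gpuLoop(queue)

	out := make([]int, len(prompts))
	var wg sync.WaitGroup
	for i, p := range prompts {
		wg.Add(1)
		go func(i int, p string) {
			defer wg.Done()
			out[i] = handle(queue, p) // each goroutine writes its own slot
		}(i, p)
	}
	wg.Wait()
	close(queue) // lets gpuLoop exit once all handlers are done
	return out
}
```

Calling `serve([]string{"hello world", "a bc def"})` from a `main` function would tokenize both prompts in separate goroutines while the worker drains the queue serially. In a real server the handler goroutines would presumably be the ones `net/http` starts per request rather than ones spawned by hand.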


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-19 18:18:30 -05:00
Reference: github-starred/ollama#25610