[PR #15199] [MERGED] mlxrunner: tokenize prompts in request handler goroutines #61767

Closed
opened 2026-04-29 16:47:13 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/15199
Author: @jessegross
Created: 4/1/2026
Status: Merged
Merged: 4/21/2026
Merged by: @jessegross

Base: main ← Head: jessegross/tokenize


📝 Commits (2)

  • 03e4836 mlx: improve thread safety of array management
  • 6d6bd64 mlxrunner: tokenize prompts in request handler goroutines

📊 Changes

8 files changed (+83 additions, -58 deletions)

📝 x/mlxrunner/client.go (+1 -1)
📝 x/mlxrunner/mlx/array.go (+32 -19)
📝 x/mlxrunner/mlx/compile.go (+1 -1)
📝 x/mlxrunner/mlx/fast.go (+4 -4)
📝 x/mlxrunner/mlx/nn.go (+5 -5)
📝 x/mlxrunner/pipeline.go (+27 -24)
📝 x/mlxrunner/runner.go (+8 -4)
📝 x/mlxrunner/server.go (+5 -0)

📄 Description

Move tokenization out of the single GPU processing goroutine and into each request's HTTP handler goroutine. This allows the next request's prompt to be tokenized on the CPU while the current request is executing on the GPU.
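
A minimal sketch of the pattern described above, not the code from this PR. All names here (`request`, `tokenize`, `processLoop`, `handle`) are hypothetical, and the tokenizer is a placeholder that splits on whitespace. The handler goroutine does the CPU-side tokenization and passes only token IDs to the single processing goroutine, which stands in for the GPU worker.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// request carries an already-tokenized prompt to the processing goroutine.
// (Hypothetical type, for illustration only.)
type request struct {
	tokens []int32
	reply  chan int // receives the number of tokens processed
}

// tokenize is a placeholder for a real tokenizer: it maps each
// whitespace-separated word to a dummy ID (the word's length).
func tokenize(prompt string) []int32 {
	words := strings.Fields(prompt)
	ids := make([]int32, len(words))
	for i, w := range words {
		ids[i] = int32(len(w))
	}
	return ids
}

// processLoop stands in for the single GPU goroutine. It only receives
// token slices, so no tokenization work happens here.
func processLoop(queue <-chan request) {
	for req := range queue {
		req.reply <- len(req.tokens)
	}
}

// handle plays the role of an HTTP handler: it tokenizes in the caller's
// goroutine, then hands the tokens to the processing goroutine and waits.
func handle(queue chan<- request, prompt string) int {
	req := request{tokens: tokenize(prompt), reply: make(chan int, 1)}
	queue <- req
	return <-req.reply
}

func main() {
	queue := make(chan request)
	go processLoop(queue)

	prompts := []string{"hello world", "tokenize on the CPU", "one"}
	counts := make([]int, len(prompts))

	var wg sync.WaitGroup
	for i, p := range prompts {
		wg.Add(1)
		go func(i int, p string) { // one goroutine per simulated request
			defer wg.Done()
			counts[i] = handle(queue, p)
		}(i, p)
	}
	wg.Wait()
	close(queue)

	fmt.Println(counts) // prints [2 4 1]
}
```

Because `tokenize` runs before the send on `queue`, a waiting request can finish tokenizing while the processing goroutine is still busy with an earlier one.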


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-29 16:47:13 -05:00
Reference: github-starred/ollama#61767