[PR #15199] mlxrunner: tokenize prompts in request handler goroutines #15070

Open
opened 2026-04-13 01:09:43 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/15199
Author: @jessegross
Created: 4/1/2026
Status: 🔄 Open

Base: main ← Head: jessegross/tokenize


📝 Commits (2)

  • 82b0205 mlx: make array management thread-safe
  • f474a63 mlxrunner: tokenize prompts in request handler goroutines

📊 Changes

6 files changed (+78 additions, -56 deletions)


📝 x/mlxrunner/mlx/array.go (+29 -19)
📝 x/mlxrunner/mlx/fast.go (+4 -4)
📝 x/mlxrunner/mlx/nn.go (+5 -5)
📝 x/mlxrunner/pipeline.go (+27 -24)
📝 x/mlxrunner/runner.go (+8 -4)
📝 x/mlxrunner/server.go (+5 -0)

📄 Description

Move tokenization out of the single GPU processing goroutine and into each request's HTTP handler goroutine. This allows the next request's prompt to be tokenized on the CPU while the current request is executing on the GPU.
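The overlap described above can be illustrated with a minimal Go sketch. It is not the PR's code: `request`, `tokenize`, `gpuWorker`, and `handle` are hypothetical names, the tokenizer is a whitespace-splitting stand-in, and the "GPU" step only counts tokens. It assumes one worker goroutine draining a channel, with each handler goroutine tokenizing its own prompt before enqueueing.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// request carries an already-tokenized prompt to the GPU worker.
// All names in this sketch are hypothetical, not taken from the PR.
type request struct {
	tokens []int
	done   chan int // receives the number of tokens the worker processed
}

// tokenize is a stand-in for a real tokenizer: it emits one fake
// token ID (the word length) per whitespace-separated word.
func tokenize(prompt string) []int {
	words := strings.Fields(prompt)
	ids := make([]int, len(words))
	for i, w := range words {
		ids[i] = len(w)
	}
	return ids
}

// gpuWorker stands in for the single GPU processing goroutine.
// It only ever receives tokens, never raw prompt text.
func gpuWorker(queue <-chan request) {
	for req := range queue {
		req.done <- len(req.tokens) // placeholder for the forward pass
	}
}

// handle plays the role of an HTTP handler goroutine: it tokenizes on
// the CPU first, then waits for its turn on the GPU queue.
func handle(prompt string, queue chan<- request) int {
	req := request{tokens: tokenize(prompt), done: make(chan int, 1)}
	queue <- req
	return <-req.done
}

func main() {
	queue := make(chan request)
	go gpuWorker(queue)

	prompts := []string{"hello world", "why is the sky blue"}
	results := make([]int, len(prompts))

	var wg sync.WaitGroup
	for i, p := range prompts {
		wg.Add(1)
		go func(i int, p string) {
			defer wg.Done()
			results[i] = handle(p, queue) // each goroutine writes its own index
		}(i, p)
	}
	wg.Wait()
	close(queue)

	fmt.Println(results) // [2 5]
}
```

Because `tokenize` runs before the send on `queue`, a second handler can finish tokenizing while the worker is still busy with the first request; only the send itself blocks on the worker.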


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-13 01:09:43 -05:00
Reference: github-starred/ollama#15070