[PR #8739] [CLOSED] model: benchmark bpe text processing #38634

opened 2026-04-22 23:18:41 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/8739
Author: @BruceMacD
Created: 1/31/2025
Status: Closed

Base: mxyng/next ← Head: brucemacd/next-bpe-bench


📝 Commits (4)

6a41201 next
b21482e fix linter
aff6d84 model: benchmark bpe text processing
fa74ae7 use range

📊 Changes

58 files changed (+4009 additions, -474 deletions)

➕ cache/cache.go (+63 -0)
📝 convert/convert.go (+16 -16)
📝 convert/convert_bert.go (+5 -5)
📝 convert/convert_commandr.go (+5 -5)
📝 convert/convert_gemma.go (+5 -5)
📝 convert/convert_gemma2.go (+2 -4)
📝 convert/convert_gemma2_adapter.go (+5 -5)
📝 convert/convert_llama.go (+6 -6)
📝 convert/convert_llama_adapter.go (+5 -5)
📝 convert/convert_mixtral.go (+5 -5)
📝 convert/convert_phi3.go (+7 -7)
📝 convert/convert_qwen2.go (+5 -5)
📝 convert/convert_test.go (+6 -6)
📝 fs/ggml/ggml.go (+111 -95)
📝 fs/ggml/gguf.go (+6 -7)
📝 fs/ggml/type.go (+2 -7)
📝 fs/util/bufioutil/buffer_seeker.go (+0 -0)
📝 fs/util/bufioutil/buffer_seeker_test.go (+0 -0)
➖ llm/ggla.go (+0 -149)
➖ llm/ggml_test.go (+0 -1)

...and 38 more files

📄 Description

Adds benchmark tests for BPE tokenization. Profiling them revealed two main bottlenecks: regex operations in the split function (3.56% of CPU time) and string allocations during decoding (3.00% of CPU time). The benchmarks cover a range of test cases, from simple tokens to complex text with special characters. Regex caching and pre-allocated string builders could improve performance, but those belong in a separate change to avoid conflicts in a part of the codebase that is changing frequently.

Running the benchmarks

To run the benchmarks and view performance profiles:

```bash
# Install graphviz (required for visualization)
brew install graphviz  # macOS
sudo apt-get install graphviz  # Ubuntu/Debian

# Run benchmarks and generate profiles
go test -bench=. -cpuprofile=cpu.prof -benchmem
go tool pprof -http=:8080 cpu.prof  # Opens visual interface in browser
```

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-22 23:18:41 -05:00
Reference: github-starred/ollama#38634