[PR #9053] [CLOSED] Wire up new tokenizers for new engine #11579

Closed
opened 2025-11-12 16:17:23 -06:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/9053
Author: @dhiltgen
Created: 2/12/2025
Status: Closed

Base: jessegross/new_runner ← Head: new_runner_tokenization


📝 Commits (10+)

  • bb18323 next
  • 624bfb0 fix linter
  • 109ad1d refactor prcess text tests
  • 78144e4 model: benchmark bpe split
  • 4a5d19e remove unused file
  • 3c65319 ml: update Dump to handle precision
  • ec73cc4 backend: Don't return an error on Close
  • 75910c9 backend: Consistently use int (vs. int64) for tensor shapes
  • c3d231b backend: Support graph computation that does not return an output
  • 3da3fc0 backend: API to support full precision matmul

📊 Changes

83 files changed (+478266 additions, -689 deletions)

View changed files

📝 cmd/cmd.go (+5 -2)
📝 cmd/runner/main.go (+1 -1)
📝 convert/convert.go (+16 -16)
📝 convert/convert_bert.go (+5 -5)
📝 convert/convert_commandr.go (+5 -5)
📝 convert/convert_gemma.go (+5 -5)
📝 convert/convert_gemma2.go (+2 -4)
📝 convert/convert_gemma2_adapter.go (+5 -5)
📝 convert/convert_llama.go (+6 -6)
📝 convert/convert_llama_adapter.go (+5 -5)
📝 convert/convert_mixtral.go (+5 -5)
📝 convert/convert_phi3.go (+7 -7)
📝 convert/convert_qwen2.go (+5 -5)
📝 convert/convert_test.go (+6 -6)
📝 envconfig/config.go (+18 -0)
📝 fs/ggml/ggml.go (+111 -95)
📝 fs/ggml/gguf.go (+6 -7)
📝 fs/ggml/type.go (+2 -7)
📝 fs/util/bufioutil/buffer_seeker.go (+0 -0)
📝 fs/util/bufioutil/buffer_seeker_test.go (+0 -0)

...and 63 more files

📄 Description

Replaced by #9113

If we're loading the new engine, use the new model text processor instead of calling into the cgo wrappers for llama.cpp. This also cleans up some tech debt from the older tokenization flow for the C++ server, which was no longer used.
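To illustrate the shape of the change, here is a minimal Go sketch. The interface and type names (`TextProcessor`, `wordProcessor`) are assumptions for illustration, not the PR's actual API; the point is that tokenization happens in native Go code rather than through a cgo call into llama.cpp.

```go
package main

import (
	"fmt"
	"strings"
)

// TextProcessor loosely mirrors what a native tokenizer interface might
// look like: encode text to token IDs and decode IDs back to text.
// (Hypothetical name, for illustration only.)
type TextProcessor interface {
	Encode(s string) []int
	Decode(ids []int) string
}

// wordProcessor is a toy stand-in that "tokenizes" on whitespace with a
// fixed vocabulary, just to show where a pure-Go tokenizer slots in.
type wordProcessor struct {
	vocab map[string]int
	words []string
}

func newWordProcessor(words ...string) *wordProcessor {
	p := &wordProcessor{vocab: map[string]int{}, words: words}
	for i, w := range words {
		p.vocab[w] = i
	}
	return p
}

func (p *wordProcessor) Encode(s string) []int {
	var ids []int
	for _, w := range strings.Fields(s) {
		if id, ok := p.vocab[w]; ok {
			ids = append(ids, id)
		}
	}
	return ids
}

func (p *wordProcessor) Decode(ids []int) string {
	var out []string
	for _, id := range ids {
		out = append(out, p.words[id])
	}
	return strings.Join(out, " ")
}

func main() {
	var tp TextProcessor = newWordProcessor("hello", "world")
	ids := tp.Encode("hello world")
	fmt.Println(ids)            // [0 1]
	fmt.Println(tp.Decode(ids)) // hello world
}
```

A real BPE or SentencePiece tokenizer would replace `wordProcessor`, but the runner only needs to depend on the interface, which is what lets the cgo path be dropped.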

This also adjusts the grammar handling logic to pass through to the new engine instead of using the cgo schema-to-grammar call.

Update: I've added a commit that wires up the ability to default to the new engine and fall back to the old engine if the model isn't supported. This requires the model definitions to detect unsupported models and reject them. Architecture alone is insufficient, as many models share the "llama" architecture.
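The fallback logic described above can be sketched as follows. All names here (`loadNew`, `selectEngine`, `errUnsupported`) are hypothetical stand-ins, not the PR's actual code; the sketch shows why the model definition itself must reject a model, since the architecture string alone cannot distinguish supported from unsupported "llama" models.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnsupported is a hypothetical sentinel returned when a model
// definition inspects the metadata and rejects the model.
var errUnsupported = errors.New("model unsupported by new engine")

// loadNew stands in for the new engine's model loading. It checks more
// than the architecture string, because many distinct models report
// the "llama" architecture.
func loadNew(arch, tokenizer string) error {
	if arch != "llama" {
		return errUnsupported
	}
	// Architecture matches, but the tokenizer flavor may still be
	// one the new engine doesn't implement yet.
	if tokenizer != "gpt2" {
		return errUnsupported
	}
	return nil
}

// selectEngine defaults to the new engine and falls back to the old
// llama.cpp runner when the model definition rejects the model.
func selectEngine(arch, tokenizer string) string {
	if err := loadNew(arch, tokenizer); errors.Is(err, errUnsupported) {
		return "llama.cpp"
	}
	return "new"
}

func main() {
	fmt.Println(selectEngine("llama", "gpt2")) // new
	fmt.Println(selectEngine("llama", "spm"))  // llama.cpp
}
```

The design point is that rejection is a recoverable, typed error rather than a crash, so the scheduler can transparently route the request to the old runner.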


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2025-11-12 16:17:23 -06:00
Reference: github-starred/ollama-ollama#11579