[PR #4761] [MERGED] revert tokenize ffi #11585

Closed
opened 2026-04-12 23:32:52 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/4761
Author: @mxyng
Created: 6/1/2024
Status: Merged
Merged: 6/1/2024
Merged by: @jmorganca

Base: main ← Head: mxyng/revert-tokenize


📝 Commits (3)

  • b95e8e8 Revert "use int32_t for call to tokenize (#4738)"
  • 8241054 Revert "vocab only"
  • 95bff5d Revert "use ffi for tokenizing/detokenizing"

📊 Changes

3 files changed (+144 additions, -72 deletions)

📝 llm/ext_server/server.cpp (+43 -0)
📝 llm/llm.go (+0 -60)
📝 llm/server.go (+101 -12)

📄 Description

This change reverts the series of changes that introduced FFI calls for tokenize/detokenize. There is a Windows-specific bug where loading deepseek-llm's pretokenizer regexp segfaults. The most likely cause is a difference in Unicode support between MinGW (used by cgo) and MSVC (used by the subprocess).


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-12 23:32:52 -05:00
Reference: github-starred/ollama#11585