[PR #10096] [MERGED] llama: remove model loading for grammar #13146

opened 2026-04-13 00:19:06 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/10096
Author: @ParthSareen
Created: 4/2/2025
Status: Merged
Merged: 4/24/2025
Merged by: @ParthSareen

Base: main ← Head: parth/sampling-remove-model-loading-for-grammar


📝 Commits (10+)

  • e5cdd3d llama: remove model loading from grammar
  • ee1e529 model: expose vocabulary to use for sampling
  • f484bc3 runner: use new grammar interface
  • e1053c8 sample: use grammar interface without modedl loading
  • 1ff5a09 llama: cleanup unused grammar with model loading
  • 683ba9a server: improving grammar for JSON
  • 638cd62 llama: fix naming in grammar
  • 0a6ce62 sample: add nil check for grammar
  • d156588 cleanup
  • e47a9c9 llama: shim to using ollama_vocab instead of copy

📊 Changes

13 files changed (+514 additions, -100 deletions)

View changed files

📝 llama/llama.cpp/src/llama-grammar.cpp (+42 -7)
📝 llama/llama.cpp/src/llama-grammar.h (+14 -0)
📝 llama/llama.cpp/src/llama-sampling.cpp (+2 -2)
📝 llama/llama.go (+59 -35)
➕ llama/patches/0021-add-ollama-vocab-for-grammar-support.patch (+207 -0)
📝 llama/sampling_ext.cpp (+47 -0)
📝 llama/sampling_ext.h (+6 -2)
📝 model/models/mistral3/model.go (+3 -0)
📝 model/process_text.go (+7 -0)
📝 model/process_text_spm.go (+4 -0)
📝 runner/ollamarunner/runner.go (+3 -10)
📝 sample/samplers.go (+24 -44)
📝 sample/samplers_test.go (+96 -0)

📄 Description

Removing model loading

Previously, the model vocabulary was loaded separately, using the model path, in order to instantiate a llama_sampler.

This PR removes that extra model load for structured outputs by passing the vocabulary directly into the grammar package.

  • The grammar package was copied so changes could be made without affecting the existing model interactions.
  • Sampling-related functions were added to sampling_ext and llama.go to interface with the C++ code.
  • The runner and sampling interfaces were modified to use the grammar package without model loading.
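The change described above can be sketched as follows. This is a hypothetical illustration of the idea, not the actual ollama API: the type and function names (Vocabulary, Grammar, NewGrammar) are assumptions, standing in for the real interfaces touched by this PR. The point is that a grammar sampler is constructed from a vocabulary that the caller already holds, rather than from a model path that would trigger a second model load.

```go
package main

import "fmt"

// Vocabulary exposes the token strings a model can emit. In this sketch the
// caller (e.g. the runner) already has it, so no model file is read.
type Vocabulary struct {
	Values []string
	EOS    int32
}

// Grammar constrains sampling using only the vocabulary and a grammar string.
type Grammar struct {
	vocab   *Vocabulary
	grammar string
}

// NewGrammar builds a grammar sampler directly from a vocabulary -- no model
// path, no separate model load. The nil check mirrors the guard added in
// commit 0a6ce62.
func NewGrammar(vocab *Vocabulary, grammar string) *Grammar {
	if vocab == nil {
		return nil
	}
	return &Grammar{vocab: vocab, grammar: grammar}
}

func main() {
	vocab := &Vocabulary{Values: []string{"{", "}", "\"", ":"}, EOS: 0}
	g := NewGrammar(vocab, `root ::= "{" "}"`)
	fmt.Println(g != nil)
}
```

Under this shape, structured-output (JSON) sampling reuses the vocabulary already loaded by the running model, which is what lets the PR delete the path-based model loading from the sampling code.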

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-13 00:19:06 -05:00
Reference: github-starred/ollama#13146