[PR #9113] [MERGED] New engine: vision models and auto-fallback #75158

Closed
opened 2026-05-05 07:35:27 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/9113
Author: @dhiltgen
Created: 2/14/2025
Status: Merged
Merged: 3/4/2025
Merged by: @dhiltgen

Base: main ← Head: vision_models


📝 Commits (4)

  • b54cdde Include unified vision layers in memory prediction
  • 0fee4df Adjust CLI to handle both styles of vision model metadata
  • a0facb0 Wire up new tokenizers for new engine
  • a5bbe02 Lay foundation for auto selection of new engine

📊 Changes

10 files changed (+249 additions, -170 deletions)

View changed files

📝 cmd/cmd.go (+10 -4)
📝 fs/ggml/ggml.go (+37 -0)
📝 llm/memory.go (+3 -0)
📝 llm/server.go (+115 -163)
📝 model/model.go (+31 -0)
📝 model/model_test.go (+39 -0)
📝 model/models/llama/model.go (+6 -0)
📝 model/models/mllama/model.go (+6 -0)
📝 server/prompt.go (+1 -2)
📝 server/routes.go (+1 -1)

📄 Description

This combines #9025 and #9053 on main.

For newer vision models packaged as a single GGUF, include the projector memory estimates.

Note: I debated DRYing this out with `projectorMemoryRequirements` in memory.go (from which it is derived), but the two have already diverged and may continue to evolve independently as we nail down the metadata formats, so a distinct function felt like less friction.
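To make the idea concrete, here is a minimal sketch (not the PR's actual code) of estimating projector memory for a unified GGUF: the vision/projector weights live in the same file as the text weights, so the estimate sums only the vision-prefixed tensors. The `Tensor` type and the `v.`/`mm.` name prefixes are illustrative; the real code in fs/ggml works from richer metadata.

```go
package main

import (
	"fmt"
	"strings"
)

// Tensor is a minimal stand-in for a GGUF tensor entry; the real
// fs/ggml types carry dtype, shape, and offset as well.
type Tensor struct {
	Name string
	Size uint64 // bytes
}

// visionGraphSize sums the sizes of vision/projector tensors embedded
// in a unified GGUF. The "v." and "mm." prefixes follow common
// llava-style naming; this is a sketch, not the shipped heuristic.
func visionGraphSize(tensors []Tensor) uint64 {
	var total uint64
	for _, t := range tensors {
		if strings.HasPrefix(t.Name, "v.") || strings.HasPrefix(t.Name, "mm.") {
			total += t.Size
		}
	}
	return total
}

func main() {
	tensors := []Tensor{
		{"blk.0.attn_q.weight", 4096}, // text weights: excluded
		{"v.blk.0.attn_q.weight", 1024},
		{"mm.0.weight", 512},
	}
	fmt.Println(visionGraphSize(tensors))
}
```

Keeping this separate from `projectorMemoryRequirements` (which reads a standalone projector file) avoids coupling two estimates that key off different metadata.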

This adjusts the CLI so it can detect both styles of vision model metadata.
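The two styles can be sketched as: the old layout ships a separate projector GGUF alongside the text model, while the unified layout embeds vision metadata keys in the single GGUF. The `ModelInfo` type and the `<arch>.vision.block_count` key below are illustrative assumptions, not the exact fields the CLI inspects.

```go
package main

import "fmt"

// ModelInfo is a hypothetical view of what the CLI can see about a model.
type ModelInfo struct {
	ProjectorPaths []string       // old style: separate mmproj GGUF(s)
	KV             map[string]any // unified style: vision keys in the main GGUF
	Architecture   string
}

// isVisionModel covers both styles: an external projector file, or
// vision metadata embedded in the single GGUF (key name is illustrative).
func isVisionModel(m ModelInfo) bool {
	if len(m.ProjectorPaths) > 0 {
		return true
	}
	_, ok := m.KV[m.Architecture+".vision.block_count"]
	return ok
}

func main() {
	old := ModelInfo{ProjectorPaths: []string{"mmproj.gguf"}}
	unified := ModelInfo{Architecture: "mllama", KV: map[string]any{"mllama.vision.block_count": 32}}
	text := ModelInfo{Architecture: "llama", KV: map[string]any{}}
	fmt.Println(isVisionModel(old), isVisionModel(unified), isVisionModel(text))
}
```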

When loading the new engine, use the new model text processor instead of calling into cgo wrappers for llama.cpp. This also cleans up some tech debt from the older tokenization flow for the C++ server, which is no longer used.
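The shape of that change can be sketched as an interface the new engine's models satisfy natively in Go, so tokenization never crosses the cgo boundary. The interface, method signatures, and the toy word-level implementation below are all illustrative; real processors implement BPE or SentencePiece.

```go
package main

import (
	"fmt"
	"strings"
)

// TextProcessor is an illustrative stand-in for the new engine's
// tokenizer interface: models implemented natively in Go encode and
// decode text themselves instead of round-tripping through cgo.
type TextProcessor interface {
	Encode(s string) []int32
	Decode(ids []int32) string
}

// wordProcessor is a toy implementation mapping whitespace-split
// words to IDs, just to show the dispatch shape.
type wordProcessor struct {
	vocab map[string]int32
	words []string
}

func newWordProcessor(words []string) *wordProcessor {
	p := &wordProcessor{vocab: map[string]int32{}, words: words}
	for i, w := range words {
		p.vocab[w] = int32(i)
	}
	return p
}

func (p *wordProcessor) Encode(s string) []int32 {
	var ids []int32
	for _, w := range strings.Fields(s) {
		ids = append(ids, p.vocab[w])
	}
	return ids
}

func (p *wordProcessor) Decode(ids []int32) string {
	out := make([]string, len(ids))
	for i, id := range ids {
		out[i] = p.words[id]
	}
	return strings.Join(out, " ")
}

func main() {
	var tp TextProcessor = newWordProcessor([]string{"hello", "new", "engine"})
	ids := tp.Encode("hello new engine")
	fmt.Println(ids, tp.Decode(ids))
}
```

Because the server only depends on the interface, the llama.cpp-backed path and the Go-native path are interchangeable at the call site.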

This adjusts the grammar handling logic to pass through to the new engine instead of using the cgo schema-to-grammar call.
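That branch can be sketched as follows, with the cgo conversion stubbed out: under the old path a JSON schema is converted to a GBNF grammar via cgo, while the new engine accepts the constraint directly and the conversion step is skipped. Both function names here are illustrative stand-ins, not the PR's actual symbols.

```go
package main

import "fmt"

// buildConstraint sketches the dispatch: pass the schema through for
// the new engine, or run the legacy schema-to-grammar conversion for
// the llama.cpp path.
func buildConstraint(schema string, newEngine bool) string {
	if newEngine {
		// New engine handles constrained decoding from the schema itself.
		return schema
	}
	return schemaToGrammarCgo(schema)
}

// schemaToGrammarCgo stands in for the cgo conversion call.
func schemaToGrammarCgo(schema string) string {
	return "root ::= /* grammar derived from " + schema + " */"
}

func main() {
	fmt.Println(buildConstraint(`{"type":"object"}`, true))
	fmt.Println(buildConstraint(`{"type":"object"}`, false))
}
```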

This wires up the ability to default to the new engine and fall back to the old engine if the model isn't supported. This requires the model definitions to detect unsupported models and reject them. Architecture alone is insufficient, as many models report the "llama" architecture.
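The fallback flow can be sketched as: attempt to construct the model in the new engine, and if the model definition rejects it, route the load to the old llama.cpp-based runner. The error value, the metadata key checked, and both load functions below are illustrative assumptions showing why an architecture check alone doesn't suffice.

```go
package main

import (
	"errors"
	"fmt"
)

var ErrUnsupportedModel = errors.New("model not supported by new engine")

// newEngineLoad stands in for constructing a model in the new engine;
// registered model definitions reject configurations they can't run.
func newEngineLoad(arch string, kv map[string]any) error {
	switch arch {
	case "llama":
		// Architecture alone isn't enough: many models report "llama"
		// but use features the new engine may not implement yet
		// (the specific key checked here is illustrative).
		if _, ok := kv["llama.rope.scaling.type"]; ok {
			return ErrUnsupportedModel
		}
		return nil
	default:
		return ErrUnsupportedModel
	}
}

// loadModel tries the new engine first and falls back to the old
// runner when the model definition rejects the model.
func loadModel(arch string, kv map[string]any) string {
	if err := newEngineLoad(arch, kv); err == nil {
		return "new engine"
	} else if errors.Is(err, ErrUnsupportedModel) {
		return "old engine (fallback)"
	}
	return "load error"
}

func main() {
	fmt.Println(loadModel("llama", map[string]any{}))
	fmt.Println(loadModel("gemma2", nil))
}
```

Putting the rejection logic in the model definitions keeps the fallback decision next to the code that actually knows what each architecture supports.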


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-05-05 07:35:27 -05:00

Reference: github-starred/ollama#75158