[PR #13528] Prototype: online split vision model merging for ollama engine #40131

Open
opened 2026-04-23 01:06:28 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/13528
Author: @cvrunmin
Created: 12/19/2025
Status: 🔄 Open

Base: main ← Head: feat/online-merge-mmproj


📝 Commits (1)

  • e68d605 model: add tensor names in mmproj

📊 Changes

11 files changed (+145 additions, -30 deletions)


📝 llm/server.go (+13 -3)
📝 model/model.go (+54 -11)
📝 model/model_test.go (+4 -4)
📝 model/models/gemma3/model.go (+4 -0)
📝 model/models/gemma3/model_vision.go (+3 -3)
📝 model/models/mistral3/model.go (+8 -4)
📝 model/models/mistral3/model_vision.go (+2 -2)
📝 model/models/qwen25vl/model.go (+23 -0)
📝 model/models/qwen3vl/model.go (+23 -0)
📝 model/models/qwen3vl/model_vision.go (+9 -2)
📝 runner/ollamarunner/runner.go (+2 -1)

📄 Description

Summary

This PR adds the ability for the ollama engine to load a split vision model (i.e. a model with separate base-model weights and "mmproj" projector weights) by merging them at load time (online merging). Most, if not all, quantized models in the community follow the llama.cpp convention of saving the multi-modal projector in a separate file, while the ollama engine only loads GGUF multi-modal models converted by ollama itself, so merging is needed before community models can be loaded by the ollama engine.
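At a high level, online merging means reading both GGUF files and presenting the union of their tensors to the engine, with the projector tensors translated to the names the engine's model structs expect. Below is a minimal Go sketch of that idea; the `Tensor` type, the `mergeTensors` helper, and every tensor name in it are illustrative assumptions, not the PR's actual code:

```go
package main

import "fmt"

// Tensor is a hypothetical stand-in for a loaded GGUF tensor.
type Tensor struct {
	Name string
	Data []byte
}

// mergeTensors combines base-model tensors with mmproj tensors, translating
// llama.cpp-style projector names into the names the engine's model structs
// expect. All names here are illustrative, not the real mappings.
func mergeTensors(base, mmproj map[string]Tensor, rename map[string]string) map[string]Tensor {
	merged := make(map[string]Tensor, len(base)+len(mmproj))
	for name, t := range base {
		merged[name] = t
	}
	for name, t := range mmproj {
		if alt, ok := rename[name]; ok {
			name = alt // projector tensor becomes visible under the engine's name
		}
		t.Name = name
		merged[name] = t
	}
	return merged
}

func main() {
	base := map[string]Tensor{"token_embd.weight": {Name: "token_embd.weight"}}
	mmproj := map[string]Tensor{"mm.0.weight": {Name: "mm.0.weight"}}
	// Hypothetical rename table from mmproj names to engine names.
	rename := map[string]string{"mm.0.weight": "visual.projection.0.weight"}
	fmt.Println(len(mergeTensors(base, mmproj, rename)), "tensors after merge")
}
```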

Changes

  1. Add Model.PostPopulate(). This method lets a model perform model-specific loading steps after the ggml tensors have been populated into it. This is useful when llama.cpp has renamed the projector tensors too heavily, as in Qwen2.5VL and Qwen3VL, to be handled by simply adding alternative tensor names.
  2. Add Model.IsOnlineProjectorMergingSupported(). This method tells the server whether the model being loaded is ready for projector merging. (Both methods are sketched below this list.)
  3. Add the alternative names of the gemma3 projector tensors used in mmproj files.
  4. Add the alternative names of the mistral3 projector tensors used in mmproj files.
  5. Add post-populate logic to qwen2.5vl and qwen3vl for mmproj loading.
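To make items 1 and 2 concrete, here is a minimal sketch of how such hooks are typically exposed as optional interfaces in Go and discovered via type assertion. Only the two method names come from this PR; the `Backend` parameter, the return types, and the loader below are assumptions, not ollama's real signatures:

```go
package main

import "fmt"

type Backend struct{} // placeholder for the GGML backend handle

// Model is a stand-in for the engine's model interface.
type Model interface {
	Forward() error
}

// PostPopulator is an optional hook a model can implement to run
// model-specific fix-ups after GGUF tensors have been populated,
// e.g. remapping heavily renamed Qwen2.5VL/Qwen3VL projector tensors.
type PostPopulator interface {
	PostPopulate(b *Backend) error
}

// OnlineProjectorMerger lets the server ask whether a model supports
// merging a separate mmproj file at load time.
type OnlineProjectorMerger interface {
	IsOnlineProjectorMergingSupported() bool
}

type qwen25vl struct{}

func (qwen25vl) Forward() error                          { return nil }
func (qwen25vl) PostPopulate(*Backend) error             { return nil }
func (qwen25vl) IsOnlineProjectorMergingSupported() bool { return true }

// load shows how a loader would probe for the optional hooks.
func load(m Model, b *Backend) error {
	if pp, ok := m.(PostPopulator); ok {
		if err := pp.PostPopulate(b); err != nil {
			return err
		}
	}
	if om, ok := m.(OnlineProjectorMerger); ok {
		fmt.Println("online projector merging supported:", om.IsOnlineProjectorMergingSupported())
	}
	return nil
}

func main() {
	_ = load(qwen25vl{}, &Backend{})
}
```

The optional-interface pattern keeps the base Model interface unchanged: models that need no post-populate step or do not support merging simply do not implement the extra methods.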

This PR is WIP

Note that, as of today (19 Dec), this PR only contains the tensor-name information needed for mmproj projector merging. The following pieces are still in progress:

  1. Merging the projector's metadata into the main model
  2. Allowing the GGML backend to load multiple model weight files

Both are also (mostly) covered by #13259, a PR for general GGUF split-model support. This PR should proceed after that PR is merged. Nevertheless, the functionality this PR needs can be cherry-picked if that PR cannot be merged soon. Please leave a comment if you have any suggestions or anything to share.
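As a rough illustration of the first outstanding item, merging the projector's metadata could amount to copying the mmproj file's GGUF KV pairs into the main model's metadata. This is a hypothetical sketch under an assumed precedence rule (base model wins on conflicts); the key names are examples, not the planned implementation:

```go
package main

import "fmt"

// mergeKV merges mmproj GGUF metadata (KV pairs) into the base model's
// metadata. Precedence and key names here are assumptions for illustration.
func mergeKV(base, mmproj map[string]any) map[string]any {
	merged := make(map[string]any, len(base)+len(mmproj))
	for k, v := range mmproj {
		merged[k] = v
	}
	for k, v := range base {
		merged[k] = v // base model wins on conflicting keys
	}
	return merged
}

func main() {
	base := map[string]any{"general.architecture": "gemma3"}
	mmproj := map[string]any{"clip.vision.image_size": uint32(896)}
	fmt.Println(mergeKV(base, mmproj))
}
```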


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

Reference: github-starred/ollama#40131