[PR #13305] [CLOSED] feat: add M-RoPE support for Qwen2-VL and Qwen3-VL vision models #14158

Closed
opened 2026-04-13 00:46:53 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/13305
Author: @iosub
Created: 12/2/2025
Status: Closed

Base: main ← Head: feat/mrope-clean


📝 Commits (8)

  • a9c6818 Revert "vulkan: temporary cary of vulkan fixes (#12971)"
  • d2917b7 ggml update to b7087
  • 366ed3e fix argsort on metal
  • 9a4271c update to b7108
  • af56743 fix bakllava regression
  • 4fd4574 fix lint logic to only compare against merge base and ignore files that aren't touched in this PR.
  • 9b52964 update to b7209 - performance regressions...
  • 20fad26 feat: add M-RoPE support for Qwen2-VL and Qwen3-VL vision models

📊 Changes

307 files changed (+33674 additions, -23737 deletions)

View changed files

📝 .github/workflows/test.yaml (+1 -1)
📝 Makefile.sync (+1 -1)
📝 llama/build-info.cpp (+1 -1)
📝 llama/llama.cpp/.rsync-filter (+3 -0)
📝 llama/llama.cpp/common/common.cpp (+97 -6)
📝 llama/llama.cpp/common/common.h (+45 -6)
📝 llama/llama.cpp/common/json-schema-to-grammar.cpp (+23 -5)
📝 llama/llama.cpp/common/json-schema-to-grammar.h (+2 -0)
📝 llama/llama.cpp/common/log.cpp (+6 -0)
📝 llama/llama.cpp/common/log.h (+2 -0)
📝 llama/llama.cpp/common/sampling.cpp (+60 -6)
📝 llama/llama.cpp/include/llama.h (+25 -3)
📝 llama/llama.cpp/src/llama-arch.cpp (+235 -16)
📝 llama/llama.cpp/src/llama-arch.h (+28 -0)
📝 llama/llama.cpp/src/llama-batch.cpp (+63 -31)
📝 llama/llama.cpp/src/llama-batch.h (+12 -1)
📝 llama/llama.cpp/src/llama-chat.cpp (+32 -0)
📝 llama/llama.cpp/src/llama-chat.h (+1 -0)
📝 llama/llama.cpp/src/llama-context.cpp (+51 -19)
📝 llama/llama.cpp/src/llama-context.h (+5 -5)

...and 80 more files

📄 Description

Summary

This PR adds Multi-dimensional Rotary Position Embedding (M-RoPE) support for Qwen2-VL and Qwen3-VL vision-language models.

Dependency: This PR requires #12992 to be merged first (qwen3vl architecture support in llama.cpp).

Problem

Ollama sets only 1 position per token, but Qwen3-VL's M-RoPE expects 4 positions per token with 2D spatial encoding (see the sketch after this list):

  • pos[0] = temporal (frame/image index)
  • pos[1] = y position (row in patch grid)
  • pos[2] = x position (column in patch grid)
  • pos[3] = unused (reserved)
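For illustration, here is a minimal Go sketch of how those four position streams could be laid out for one image occupying a patch grid. The function name `mropePositions`, its signature, and the row-major flattening are assumptions made for this example, not code from the PR:

```go
package main

import "fmt"

// mropePositions returns four parallel position slices (temporal, y, x,
// unused) for an image's gridH x gridW patch grid, flattened row-major.
// Hypothetical helper for illustration only.
func mropePositions(frame, gridH, gridW int) (pos [4][]int32) {
	n := gridH * gridW
	for d := range pos {
		pos[d] = make([]int32, n)
	}
	for y := 0; y < gridH; y++ {
		for x := 0; x < gridW; x++ {
			i := y*gridW + x
			pos[0][i] = int32(frame) // temporal: frame/image index
			pos[1][i] = int32(y)     // y position: row in patch grid
			pos[2][i] = int32(x)     // x position: column in patch grid
			pos[3][i] = 0            // unused (reserved)
		}
	}
	return pos
}

func main() {
	p := mropePositions(0, 2, 3) // one image, 2x3 patch grid
	fmt.Println(p[1])            // [0 0 0 1 1 1]
	fmt.Println(p[2])            // [0 1 2 0 1 2]
}
```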

Changes (5 files)

File                          Description
llama/llama.go                NewBatchMRoPE(), AddImageMRoPE(), NEmbdInp(), UsesMRoPE()
runner/llamarunner/runner.go  M-RoPE batch handling, numTokens vs numPos
runner/llamarunner/image.go   BatchSize 8192 for M-RoPE models
runner/llamarunner/cache.go   Clear KV cache for image prompts
llama/patches/0032-*.patch    Fix n_embd vs n_embd_inp for vision embeddings
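The numTokens vs numPos distinction called out for runner.go follows directly from the four-position layout above. A small sketch of that bookkeeping, assuming a hypothetical helper `numPos` that is not part of the PR:

```go
package main

import "fmt"

// numPos sketches the numTokens-vs-numPos distinction: M-RoPE batches
// carry 4 position values per token (temporal, y, x, unused), while
// ordinary batches carry exactly 1.
func numPos(numTokens int, usesMRoPE bool) int {
	if usesMRoPE {
		return 4 * numTokens
	}
	return numTokens
}

func main() {
	fmt.Println(numPos(8192, false)) // 8192: one position per token
	fmt.Println(numPos(8192, true))  // 32768: four positions per token
}
```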

Testing

Tested with Qwen3-VL 2B and 8B split models; images are correctly described.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-13 00:46:53 -05:00

Reference: github-starred/ollama#14158