[PR #14091] llama: add Youtu-VL vision model support #45756

opened 2026-04-25 01:24:26 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/14091
Author: @svlys
Created: 2/5/2026
Status: 🔄 Open

Base: main ← Head: feat/youtu-vl-support


📝 Commits (2)

  • 3d57d04 llama: add Youtu-VL vision model support
  • 0a1c236 Merge branch 'ollama:main' into feat/youtu-vl-support

📊 Changes

12 files changed (+925 additions, -4 deletions)

View changed files

📝 llama/llama.cpp/src/llama-model.cpp (+6 -2)
📝 llama/llama.cpp/src/llama-vocab.cpp (+11 -0)
📝 llama/llama.cpp/src/llama-vocab.h (+1 -0)
📝 llama/llama.cpp/src/models/deepseek2.cpp (+1 -1)
📝 llama/llama.cpp/src/unicode.cpp (+26 -0)
📝 llama/llama.cpp/tools/mtmd/clip-impl.h (+3 -0)
📝 llama/llama.cpp/tools/mtmd/clip-model.h (+1 -0)
📝 llama/llama.cpp/tools/mtmd/clip.cpp (+113 -0)
📝 llama/llama.cpp/tools/mtmd/models/models.h (+5 -0)
➕ llama/llama.cpp/tools/mtmd/models/youtuvl.cpp (+179 -0)
📝 llama/llama.cpp/tools/mtmd/mtmd.cpp (+1 -1)
➕ llama/patches/0034-mtmd-add-Youtu-VL-model-support.patch (+578 -0)

📄 Description

Add support for the Youtu-VL multimodal vision model:

  • Add a youtuvl.cpp graph builder for vision processing
  • Add the PROJECTOR_TYPE_YOUTUVL enum value
  • Add wa_layer_indexes to select the window-attention layers
  • Update clip.cpp, clip-impl.h, and clip-model.h
  • Update llama-model.cpp and llama-vocab.cpp for tokenizer support
  • Add unicode support for the Youtu-VL tokenizer

This is a temporary patch until upstream llama.cpp updates are synced.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-25 01:24:26 -05:00

Reference: github-starred/ollama#45756