[PR #10141] [MERGED] models: llama4 multimodal #75449

Closed
opened 2026-05-05 07:52:59 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/10141
Author: @mxyng
Created: 4/5/2025
Status: Merged
Merged: 4/25/2025
Merged by: @mxyng

Base: main ← Head: mxyng/llama4


📝 Commits (6)

  • 729ab3b llama4
  • 4de5ec2 image processing
  • 34f93a3 connect vision to text
  • 30af287 chunked attention
  • 8b97f8a fixes for maverick
  • e0c0311 memory

📊 Changes

17 files changed (+1513 additions, -19 deletions)

View changed files

📝 convert/convert.go (+2 -0)
📝 convert/convert_llama.go (+10 -3)
➕ convert/convert_llama4.go (+169 -0)
📝 convert/reader.go (+9 -7)
📝 convert/reader_safetensors.go (+15 -0)
📝 convert/reader_torch.go (+11 -0)
📝 fs/ggml/ggml.go (+6 -2)
📝 kvcache/causal.go (+13 -0)
📝 kvcache/causal_test.go (+68 -2)
📝 ml/backend.go (+4 -0)
📝 ml/backend/ggml/ggml.go (+34 -5)
➕ model/models/llama4/model.go (+189 -0)
➕ model/models/llama4/model_text.go (+259 -0)
➕ model/models/llama4/model_vision.go (+256 -0)
➕ model/models/llama4/process_image.go (+167 -0)
➕ model/models/llama4/process_image_test.go (+300 -0)
📝 model/models/models.go (+1 -0)

📄 Description

This change adds Meta's Llama 4 model to Ollama. This has mainly been tested with the 16E Scout model.

Key model features:

  • Mixture of experts (MoE)
  • Multimodal (vision, text)

A couple of things of note:

  • The vision model is trained on tiled images, which requires special handling for both the pixel values and the rotary embeddings. Both require 5D tensors, which are not supported by ggml, so the operation has been split into multiple operations
  • The text model uses a chunked attention mask for select layers and a causal attention mask for the others. This requires a minor change to kvcache.Causal, similar to sliding window attention
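Since ggml tensors are at most 4D, one common way to handle a 5D view such as pixel values shaped [batch, tiles, channels, h, w] is to fold the tile axis into the batch axis and operate per tile. The sketch below illustrates that index arithmetic on a flat buffer; the function name and layout are hypothetical and differ from the actual splitting in model_vision.go.

```go
package main

import "fmt"

// tileOffsets computes the flat offset of each tile when a 5D tensor
// [batch, tiles, channels, h, w] is viewed as 4D [batch*tiles, channels, h, w].
// Each per-tile 4D operation then starts at one of these offsets.
// Illustrative sketch only, not ollama's implementation.
func tileOffsets(batch, tiles, channels, h, w int) []int {
	stride := channels * h * w // elements per tile
	offs := make([]int, 0, batch*tiles)
	for b := 0; b < batch; b++ {
		for t := 0; t < tiles; t++ {
			offs = append(offs, (b*tiles+t)*stride)
		}
	}
	return offs
}

func main() {
	// 1 image, 4 tiles of 3x2x2 elements: tiles start every 12 elements.
	fmt.Println(tileOffsets(1, 4, 3, 2, 2)) // [0 12 24 36]
}
```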
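The chunked mask mentioned above can be pictured as causal attention further restricted to the token's own chunk, much like a sliding window that advances in fixed blocks. Here is a minimal boolean-mask sketch of that idea; it is not the kvcache.Causal code, and the chunk size is an arbitrary example value.

```go
package main

import "fmt"

// buildMask returns an n-by-n attention mask. With chunkSize == 0 it is
// plain causal: position i attends to every j <= i. With chunkSize > 0 it
// is chunked causal: i additionally attends only within its own chunk
// (j/chunkSize == i/chunkSize). Illustrative sketch of the masking rule.
func buildMask(n, chunkSize int) [][]bool {
	mask := make([][]bool, n)
	for i := range mask {
		mask[i] = make([]bool, n)
		for j := 0; j <= i; j++ {
			if chunkSize == 0 || j/chunkSize == i/chunkSize {
				mask[i][j] = true
			}
		}
	}
	return mask
}

func main() {
	// 6 tokens, chunk size 3: token 3 starts a new chunk and cannot
	// attend to tokens 0-2, unlike plain causal attention.
	for _, row := range buildMask(6, 3) {
		fmt.Println(row)
	}
}
```

A real cache mixes both mask types per layer, which is why kvcache.Causal needs to know which rule a given layer uses.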

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-05-05 07:52:59 -05:00

Reference: github-starred/ollama#75449