[PR #6963] [MERGED] llama3.2 vision support #17545

Closed
opened 2026-04-16 06:05:54 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/6963
Author: @pdevine
Created: 9/25/2024
Status: Merged
Merged: 10/18/2024
Merged by: @pdevine

Base: main ← Head: pdevine/imageproc


📝 Commits (1)

  • ea11cb4 image processing for llama3.2

📊 Changes

35 files changed (+3849 additions, -201 deletions)


📝 cmd/cmd.go (+1 -2)
📝 cmd/interactive.go (+20 -27)
📝 convert/convert_test.go (+2 -2)
📝 go.mod (+1 -0)
📝 go.sum (+2 -0)
📝 llama/ggml-cuda.cu (+4 -0)
📝 llama/ggml-cuda/pad.cu (+46 -0)
📝 llama/ggml-cuda/pad.cuh (+1 -0)
📝 llama/ggml-metal.metal (+45 -0)
📝 llama/ggml-metal_darwin_arm64.m (+33 -0)
📝 llama/ggml.c (+91 -2)
📝 llama/ggml.h (+10 -0)
📝 llama/llama.cpp (+443 -13)
📝 llama/llama.go (+91 -5)
📝 llama/llama.h (+4 -0)
➕ llama/mllama.cpp (+900 -0)
➕ llama/mllama.h (+61 -0)
➕ llama/patches/0010-add-mllama-support.patch (+690 -0)
➕ llama/patches/0011-add-unpad-operator.patch (+409 -0)
📝 llama/runner/runner.go (+31 -3)

...and 15 more files

📄 Description

Image processing routines needed to run llama3.2.

This will need to be refactored at some point to support other multimodal models as well.

EDIT: This now includes all of the code for getting vision support to work, not just the image processing routines. It's still not 100%, but it's good enough to test out and kick the tires.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-16 06:05:54 -05:00

Reference: github-starred/ollama#17545