[PR #9672] [CLOSED] support minicpmo and fix minicpmv #18306

Closed
opened 2026-04-16 06:31:28 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/9672
Author: @tc-mb
Created: 3/12/2025
Status: Closed

Base: main ← Head: support-minicpmo-and-fix-minicpmv


📝 Commits (10+)

📊 Changes

19 files changed (+11396 additions, -78 deletions)

View changed files

📝 api/types.go (+7 -0)
📝 cmd/cmd.go (+7 -1)
📝 cmd/interactive.go (+143 -43)
📝 cmd/interactive_test.go (+2 -2)
➕ llama/llama.cpp/examples/llava/audio.cpp (+1447 -0)
➕ llama/llama.cpp/examples/llava/audio.h (+48 -0)
➕ llama/llama.cpp/examples/llava/audio_common.cpp (+341 -0)
➕ llama/llama.cpp/examples/llava/audio_common.h (+170 -0)
📝 llama/llama.cpp/examples/llava/clip.cpp (+6 -2)
📝 llama/llama.cpp/examples/llava/clip.h (+2 -1)
➕ llama/llama.cpp/examples/llava/dr_wav.h (+8815 -0)
📝 llama/llama.cpp/examples/llava/llava.cpp (+53 -1)
📝 llama/llama.cpp/examples/llava/llava.h (+4 -0)
📝 llama/llama.go (+84 -0)
📝 llm/server.go (+10 -4)
📝 runner/llamarunner/image.go (+37 -0)
📝 runner/llamarunner/runner.go (+195 -8)
📝 server/prompt.go (+17 -11)
📝 server/routes.go (+8 -5)

📄 Description

Hi Ollama community,
I’d like to contribute a PR that implements Omni capabilities using the MiniCPM-o 2.6 model. This enhancement introduces full-modal (Omni) understanding, enabling the model to process and interact with multiple types of inputs and outputs: text, audio, and video input, and text and audio output.
Since Omni capabilities significantly change the interaction logic, I would love to hear your thoughts on the implementation details and whether any adjustments are needed to align with Ollama’s overall design philosophy.
Additionally, I noticed that the schema of MiniCPM-V in Ollama appears to be incorrect, which could impact the model’s accuracy. I have addressed this issue in this PR as well. Looking forward to your feedback and suggestions!


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

📝 Commits (10+)

- [`6a69927`](https://github.com/ollama/ollama/commit/6a699274516edc3af70a33612157033a85297a3c) fix schema in minicpmv
- [`bb2bcb4`](https://github.com/ollama/ollama/commit/bb2bcb43af577230d6d5cd35ffad045e3960c7d9) add video and audio type input
- [`c79fb6c`](https://github.com/ollama/ollama/commit/c79fb6c3d99c63a26cc5d169a0777a8dbb750aa7) Merge branch 'main' into support-minicpmo-and-fix-minicpmv
- [`3ce5419`](https://github.com/ollama/ollama/commit/3ce541943eb4f5ba59263073f90a574440f03cee) add minicpmv_version
- [`65bac73`](https://github.com/ollama/ollama/commit/65bac73daf6a8bee589cad06b55301ffee85225e) fix code
- [`62750c2`](https://github.com/ollama/ollama/commit/62750c2d86edc9249eb8a0547430ee7447d24911) update video schema
- [`569de8c`](https://github.com/ollama/ollama/commit/569de8c45e1b712ab0d1f307d917f8c7830c40d8) add audio cpp
- [`066ba19`](https://github.com/ollama/ollama/commit/066ba190ee903237384cad0ebcdeeceaec241c21) add audio support
- [`fb9e95e`](https://github.com/ollama/ollama/commit/fb9e95e8f810daf9d5a39561d80622fc4d2e619a) fix cpp err
- [`f577ba5`](https://github.com/ollama/ollama/commit/f577ba525b71a4d8a8c715be691a0479370849eb) fix audio
GiteaMirror added the pull-request label 2026-04-16 06:31:28 -05:00

Reference: github-starred/ollama#18306