[PR #6899] [MERGED] runner.go: Support for vision models #17524

Closed
opened 2026-04-16 06:05:01 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/6899
Author: @jessegross
Created: 9/21/2024
Status: Merged
Merged: 9/30/2024
Merged by: @jessegross

Base: jmorganca/llama ← Head: jmorganca/llama-vision


📝 Commits (10+)

  • bc3a4d8 runner.go: Allocate batches for all sequences during init
  • 66f7c89 llama.go: Don't return nil from Tokenize on zero length input
  • 8758ed0 runner.go: Remove stop tokens from cache
  • 4aa0274 runner.go: Simplify flushing of pending tokens
  • e0d516a runner.go: Update TODOs
  • aec3771 runner.go: Don't panic when processing sequences
  • 0b0b5d2 runner.go: More accurately capture timings
  • 8e94322 runner.go: Support for vision models
  • cdb539c runner.go: Move Unicode checking code and add tests
  • 7e44cdd runner.go: Export external cache members

📊 Changes

7 files changed (+708 additions, -316 deletions)


📝 llama/example/main.go (+24 -12)
📝 llama/llama.go (+51 -32)
📝 llama/runner/cache.go (+117 -54)
📝 llama/runner/cache_test.go (+159 -64)
📝 llama/runner/runner.go (+259 -154)
📝 llama/runner/stop.go (+30 -0)
📝 llama/runner/stop_test.go (+68 -0)

📄 Description

In addition to bringing feature parity with the C++ runner, this also incorporates several improvements:

  • Prompt caching works with images, avoiding the need to re-decode embeddings for every message in a conversation
  • Parallelism is supported, removing the need to restrict decoding to one sequence at a time. (For now, though, Ollama will not schedule parallel sequences while it might still need to fall back to the old runner.) Minimal sketches of both points follow this list.
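
Neither point is spelled out beyond the bullets above, so here is a rough illustration of what each means in practice. First, a minimal Go sketch of image-aware prompt caching; the types and names (`input`, `imageInput`, `commonPrefix`) are invented for illustration, not the PR's actual ones. The idea: if an image counts as a single cache entry compared by content hash, a cached prefix containing that image can be reused without re-decoding its embeddings.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// input is one prompt element: either a text token or a reference to a
// pre-decoded image. Illustrative types, not the PR's actual ones.
type input struct {
	token    int
	isImage  bool
	imageSum [32]byte // content hash identifying an image's embeddings
}

func imageInput(data []byte) input {
	return input{isImage: true, imageSum: sha256.Sum256(data)}
}

// commonPrefix counts how many leading inputs two prompts share. Because an
// image compares as a single entry (by hash), a conversation that repeats
// earlier messages reuses the cached image embeddings instead of re-decoding.
func commonPrefix(a, b []input) int {
	n := 0
	for n < len(a) && n < len(b) {
		x, y := a[n], b[n]
		switch {
		case x.isImage != y.isImage:
			return n
		case x.isImage && x.imageSum != y.imageSum:
			return n
		case !x.isImage && x.token != y.token:
			return n
		}
		n++
	}
	return n
}

func main() {
	img := []byte("...image bytes...")
	cached := []input{{token: 1}, imageInput(img), {token: 42}, {token: 7}}
	next := []input{{token: 1}, imageInput(img), {token: 42}, {token: 9}}
	// Only the suffix past the shared prefix needs to be decoded again.
	fmt.Println("reusable prefix:", commonPrefix(cached, next)) // prints 3
}
```

For the second bullet, the usual mechanism in llama.cpp-style runners is that every token in a decode batch carries a sequence ID selecting its slice of the KV cache, so one decode call can advance several sequences at once. A sketch of filling such a batch round-robin, again with hypothetical names; appended to the file above, it compiles as part of the same package:

```go
// batchEntry tags a token with its owning sequence; seqID is what lets
// several sequences share one decode call. Hypothetical struct, not the PR's.
type batchEntry struct {
	token, pos, seqID int
}

// fillBatch drains up to size tokens round-robin across pending sequences,
// rather than running a single sequence to completion at a time.
func fillBatch(pending [][]int, size int) []batchEntry {
	batch := make([]batchEntry, 0, size)
	next := make([]int, len(pending)) // per-sequence read offset
	for len(batch) < size {
		progressed := false
		for id, toks := range pending {
			if next[id] < len(toks) && len(batch) < size {
				batch = append(batch, batchEntry{toks[next[id]], next[id], id})
				next[id]++
				progressed = true
			}
		}
		if !progressed { // all sequences drained
			break
		}
	}
	return batch
}
```

With pending = [][]int{{10, 11, 12}, {20, 21}} and size 4, the batch interleaves as seq0/seq1/seq0/seq1, which is the shape a single decode call then processes.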

Co-authored-by: jmorganca <jmorganca@gmail.com>


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

