[PR #15935] Fix intermittent gemma4 crash from split image group (#15929) #77659

Open
opened 2026-05-05 10:20:09 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/15935
Author: @ssam18
Created: 5/2/2026
Status: 🔄 Open

Base: main ← Head: fix-gemma-image-batch-split


📝 Commits (1)

  • df2bc6e Fix intermittent gemma4 crash from split image group (#15929)

📊 Changes

3 files changed (+193 additions, -5 deletions)

📝 model/models/gemma3/model.go (+7 -3)
📝 model/models/gemma4/model.go (+11 -2)
➕ model/models/gemma4/model_posttokenize_test.go (+175 -0)

📄 Description

The cache prefix matcher in runner/ollamarunner/cache.go compares only Token and MultimodalHash. It can therefore match through the image begin token, which carries no hash, and leave the image placeholder as the first uncached input. Previously nothing forced the image group to stay together from the placeholder onward, so under parallel load the runner could pack another sequence's tokens ahead of it, leaving the placeholder at a row position where the 256 vision tokens no longer fit. The result was a crash inside hiddenState.View.

The fix sets SameBatch on the placeholder itself in both gemma4 and gemma3, so the group stays in one batch regardless of where the cache prefix ends.

This PR also adds a regression test in model/models/gemma4/model_posttokenize_test.go that covers the 256-token case from the crash report and three other PostTokenize paths.

Closes #15929
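The failure mode described above can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: the `Input` struct, the token ids, `buildPrompt`, and the `canStartAt` packing rule are simplified, hypothetical stand-ins, and the only behaviour taken from the description is that the prefix comparison looks at Token and MultimodalHash alone. The batch size of 512 and the 300 rows already used by another sequence are likewise assumed numbers chosen for illustration.

```go
package main

import "fmt"

// Input is a simplified, hypothetical stand-in for the runner's input type.
type Input struct {
	Token          int32
	MultimodalHash uint64 // non-zero only on the image placeholder
	SameBatch      int    // how many following inputs must share this input's batch
}

// Hypothetical token ids, chosen only for this sketch.
const (
	textToken  int32 = 10
	beginToken int32 = 1
	padToken   int32 = 0
	endToken   int32 = 2
)

// commonPrefix compares only Token and MultimodalHash, as the description
// says the cache prefix matcher does. SameBatch is not part of the comparison.
func commonPrefix(cached, prompt []Input) int {
	n := 0
	for n < len(cached) && n < len(prompt) &&
		cached[n].Token == prompt[n].Token &&
		cached[n].MultimodalHash == prompt[n].MultimodalHash {
		n++
	}
	return n
}

// buildPrompt returns: text, image begin, placeholder, visionTokens-1 pads, image end.
// markPlaceholder models the fix: the placeholder carries its own SameBatch.
func buildPrompt(hash uint64, visionTokens int, markPlaceholder bool) []Input {
	prompt := []Input{
		{Token: textToken},
		{Token: beginToken, SameBatch: visionTokens + 1},
	}
	placeholder := Input{Token: padToken, MultimodalHash: hash}
	if markPlaceholder {
		// visionTokens-1 pads plus the end token follow the placeholder.
		placeholder.SameBatch = visionTokens
	}
	prompt = append(prompt, placeholder)
	for i := 1; i < visionTokens; i++ {
		prompt = append(prompt, Input{Token: padToken})
	}
	return append(prompt, Input{Token: endToken})
}

// canStartAt is an assumed packing rule: an input may be placed at row `used`
// only if it and its SameBatch followers still fit in the batch.
func canStartAt(in Input, used, batchSize int) bool {
	return used+1+in.SameBatch <= batchSize
}

func main() {
	const visionTokens, batchSize, used = 256, 512, 300

	// A previous request with the same text but a different image is cached.
	cached := buildPrompt(0xAAAA, visionTokens, false)

	for _, fixed := range []bool{false, true} {
		prompt := buildPrompt(0xBBBB, visionTokens, fixed)
		prefix := commonPrefix(cached, prompt)
		first := prompt[prefix] // first uncached input: the placeholder
		fmt.Printf("fixed=%v prefix=%d placedAtRow%d=%v\n",
			fixed, prefix, used, canStartAt(first, used, batchSize))
	}
}
```

In both runs the common prefix ends right after the image begin token, so the begin token's SameBatch is never consulted. Without the mark on the placeholder, the sketch's packer accepts it at row 300 even though 256 vision tokens starting there would run past row 512. With the mark, the placeholder is deferred to a batch where the whole group fits.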


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-05-05 10:20:09 -05:00
Reference: github-starred/ollama#77659