[PR #14584] [MERGED] model/renderers/glmocr: inject image tags in renderer prompt path #20005

Closed
opened 2026-04-16 07:23:15 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/14584
Author: @Victor-Quqi
Created: 3/3/2026
Status: Merged
Merged: 3/3/2026
Merged by: @jmorganca

Base: main ← Head: fix/glm-ocr-image-tags


📝 Commits (1)

  • 1bfe7aa model/renderers: fix glm-ocr image tags in renderer prompts

📊 Changes

4 files changed (+150 additions, -3 deletions)


📝 model/renderers/glmocr.go (+19 -2)
➕ model/renderers/glmocr_test.go (+99 -0)
📝 model/renderers/renderer.go (+1 -1)
📝 server/prompt_test.go (+31 -0)

📄 Description

Summary

Fix glm-ocr image handling in renderer mode by injecting indexed image tags ([img-{n}]) into user content, matching the server renderer contract used by other multimodal renderers.

This resolves the regression where glm-ocr received only the text prompt content (no image placeholders), producing blank OCR output, i.e. an empty markdown response.
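The tag-injection idea can be sketched as follows. This is a minimal, self-contained illustration; the helper name `prependImgTags` and its signature are hypothetical and will differ from the actual code in the PR:

```go
package main

import (
	"fmt"
	"strings"
)

// prependImgTags prepends an indexed [img-{n}] tag for each image
// attached to a user message. The offset continues across turns so
// image indices stay unique over a whole conversation.
func prependImgTags(content string, numImages, offset int) string {
	var sb strings.Builder
	for i := 0; i < numImages; i++ {
		fmt.Fprintf(&sb, "[img-%d]", offset+i)
	}
	sb.WriteString(content)
	return sb.String()
}

func main() {
	// Turn 1: two images, indices start at 0.
	fmt.Println(prependImgTags("Extract the text.", 2, 0))
	// Turn 2: one more image, index continues at 2.
	fmt.Println(prependImgTags("And this one.", 1, 2))
}
```

With image embeddings registered under the same indices, the model can associate each `[img-{n}]` placeholder in the prompt with its corresponding image.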


Problem

Since v0.17.1, glm-ocr can return empty markdown for image inputs while other VLM models still work.

In renderer mode (m.Config.Renderer != ""), server/chatPrompt does not rewrite message content with image tags and expects renderer-specific handling.
glm-ocr renderer did not prepend image markers, so image embeddings were not correctly wired into the rendered prompt flow.

Root Cause

GlmOcrRenderer ignored message.Images and only rendered message.Content, unlike other renderer-based multimodal implementations (qwen3vl, lfm2) that emit image placeholders.

Changes

  • model/renderers/glmocr.go
    • add useImgTags behavior
    • prepend [img-{offset}] for each image in user messages
    • maintain multi-turn image offset continuity
  • model/renderers/renderer.go
    • wire glm-ocr renderer with RenderImgTags (&GlmOcrRenderer{useImgTags: RenderImgTags})
  • model/renderers/glmocr_test.go (new)
    • add unit tests for single/multiple images, multi-turn offset, default/no-tag behavior, and no-image content passthrough
  • server/prompt_test.go
    • add regression test ensuring chatPrompt + glm-ocr renderer path includes image tags
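The changes above (a `useImgTags` flag plus multi-turn offset continuity) might be structured roughly like this. The `Message` shape and `Render` signature are simplified assumptions for illustration, not the PR's actual types:

```go
package main

import "fmt"

// GlmOcrRenderer carries a flag controlling whether indexed image
// tags are emitted (mirroring the PR's useImgTags behavior).
type GlmOcrRenderer struct {
	useImgTags bool
}

// Message is a simplified stand-in for the real message type.
type Message struct {
	Role    string
	Content string
	Images  [][]byte
}

// Render walks the messages, prepending an [img-{offset}] tag per
// user-message image when the flag is enabled. The offset counter
// persists across turns, so indices stay unique in multi-turn chats.
func (r *GlmOcrRenderer) Render(msgs []Message) string {
	var out string
	offset := 0
	for _, m := range msgs {
		if r.useImgTags && m.Role == "user" {
			for range m.Images {
				out += fmt.Sprintf("[img-%d]", offset)
				offset++
			}
		}
		out += m.Content + "\n"
	}
	return out
}

func main() {
	r := &GlmOcrRenderer{useImgTags: true}
	msgs := []Message{
		{Role: "user", Content: "read this", Images: [][]byte{{1}}},
		{Role: "assistant", Content: "ok"},
		{Role: "user", Content: "and this", Images: [][]byte{{2}}},
	}
	fmt.Print(r.Render(msgs))
}
```

Constructing the renderer with the flag set (as in `&GlmOcrRenderer{useImgTags: RenderImgTags}` from the wiring change) enables the tag path without altering the default no-tag behavior.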

Validation

  • go test ./model/renderers/...
  • go test ./server -run "TestChatPrompt|TestChatPromptRendererDoesNotRewriteMessageContent|TestChatPromptGLMOcrRendererAddsImageTags|TestGenerateWithImages"
  • local end-to-end check with glm-ocr:latest + image input (/api/generate and /api/chat) returns non-empty OCR output

Fixes #14474
Related (different root cause): #14117 / #14523
Duplicate report: #14494
Potentially mixed-symptom thread: #14498


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-16 07:23:15 -05:00

Reference: github-starred/ollama#20005