[GH-ISSUE #22637] issue: Phantom File Objects from text responses (Gemini 3.1 Pro + NanoGPT + LiteLLM) #19773

Closed
opened 2026-04-20 02:16:46 -05:00 by GiteaMirror · 3 comments
Owner

Originally created by @jndao on GitHub (Mar 13, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/22637

Check Existing Issues

  • I have searched for any existing and/or related issues.
  • I have searched for any existing and/or related discussions.
  • I have also searched in the CLOSED issues AND CLOSED discussions and found no related items (your issue might already be addressed on the development branch!).
  • I am using the latest version of Open WebUI.

Installation Method

Git Clone

Open WebUI Version

v0.8.10

Ollama Version (if applicable)

No response

Operating System

Windows 11

Browser (if applicable)

No response

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

NanoGPT's Gemini 3.1 Pro should return text-only responses when image generation is not selected.

Actual Behavior

Open WebUI attempts to render an empty image, using the reasoning tokens (or, on a non-thinking endpoint, the final message itself) as the alt text. The result is a broken image whose alt text duplicates the content of the final chat message.

This continues to occur even when vision, image generation, AND file uploads are all turned off for the model in Open WebUI.

Steps to Reproduce

  1. Add NanoGPT's Gemini 3.1 Pro as a model in Open WebUI
  2. Say hello in a new chat (or any existing chat)
  3. Observe a broken/empty image with erroneous alt text.

This occurs on all NanoGPT endpoints: v1, v1thinking, and v1legacy.

Logs & Screenshots

Image

Additional Information

I attempted to remove this through a pipe (markdown, img, and file scrubbing) to no avail. Something else must be going on in Open WebUI's chat rendering engine.

GiteaMirror added the bug label 2026-04-20 02:16:46 -05:00
Author
Owner

@jndao commented on GitHub (Mar 13, 2026):

Example element rendered in Open WebUI:

```html
<img draggable="false" data-cy="image" src="/api/v1/files/<file_name_hash>/content" alt="<thinking content>" class="rounded-lg">
```
Author
Owner

@jndao commented on GitHub (Mar 13, 2026):

The following filter can be added to remove invalid/phantom images from the stream (code mostly AI-generated):

"""
title: Smart Image Validator (Filter)
description: Validates image base64 data and only strips invalid/phantom images.
version: 1.0
"""

import re
from pydantic import BaseModel
from typing import Dict

class Filter:
    class Valves(BaseModel):
        pass

    def __init__(self):
        self.valves = self.Valves()
        
        # Valid base64 image prefixes (magic bytes encoded)
        # PNG: iVBORw0KGgo
        # JPEG: /9j/
        # GIF: R0lGOD
        # WebP: UklGR
        self.valid_prefixes = (
            "iVBORw0KGgo",  # PNG
            "/9j/",          # JPEG
            "R0lGOD",        # GIF
            "UklGR",         # WebP
        )

    def _is_valid_image(self, url: str) -> bool:
        """Check if a data URL contains valid image magic bytes"""
        if not url:
            return False
        
        # Handle data:image/xxx;base64,... format
        if url.startswith("data:image/"):
            # Extract the base64 part after the comma
            if ";base64," in url:
                base64_data = url.split(";base64,", 1)[1]
                # Check if it starts with valid magic bytes
                return base64_data.startswith(self.valid_prefixes)
            return False
        
        # Handle raw base64
        if url.startswith(self.valid_prefixes):
            return True
            
        # Handle regular URLs (http/https) - these are valid
        if url.startswith("http://") or url.startswith("https://"):
            return True
            
        return False

    def stream(self, event: dict) -> dict:
        """
        Intercept stream chunks and validate/strip invalid images.
        """
        if "choices" in event:
            for choice in event["choices"]:
                delta = choice.get("delta", {})
                
                # Check for images array
                if "images" in delta:
                    valid_images = []
                    for img in delta["images"]:
                        # Extract URL from various formats
                        url = None
                        if isinstance(img, dict):
                            if "image_url" in img:
                                url = img["image_url"].get("url", "")
                            elif "url" in img:
                                url = img["url"]
                        elif isinstance(img, str):
                            url = img
                        
                        # Only keep valid images
                        if url and self._is_valid_image(url):
                            valid_images.append(img)
                    
                    # Replace with filtered list (or delete if empty)
                    if valid_images:
                        delta["images"] = valid_images
                    else:
                        del delta["images"]
        
        return event

    def outlet(self, body: Dict, __user__: Dict) -> Dict:
        """Backup: Clean up any invalid file attachments"""
        if "messages" not in body:
            return body

        for message in body["messages"]:
            if "files" in message:
                # Could add validation here too if needed
                pass

        return body
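As a sanity check on the prefix table the filter uses, each base64 prefix is just the encoding of the corresponding format's binary magic bytes. A small standalone verification (not part of the filter itself):

```python
import base64

# Each base64 prefix checked by the filter corresponds to the
# binary file signature ("magic bytes") of an image format.
signatures = {
    "iVBORw0KGgo": b"\x89PNG\r\n\x1a\n",  # PNG 8-byte signature
    "/9j/": b"\xff\xd8\xff",              # JPEG SOI marker
    "R0lGOD": b"GIF87a",                  # GIF87a (GIF89a matches too)
    "UklGR": b"RIFF",                     # WebP lives in a RIFF container
}

for prefix, magic in signatures.items():
    encoded = base64.b64encode(magic).decode()
    # The encoded signature must start with the prefix the filter checks for.
    assert encoded.startswith(prefix), (prefix, encoded)
```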
Author
Owner

@jndao commented on GitHub (Mar 13, 2026):

The data encoded in the broken images appears to be protobuf data or similar. This could be a provider problem; however, Open WebUI should not try to render invalid base64 image data.
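A minimal version of such a guard (a sketch only, not Open WebUI's actual rendering code, which lives in its TypeScript/Svelte frontend) would require that a payload both decodes as base64 and begins with known image magic bytes:

```python
import base64
import binascii

# Decoded (binary) magic bytes for common image formats.
MAGIC = (b"\x89PNG\r\n\x1a\n", b"\xff\xd8\xff", b"GIF8", b"RIFF")

def looks_like_image(b64_payload: str) -> bool:
    """True only if the payload is strictly valid base64 AND starts with image magic bytes."""
    try:
        raw = base64.b64decode(b64_payload, validate=True)
    except binascii.Error:
        return False
    return raw.startswith(MAGIC)
```

A protobuf-like payload, as observed here, would fail the magic-byte check even when it happens to be valid base64.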


Reference: github-starred/open-webui#19773