[GH-ISSUE #15886] issue: Notes - Chat can't use streaming model from API. #33235

Closed
opened 2026-04-25 07:08:22 -05:00 by GiteaMirror · 6 comments

Originally created by @peuportier on GitHub (Jul 20, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/15886

Check Existing Issues

  • I have searched the existing issues and discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Pip Install

Open WebUI Version

0.6.18

Ollama Version (if applicable)

0.9.6

Operating System

macOS 15.4.1

Browser (if applicable)

Safari 18.4 (20621.1.15.11.10)

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
      • Start with the initial platform/version/OS and dependencies used,
      • Specify exact install/launch/configure commands,
      • List URLs visited, user input (incl. example values/emails/passwords if needed),
      • Describe all options and toggles enabled or changed,
      • Include any files or environmental changes,
      • Identify the expected and actual result at each stage,
      • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

The chat should successfully connect to the model and stream responses without encountering a 422 Client Error. In a normal chat there is no problem.

Actual Behavior

Description:
When attempting to use the chat functionality with a model that streams responses within the Notes tool, the system repeatedly returns a 422 Client Error: Unprocessable Entity for the URL https://api.mistral.ai/v1/chat/completions.

Steps to Reproduce

  • Open the Notes tool.
  • Initiate a chat with a model that streams responses.
  • Observe the error messages in the console or UI.

Logs & Screenshots

Screenshot: https://github.com/user-attachments/assets/72f8421a-d527-420b-856b-41e0228c532a

Additional Information

This issue occurs specifically when trying to stream responses from the chat model.
The error suggests that the request being sent to the API endpoint is not properly formatted or is missing required parameters.
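
For comparison, a minimal well-formed streaming request to the same endpoint looks roughly like the sketch below. This is only an illustration (the model name is an example, and a valid key is assumed in the MISTRAL_API_KEY environment variable); a request of this shape should be accepted, which suggests the 422 comes from extra or malformed fields in the body that gets sent from the Notes tool.

```python
# Sketch of a minimal valid streaming request to the Mistral chat
# completions endpoint. Assumes a valid key in MISTRAL_API_KEY;
# "mistral-small-latest" is just an example model id.
import json
import os

import requests

resp = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "mistral-small-latest",
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": True,
    },
    stream=True,
)
resp.raise_for_status()  # a 422 here means the request body was rejected

for line in resp.iter_lines():
    if not line:
        continue
    data = line.decode("utf-8").removeprefix("data: ")
    if data == "[DONE]":  # SSE end-of-stream sentinel
        break
    chunk = json.loads(data)
    print(chunk["choices"][0]["delta"].get("content", ""), end="")
```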

GiteaMirror added the bug label 2026-04-25 07:08:22 -05:00

@tjbck commented on GitHub (Jul 20, 2025):

Is this from direct connections?


@peuportier commented on GitHub (Jul 21, 2025):

Hey @tjbck — hope you’re doing well and not too exhausted from all the projects!

Quick heads up:
I created a function and shared it with the community that lets you load Mistral models directly from the Mistral API (using your API key) into OWUI, and then stream responses from there.

Just to clarify:
This code makes a direct connection to the Mistral API using HTTPS requests (via the Python requests library). I’m not sure if you allow this kind of direct connection from the chat side, but wanted to check in and see if that’s okay, or if you have any restrictions around this.

Let me know. Thanks again for all your efforts on Open WebUI; it helps our research a lot.


@rgaricano commented on GitHub (Jul 21, 2025):

It seems to be an error due to a malformed request.
If you don't want to share the whole function, could you share just the request call that is sent to the Mistral completions endpoint?


@peuportier commented on GitHub (Jul 22, 2025):

@rgaricano No problem, I can share the whole function; nothing to hide here. Thanks for any clue that can solve the issue.

```python
import os
import json
import requests
import time
import base64  # reserved for the image support flagged "not yet implemented" below
from typing import List, Union, Dict, Generator
from pydantic import BaseModel, Field


class Pipe:
    class Valves(BaseModel):
        """Configuration for the Mistral API."""

        MISTRAL_API_BASE_URL: str = Field(default="https://api.mistral.ai/v1")
        MISTRAL_API_KEY: str = Field(default="")

    def __init__(self):
        self.debug_models = False
        self.debug_stream = True
        self.debug_errors = True
        self.type = "manifold"
        self.id = "mistral"
        self.name = "mistral/"
        # European server
        self.server = "https://api.mistral.ai"
        self.models_url = self.server + "/v1/models"
        self.chat_url = self.server + "/v1/chat/completions"
        # Sampling defaults (note: currently not sent with requests)
        self.temperature = 0.7
        self.top_p = 0.9
        self.max_tokens = 4096
        api_key = os.getenv("MISTRAL_API_KEY", "").strip()
        self.valves = self.Valves(MISTRAL_API_KEY=api_key)
        # Rate-limiting state
        self.last_request_time: float = 0.0
        self.rate_limit_reset: float = 0.0
        self.rate_limit_interval: float = 30.0  # seconds (the open tier is about 100 requests per hour)
        # Fetched model list; must be a list, not "" (iterating a string yields characters)
        self.models: List[dict] = []

        # Not yet implemented!
        self.MAX_IMAGE_SIZE = 5 * 1024 * 1024  # 5 MB per image
        self.image_url = ""

    def _debug(self, message: str):
        """Prints debug messages if debugging is enabled."""
        if self.debug_errors:
            print(message)

    def _get_headers(self) -> Dict[str, str]:
        """Returns the headers for API requests."""
        if not self.valves.MISTRAL_API_KEY:
            raise ValueError("MISTRAL_API_KEY is missing or invalid.")
        return {
            "Authorization": f"Bearer {self.valves.MISTRAL_API_KEY}",
            "Content-Type": "application/json",
        }

    def _handle_response(self, response):
        """Handles the response from the API call."""
        if response.status_code == 200:
            return response.json()  # assuming the response is JSON
        raise ValueError(f"Error with status code: {response.status_code}")

    def get_mistral_models(self) -> List[dict]:
        """Fetches available Mistral models, filters, and returns unique models."""
        try:
            response = requests.get(self.models_url, headers=self._get_headers())
            response.raise_for_status()
            self.models = response.json()["data"]
        except requests.exceptions.RequestException as e:
            if self.debug_errors:
                print(f"API call failed: {e}")

        # Map to track unique models
        model_map = {}
        for model in self.models:
            # Keep only models with the `completion_chat` capability
            if not model["capabilities"].get("completion_chat", False):
                continue

            # Extract the base ID and check whether this is a "latest" version
            # (aliases is a list, so each alias is tested for the substring)
            base_id = "-".join(model["id"].split("-")[:-1])
            is_latest = "latest" in model["id"] or any(
                "latest" in alias for alias in model["aliases"]
            )

            # Update or add the model to the map
            if base_id not in model_map or is_latest:
                model_map[base_id] = model

        # Prepare the final list of unique models
        unique_models = []
        for model in model_map.values():
            unique_models.append(
                {
                    "id": model["id"],
                    "name": model["name"],
                    "capabilities": model["capabilities"],
                    "description": model["description"],
                    "max_context_length": model["max_context_length"],
                    "aliases": model["aliases"],
                    "deprecation": model["deprecation"],
                    "default_model_temperature": model["default_model_temperature"],
                    "type": model["type"],
                }
            )

        if self.debug_models:
            print("Unique Models:")
            for model in unique_models:
                print(f"ID: {model['id']}")
                print(f"Name: {model['name']}")
                print(f"Capabilities: {model['capabilities']}")
                print(f"Description: {model['description']}")
                print(f"Max Context Length: {model['max_context_length']}")
                print(f"Aliases: {model['aliases']}")
                print(f"Deprecation: {model['deprecation']}")
                print(f"Default Model Temperature: {model['default_model_temperature']}")
                print(f"Type: {model['type']}")
                print("-" * 40)

        return unique_models

    def pipes(self) -> List[dict]:
        """Returns a list of available models."""
        return self.get_mistral_models()

    def pipe(self, body: dict) -> Union[str, Generator[str, None, None]]:
        """Handles a single request to the pipe."""
        try:
            model = body["model"].removeprefix("mistral.")
            messages = body["messages"]

            # Debugging the content of model and messages
            self._debug(f"Model: {model}")
            self._debug(f"Messages: {json.dumps(messages, indent=2)}")

            # Ensure the messages are in the correct format (list of dictionaries)
            if not all(isinstance(msg, dict) and "content" in msg for msg in messages):
                raise ValueError(
                    "Each message must be a dictionary with a 'content' key."
                )

            stream = body.get("stream", False)

            if self.debug_stream:
                self._debug("Incoming body:")
                self._debug(json.dumps(body, indent=2))

            if stream:
                return self.stream_response(model, messages)
            return self.get_completion(model, messages)
        except KeyError as e:
            error_msg = f"Missing required key in body: {e}"
            self._debug(error_msg)
            return f"Error: {error_msg}"
        except Exception as e:
            self._debug(f"Error in pipe method: {e}")
            return f"Error: {e}"

    def stream_response(
        self, model: str, messages: List[dict], retries: int = 5
    ) -> Generator[str, None, None]:
        """Streams a response from the Mistral API, handling rate limits."""
        url = self.chat_url
        payload = {"model": model, "messages": messages, "stream": True}

        self._debug(f"Streaming response from {url}")
        self._debug(f"Payload: {json.dumps(payload, indent=2)}")

        for attempt in range(retries):
            try:
                response = requests.post(
                    url, json=payload, headers=self._get_headers(), stream=True
                )
                response.raise_for_status()

                for line in response.iter_lines():
                    if not line:
                        continue
                    # Strip the SSE "data: " prefix. removeprefix() is used here:
                    # lstrip("data: ") strips *characters*, not a prefix string.
                    line_data = line.decode("utf-8").removeprefix("data: ")
                    if line_data == "[DONE]":  # end-of-stream sentinel
                        break
                    try:
                        event = json.loads(line_data)
                    except json.JSONDecodeError:
                        self._debug(f"Failed to decode stream line: {line}")
                        continue

                    self._debug(f"Received stream event: {event}")

                    delta_content = (
                        event.get("choices", [{}])[0].get("delta", {}).get("content")
                    )
                    if delta_content:
                        yield delta_content

                    if event.get("choices", [{}])[0].get("finish_reason") == "stop":
                        break
                return  # exit after successful streaming
            except requests.RequestException as e:
                # e.response is used because the local `response` variable is
                # unbound when requests.post() itself raises (e.g. connection error)
                if (
                    e.response is not None
                    and e.response.status_code == 429
                    and attempt < retries - 1
                ):
                    wait_time = 2**attempt
                    self._debug(f"Rate limited (429). Retrying after {wait_time} seconds...")
                    time.sleep(wait_time)
                else:
                    self._debug(f"Stream request failed: {e}")
                    yield f"Error: {str(e)}"
                    return  # do not retry on non-rate-limit errors

    def get_completion(self, model: str, messages: List[dict], retries: int = 3) -> str:
        """Fetches a single completion response, handling rate limits."""
        url = self.chat_url
        payload = {"model": model, "messages": messages}

        for attempt in range(retries):
            try:
                self._debug(f"Attempt {attempt + 1}: Sending completion request to {url}")
                response = requests.post(url, json=payload, headers=self._get_headers())
                # raise_for_status() raises an HTTPError (a RequestException),
                # so the 429 retry branch below can actually fire
                response.raise_for_status()
                data = self._handle_response(response)
                return data["choices"][0]["message"]["content"]
            except requests.RequestException as e:
                if (
                    e.response is not None
                    and e.response.status_code == 429
                    and attempt < retries - 1
                ):
                    wait_time = 2**attempt
                    self._debug(f"Rate limited (429). Retrying after {wait_time} seconds...")
                    time.sleep(wait_time)
                else:
                    self._debug(f"Completion request failed: {e}")
                    return f"Error: {str(e)}"
```

Hope this can help.

Thanks again


@rgaricano commented on GitHub (Jul 22, 2025):

OK, it's probably because in the Notes chat stream is set to true:
https://github.com/open-webui/open-webui/blob/5fbfe2bdcadf5f157926f6551891e4dc0802b9f3/src/lib/components/notes/NoteEditor/Chat.svelte#L191
and files are sent:
https://github.com/open-webui/open-webui/blob/5fbfe2bdcadf5f157926f6551891e4dc0802b9f3/src/lib/components/notes/NoteEditor/Chat.svelte#L170-L175

but the Mistral completion endpoint doesn't support streaming when files are sent:
https://github.com/open-webui/open-webui/blob/5fbfe2bdcadf5f157926f6551891e4dc0802b9f3/backend/open_webui/retrieval/loaders/mistral.py#L252-L261
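
Put differently, the body coming out of the Notes chat presumably looks something like this sketch (the exact shape of the files entries is an assumption; the point is the combination of stream true plus files):

```python
# Hypothetical sketch of the body the Notes chat hands to a pipe.
# The combination of "stream": True and a non-empty "files" list is
# what the Mistral-backed completion path rejects with a 422.
body = {
    "model": "mistral.mistral-small-latest",  # example model id
    "messages": [{"role": "user", "content": "Summarize this note"}],
    "stream": True,  # hardcoded to true by the Notes chat
    "files": [{"type": "file", "id": "..."}],  # attached note content (shape assumed)
}
```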


@rgaricano commented on GitHub (Jul 22, 2025):

Solutions?
Either set stream=false in the Notes chat to comply with the Mistral endpoint (not recommended), or do the check in your pipe to work around this Mistral issue, setting stream=false whenever there are files in the payload; see the sketch below.
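
A minimal sketch of that second option, as it might look inside the pipe() method shared above (assuming the attachments arrive under a files key, which is an assumption about the payload shape):

```python
# Inside Pipe.pipe(), before dispatching: force a non-streaming completion
# whenever files are attached, since the Mistral endpoint rejects
# stream=True together with files (per the diagnosis above).
stream = body.get("stream", False)
if body.get("files"):  # the "files" key is an assumed payload shape
    stream = False  # fall back to a single non-streamed completion

if stream:
    return self.stream_response(model, messages)
return self.get_completion(model, messages)
```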

Reference: github-starred/open-webui#33235