bug: OpenAI API connection doesn't accept mimetype: application/x-ndjson ? #2602

Closed
opened 2025-11-11 15:10:27 -06:00 by GiteaMirror · 0 comments
Owner

Originally created by @fabigr8 on GitHub (Nov 9, 2024).

Bug Report

Installation Method

Docker

Environment

  • Open WebUI Version: v0.3.35
  • Operating System: Linux Ubuntu 22.04
  • Browser (if applicable): Firefox

Confirmation:

  • I have read and followed all the instructions provided in the README.md.
  • I am on the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided the exact steps to reproduce the bug in the "Steps to Reproduce" section below.

Expected Behavior:

Produce a chat output in the Open WebUI interface.

Actual Behavior:

The application produces no output.
Instead, the loading animation persists (the placeholder lines that represent a chat message and are normally replaced with actual text once the model responds).

Description

Bug Summary:
I am using LitServe to host models on a separate server, using LitServe's OpenAI spec with Streaming=True.
I ran into an issue that is very similar to #4915 (but for the OpenAI API).

Open WebUI seems to have an issue with an OpenAI connection when the model setting streaming is on and the served model responds with application/x-ndjson (see the Docker logs below).
Because of this, the chat cannot process the model response and displays no answer.
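For context, OpenAI-style streaming normally uses Server-Sent Events (Content-Type text/event-stream, lines prefixed with `data: `), while NDJSON is one bare JSON object per line. A minimal sketch of a line parser that tolerates both framings (the function name and structure here are illustrative, not Open WebUI's actual code):

```python
import json

def parse_stream_line(line: str):
    """Parse one line of an OpenAI-style streaming response.

    Accepts both SSE framing ("data: {...}") and bare NDJSON ("{...}").
    Returns the decoded chunk dict, or None for blank lines and [DONE].
    """
    line = line.strip()
    if not line:
        return None
    if line.startswith("data:"):              # SSE framing
        line = line[len("data:"):].strip()
    if line == "[DONE]":                      # end-of-stream sentinel
        return None
    return json.loads(line)

# The same delta chunk in both framings decodes identically.
sse = 'data: {"choices": [{"delta": {"content": "Hi"}}]}'
ndjson = '{"choices": [{"delta": {"content": "Hi"}}]}'
print(parse_stream_line(sse) == parse_stream_line(ndjson))  # True
```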

Additionally, I tested LitServe with a minimal Python client script using the openai package; it produces the correct output and processes the x-ndjson responses from LitServe correctly, so I can rule out an error on LitServe's side.

You can find all scripts (LitServe example and client test) below.

Reproduction Details

Steps to Reproduce:

  • Install LitServe (pip install litserve) plus any other packages you may be missing.
  • Run the LitServe example below, which starts a LitServe (OpenAI-spec) API server.
  • Connect Open WebUI to the LitServe server:
    • In the Open WebUI admin settings, add a new OpenAI API connection: http://localhost:9000/v1
    • with any key (it will not be checked).
  • Start a new chat with the new model (named "lit", provided by the LitServe server).
  • The error occurs.

Logs and Screenshots

Browser Console Logs:

17:14:47.427 submitPrompt what day is today? <empty string> [Chat.svelte:798:10](http://localhost:8080/src/lib/components/chat/Chat.svelte)
17:14:47.435 UserMessage mounted [UserMessage.svelte:85:10](http://localhost:8080/src/lib/components/chat/Messages/UserMessage.svelte)
17:14:47.521 UserMessage mounted [UserMessage.svelte:85:10](http://localhost:8080/src/lib/components/chat/Messages/UserMessage.svelte)
17:14:47.523 ResponseMessage mounted [ResponseMessage.svelte:468:10](http://localhost:8080/src/lib/components/chat/Messages/ResponseMessage.svelte)
17:14:47.523 modelId lit [Chat.svelte:963:12](http://localhost:8080/src/lib/components/chat/Chat.svelte)
17:14:47.757
Array []
​
length: 0
​
<prototype>: Array []
​​
...
[Chat.svelte:1951:11](http://localhost:8080/src/lib/components/chat/Chat.svelte)

Docker Container Logs:

DEBUG [open_webui.apps.openai.main] {"stream": true, "model": "lit", "messages": [{"role": "user", "content": "lets test it"}]}
ERROR [open_webui.apps.openai.main] 200, message='Attempt to decode JSON with unexpected mimetype: application/x-ndjson', url='http://localhost:9000/v1/chat/completions'
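The error text matches aiohttp's ClientResponse.json(), which raises ContentTypeError whenever the response mimetype is not application/json; passing content_type=None disables that check. A self-contained sketch reproducing the behavior against a throwaway local server (this is an illustration of the aiohttp behavior, not Open WebUI's code):

```python
import asyncio

import aiohttp
from aiohttp import web

async def handler(request):
    # Reply with valid JSON but an NDJSON content type, like LitServe does.
    return web.Response(text='{"ok": true}', content_type="application/x-ndjson")

async def main():
    app = web.Application()
    app.router.add_get("/", handler)
    runner = web.AppRunner(app)
    await runner.setup()
    site = web.TCPSite(runner, "127.0.0.1", 9009)
    await site.start()
    try:
        async with aiohttp.ClientSession() as session:
            async with session.get("http://127.0.0.1:9009/") as resp:
                try:
                    await resp.json()  # strict: rejects application/x-ndjson
                    strict_failed = False
                except aiohttp.ContentTypeError:
                    strict_failed = True
                # The body is cached, so a lenient re-parse still works.
                data = await resp.json(content_type=None)
    finally:
        await runner.cleanup()
    return strict_failed, data

strict_failed, data = asyncio.run(main())
print(strict_failed, data)  # True {'ok': True}
```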

Screenshots/Screen Recordings (if applicable):
not applicable

Additional Information

Here is a minimal LitServe example without an LLM that sends back static text.
Running this script starts a LitServe server on port 9000.
If you run it on the same server as Open WebUI, add http://localhost:9000/v1 with any API key (not relevant) in the admin settings to connect to LitServe.
Also make sure the Open WebUI Docker container runs with --network=host.
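For reference, the host-networking requirement above might look like this (the image name and volume path are the usual defaults from the Open WebUI docs; adjust to your setup):

```shell
# Run Open WebUI with host networking so http://localhost:9000 reaches LitServe.
# With --network=host, port mappings (-p) are unnecessary on Linux.
docker run -d \
  --network=host \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
```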

LitServe-Code:


import litserve as ls
from fastapi import Depends
from datetime import datetime
from typing import List

from litserve.specs.openai import ChatCompletionRequest, ChatMessage, Tool

class LlamaIndexAPI(ls.LitAPI):
    def setup(self, device):
        #self.llm = CDMAgent()
        print("position 0")
    
    
    def decode_request(self, request: ChatCompletionRequest, context):
        print("************")
        print(request)
        print("~~~~~~~~~~~~")
        #print(f"context: {context}")
        #print("************")
        # set values for temperature, max_tokens, and top_p
        context["temperature"] = request.temperature
        context["max_tokens"] = request.max_tokens if request.max_tokens else 1024
        context["top_p"] = request.top_p
        messages = get_tools_prefix_messages(request.messages, request.tools)
        return messages
    
    
    def predict(self, messages, context):
        print("position 2")
        #print(messages)
        #print(context)
        for token in "This is a sample generated output".split():
            yield token


def models_ep():
    models = [
        {
            "id": "lit",
            "object": "model",
            "created": 1686935002,
            "owned_by": "XX"
        }
    ]
    return {
        "object": "list",
        "data": models
    }

#static help functions
def get_tools_prefix_messages(
    messages: List[ChatMessage], custom_tools: List[Tool] = None
) -> List[ChatMessage]:
    messages = messages.copy()
    content = ""
    current_date = datetime.now()
    formatted_date = current_date.strftime("%d %B %Y")
    date_str = f"""
Cutting Knowledge Date: December 2023
Today Date: {formatted_date}\n\n"""
    content += date_str

    if custom_tools:
        # get_system_prompt_for_custom_tools is a helper from the full setup (not shown here)
        tools_prompt = get_system_prompt_for_custom_tools(custom_tools)
        content += tools_prompt

    if messages[0].role != "system":
        content += "You are a helpful Assistant."
        messages.insert(0, ChatMessage(role="system", content=content))
    else:
        content += messages[0].content
        messages[0].content = content

    return messages

# main function to start server 
if __name__ == "__main__":
    api = LlamaIndexAPI()
    server = ls.LitServer(api, spec=ls.OpenAISpec(), stream=True) 
    server.app.add_api_route(
        "/v1/models",
        models_ep,
        methods=["GET"],
        tags=["models"],
        dependencies=[Depends(server.setup_auth())], 
    )
    server.run(port=9000)

Additionally, here is the minimal code to test LitServe with the OpenAI client library:

from openai import OpenAI

def test_streaming_litserve():
    # Initialize the OpenAI client
    client = OpenAI(
        base_url="http://localhost:9000/v1",  # Specify your LitServe endpoint
        api_key="lit",  # The API key is required, but it can be a placeholder in this context
    )

    # Create a chat completion request
    response = client.chat.completions.create(
        model="lit",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is a prime number and how is it defined? "},
        ],
        stream=True,
        temperature=0,
    )
    # create variables to collect the stream of chunks
    collected_chunks = []
    collected_messages = []
    print("#################")
    print(response)
    print("#################")
    # iterate through the stream of events
    for chunk in response:
        print("*****************")
        print(chunk)
        print("*****************")
Reference: github-starred/open-webui#2602