Mirror of https://github.com/open-webui/open-webui.git (synced 2026-03-12 01:54:38 -05:00)
ollama API streaming does not stream #778
Originally created by @ProjectMoon on GitHub (May 1, 2024).
Bug Report
Description
Bug Summary:
If you set the stream parameter to true on the /ollama/api/chat endpoint, the Open WebUI server proxies the request to Ollama, but instead of returning the response in the streaming fashion a client expects, it dumps the entire stream back as one big response (newlines included). This breaks clients that expect one small JSON chunk at a time.
Steps to Reproduce:
Expected Behavior:
A series of JSON values streamed back, one per line.
Actual Behavior:
One single HTTP response, containing all of the chunks (properly formatted on each line, though).
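For context, here is a minimal sketch of the client side this report assumes (names, URL, and payload are illustrative, not from the issue): each line of the /ollama/api/chat response body is one JSON chunk, handled as it arrives.

```python
# Minimal sketch of an aichat-style streaming client: one parsed JSON
# object per non-empty line of an NDJSON response body.
import json
from typing import Iterable, Iterator


def iter_chat_chunks(lines: Iterable[bytes]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of an NDJSON stream."""
    for line in lines:
        if line:
            yield json.loads(line)


# Illustrative usage with requests (URL and payload are placeholders):
#   resp = requests.post(
#       "http://localhost:3000/ollama/api/chat",
#       json={"model": "llama3", "messages": [...], "stream": True},
#       stream=True,
#   )
#   for chunk in iter_chat_chunks(resp.iter_lines()):
#       print(chunk.get("message", {}).get("content", ""), end="")
```

With the bug described here, `iter_lines()` still yields the chunks, but only after the entire response has been generated, so the client appears to hang and then print everything at once.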
Environment
Open WebUI Version: 0.1.122
Ollama (if applicable): 0.1.32
Operating System: Docker Container (on Gentoo Linux)
Reproduction Details
Confirmation:
Logs and Screenshots
Installation Method
Docker
Additional Information
The response itself is not incorrect, but because it's not properly streamed, this will break clients (like aichat) that assume they will get the chunks streamed one by one.
I looked at the code, and it seems like the ollama proxy backend is supposed to handle streaming:
This is from ollama main.py around line 875. But it doesn't seem to be respected? Maybe the form_data value isn't being set?
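The snippet referenced above isn't reproduced in this mirror. As a stdlib-only sketch (not the actual Open WebUI code) of the distinction being debugged, compare a proxy that accumulates the upstream body before replying with one that forwards each chunk as it arrives, which is what a stream_content()-style generator is meant to do:

```python
import asyncio
from typing import AsyncIterator


async def buffered_response(upstream: AsyncIterator[bytes]) -> bytes:
    """The behavior the bug report observes: every chunk in one body."""
    return b"".join([chunk async for chunk in upstream])


async def streamed_response(upstream: AsyncIterator[bytes]) -> AsyncIterator[bytes]:
    """The intended behavior: re-yield each chunk the moment it arrives."""
    async for chunk in upstream:
        yield chunk


async def _demo() -> None:
    async def fake_ollama() -> AsyncIterator[bytes]:
        # Stand-in for Ollama's NDJSON chat stream.
        for chunk in (b'{"message":{"content":"Hi"}}\n', b'{"done":true}\n'):
            yield chunk

    # Buffered: the client sees nothing until the whole body is ready.
    print(await buffered_response(fake_ollama()))
    # Streamed: the client can act on each line immediately.
    async for chunk in streamed_response(fake_ollama()):
        print(chunk)


asyncio.run(_demo())
```

If the stream branch is taken but the result still arrives in one piece, the buffering is happening somewhere between the generator and the client, which is what the rest of the thread narrows down.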
@ProjectMoon commented on GitHub (May 1, 2024):
Added some log statements in and around the stream_content() function. The stream value is set fine on form_data, but for whatever reason, stream_content() seems to not be called, or is not executed.
@ProjectMoon commented on GitHub (May 1, 2024):
Bit more debugging. It is actually hitting the stream clause, but for whatever reason this still results in one large response back to the client.
@cheahjs commented on GitHub (May 1, 2024):
Unable to reproduce, are you running Open WebUI behind a reverse proxy that is buffering responses?
@ProjectMoon commented on GitHub (May 1, 2024):
That is actually a very good point. It is running through a Cloudflare Tunnel.
@ProjectMoon commented on GitHub (May 1, 2024):
One possibility, though: is it possible to get cloudflared to not buffer? The only thing I can find is that it avoids buffering if the response header has a specific text/event-stream content type... otherwise it seems to buffer.
@anatoliykmetyuk commented on GitHub (May 7, 2024):
@ProjectMoon have you managed to find a workaround on how to enable streaming via Cloudflare?
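The thread doesn't record an answer, but following @ProjectMoon's observation that cloudflared only seems to avoid buffering for a text/event-stream content type, one conceivable workaround is to re-frame each NDJSON chunk as a Server-Sent Events message. This is a sketch of that idea, not something the thread confirms works:

```python
def ndjson_to_sse(line: bytes) -> bytes:
    """Wrap one NDJSON chunk as an SSE `data:` frame.

    A response built from frames like this can be served with
    `Content-Type: text/event-stream`, the one case the thread notes
    cloudflared appears not to buffer. The client must then strip the
    `data: ` prefix before parsing the JSON.
    """
    return b"data: " + line.rstrip(b"\n") + b"\n\n"
```

This changes the wire format, so it only helps clients that can consume SSE; it is not a drop-in fix for NDJSON consumers like aichat.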