mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-06 02:48:13 -05:00
[GH-ISSUE #13027] feat: Asynchronous process_chat_payload in chat completion
#16789
Originally created by @tth37 on GitHub (Apr 18, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/13027
Originally assigned to: @tjbck on GitHub.
Related: #13007
Problem Description
The `/api/chat/completions` endpoint supports two primary modes of operation:

- Synchronous (`stream=False`): Typically invoked via direct HTTP requests, this mode processes the entire request and returns the complete response in a single HTTP transaction.
- Asynchronous (`stream=True`): Primarily used by the frontend UI via WebSocket, this mode is expected to return immediately with a `task_id`. This `task_id` allows the frontend to receive status updates, stream the response incrementally via the WebSocket connection, and, crucially, enables early stopping of the generation process by the user.

While the asynchronous (`stream=True`) mode functions as expected for standard chat interactions (returning the `task_id` promptly), this behavior breaks when features requiring substantial pre-processing, such as Web Search or Tool Use, are enabled. Instead of returning immediately, the endpoint waits for the `process_chat_payload` phase (which includes potentially long-running operations like web searches or tool executions) to complete before returning the `task_id`.

This synchronous behavior during the payload processing phase leads to two significant issues (both reported in discussions):

- The frontend does not receive a `task_id` until after web search/tool execution finishes. This prevents users from stopping the request during this initial, potentially lengthy (30-60s+), phase.
- Because the HTTP request is held open for the entire payload-processing phase, it also risks hitting a connection timeout before the `task_id` is ever returned.

Cause Analysis
The chat completion process can be broadly divided into two phases:

- `process_chat_payload`: Handles request preprocessing, including web searches, tool calls, and injecting their results into the context for the language model.
- `process_chat_response`: Handles the actual generation of the AI response by the LLM and streams results back via WebSocket.

Currently, `process_chat_response` is correctly handled asynchronously using `create_task`, as seen in b8fb4e528d/backend/open_webui/utils/middleware.py (L1209-L1210).

However, `process_chat_payload` remains a synchronous step: the user has to wait until `process_chat_payload` finishes before receiving the background `task_id`. Things get worse when the web search feature is enabled, as it can take 30-60s; during this period the user cannot stop the request early and risks a connection timeout.

Desired Solution
For asynchronous API calls, refactor the `chat_completion` handler in `main.py` to make the entire processing pipeline (both payload processing and response generation) asynchronous from the start. This can be achieved by wrapping all time-consuming logic within a single background task created immediately upon receiving the request: test_async_chat_completion

Further Considerations
This simple patch is technically working; however, there is likely still a lot of work to be done.
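As an illustration only (not the project's actual code), the proposed restructuring — create one background task up front and hand back a `task_id` before any payload processing runs — can be sketched with plain `asyncio`. Here `TASKS`, `process_chat_payload`, and `process_chat_response` are simplified stand-ins for Open WebUI's real machinery:

```python
import asyncio
import uuid

# Hypothetical in-memory task registry standing in for the app's task state.
TASKS: dict[str, asyncio.Task] = {}

async def process_chat_payload(payload: dict) -> dict:
    # Stand-in for web search / tool calls (potentially 30-60s in practice).
    await asyncio.sleep(0.05)
    return {**payload, "context": "search results"}

async def process_chat_response(payload: dict) -> str:
    # Stand-in for LLM generation streamed back over WebSocket.
    await asyncio.sleep(0.05)
    return f"answer using {payload['context']}"

async def chat_completion(payload: dict) -> str:
    """Return a task_id immediately; run BOTH phases in the background."""
    async def pipeline() -> str:
        enriched = await process_chat_payload(payload)
        return await process_chat_response(enriched)

    task_id = uuid.uuid4().hex
    TASKS[task_id] = asyncio.create_task(pipeline())
    return task_id  # the caller gets this before any search/tool work runs

async def main():
    task_id = await chat_completion({"prompt": "hi"})
    print(task_id in TASKS)        # True: handler returned at once
    print(TASKS[task_id].done())   # False: pipeline still running
    print(await TASKS[task_id])    # eventually: the full answer

asyncio.run(main())
```

With this shape, the UI can use the `task_id` for status updates and early stopping (e.g. `TASKS[task_id].cancel()`) even while the payload phase is still searching the web.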
@gaby commented on GitHub (Apr 18, 2025):
@tth37 I believe this is fixed by this https://github.com/open-webui/open-webui/pull/12958
The call to do the web search was blocking. It will now be async.
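To illustrate the distinction debated below: offloading a blocking call to a thread pool keeps the event loop responsive, but the awaiting handler still waits out the full call before it can return. A sketch, using the stdlib `asyncio.to_thread` as an analogue of Starlette's `run_in_threadpool`; `search_web` here is a hypothetical stand-in:

```python
import asyncio
import time

def search_web(query: str) -> str:
    # Hypothetical stand-in for a slow, blocking web search.
    time.sleep(0.2)
    return f"results for {query!r}"

async def handler(query: str) -> str:
    # asyncio.to_thread plays the role of Starlette's run_in_threadpool:
    # the blocking call runs in a worker thread, so other coroutines can
    # make progress -- but THIS handler still awaits the full result.
    return await asyncio.to_thread(search_web, query)

async def main():
    start = time.monotonic()
    result = await handler("open-webui")
    elapsed = time.monotonic() - start
    print(result)
    print(elapsed >= 0.2)  # the caller waited out the whole search

asyncio.run(main())
```

So the thread pool fixes event-loop starvation, not the latency of the HTTP response itself, which is the crux of the disagreement in the comments that follow.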
@tth37 commented on GitHub (Apr 18, 2025):
@gaby Are you sure? 🤔 I've conducted experiments and #12958 does not seem to address this issue; `/api/chat/completion` was still blocked until the search process finished.

In my opinion, `await run_in_threadpool(search_web)` is still a synchronous operation from the caller's perspective. Putting it in a separate thread was intended to prevent `search_web` from blocking the FastAPI server's IO handling. (Besides, the query generation / tool execution are still definitely synchronous.)

@gaby commented on GitHub (Apr 18, 2025):
@tth37 Yes, but it won't block the asyncio loop, which is a big problem.
You mean api/chat/completion should return even though the search is not done? If that's the case, then yes that PR doesn't fix that issue.
I do agree that `process_chat_response` needs more async. For example, it can use `async for`, and the function inside should be async.

@tth37 commented on GitHub (Apr 18, 2025):
Yes, that is exactly what I mean. I think `/api/chat/completion` should return as soon as possible (as soon as the background task is created).
@gaby commented on GitHub (Apr 18, 2025):

That's going to break things; how is the caller supposed to know there's more data?
@tth37 commented on GitHub (Apr 18, 2025):
There is always a WebSocket connection alive in the background, which is responsible for updating the response status.

For now, `/api/chat/completion` does indeed return before the AI response is fully generated, but strictly after the search process is finished.

@gaby commented on GitHub (Apr 18, 2025):
That case only applies to the UI. If you use that route via the API it doesn't use WebSocket, it's just HTTP?
@tth37 commented on GitHub (Apr 18, 2025):
Yes, that's the case where the `stream` parameter is set to `False`; in that case the handler falls back to the original response: b8fb4e528d/backend/open_webui/utils/middleware.py (L2265-L2266)

When `stream=False`, `/api/chat/completions` is a synchronous API call, and it's working fine. This issue targets the asynchronous API call, especially from the UI. The synchronous handler can safely remain unchanged.

@tth37 commented on GitHub (Apr 18, 2025):
@gaby @rgaricano Thank you for your attention! I've updated the issue to better describe the problem.
@tth37 commented on GitHub (Apr 18, 2025):
You can easily enable a very basic version of this feature by applying a change like this: test_async_chat_completion (though it only supports asynchronous calls together with the WebSocket).
@tjbck commented on GitHub (Aug 18, 2025):
Addressed with `d6f709574e` in dev!