mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-06 19:08:59 -05:00
[PR #13061] [CLOSED] perf: Asynchronous process_chat_payload in chat completion
#9879
📋 Pull Request Information
Original PR: https://github.com/open-webui/open-webui/pull/13061
Author: @tth37
Created: 4/19/2025
Status: ❌ Closed
Base: dev ← Head: perf_async_payload_processing

📝 Commits (4)

- 89199f2 perf: Asynchronous process_chat_payload in chat completion
- d7f5f20 fix: Error handling on task-cancelled
- 0b5de38 fix: Error handling when chatting with multiple models
- 4c9ff0a fix: Enhanced persistent error messages

📊 Changes
3 files changed (+72 additions, -18 deletions)
- 📝 backend/open_webui/main.py (+55 -12)
- 📝 backend/open_webui/utils/middleware.py (+4 -6)
- 📝 src/lib/components/chat/Chat.svelte (+13 -0)

📄 Description
Problem Description
The complete problem description is detailed in #13027.
In short, when the `/api/chat/completions` endpoint is used asynchronously (typically from the browser Web UI), the server should return a `task_id` immediately so the UI can track progress and allow early cancellation. However, if features like Web Search or Tools are enabled, the server waits for this preprocessing (`process_chat_payload`) to finish before returning the `task_id`, leading to two main issues: the response is delayed for the whole duration of the preprocessing, and the request cannot be cancelled early because the UI has no `task_id` yet.

Solution
This PR changes the behavior for asynchronous requests:

- Detect asynchronous requests (i.e., where `event_emitter` and `event_caller` exist) at the very beginning of the chat completion handler.
- Run `process_chat_payload` and `process_chat_response` as a background task.
- Return the `task_id` to the client right away.

Error handling
I extended the previously unhandled `task-cancelled` event. This event can be triggered either by `asyncio.CancelledError`, or by errors raised while `process_chat_payload_and_response` is running as a background job.

Formerly, when errors were encountered during `process_chat_payload` or `chat_completion_handler`, the `/api/chat/completions` endpoint would return the error message directly. Now that all the work runs as a background task, we must use WebSocket events to tell the browser about error details.

When the browser receives the `task-cancelled` event, it marks `currentMessage.done = true` and re-fetches the current taskIds, so that the messages and the stop button behave properly. This preserves the exact same user experience as the previous error-handling logic, and adds robust error handling when chatting with multiple models.

Additional Messages
I've tried my best to keep the original structure of codebase, make minimal changes, and ensure backward-compatibility.
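As a rough illustration of the dispatch change described under Solution, the sketch below spawns the payload/response pipeline as a background `asyncio` task and returns the `task_id` immediately. The function bodies and the `TASKS` registry are simplified stand-ins, not Open WebUI's actual code.

```python
import asyncio
import uuid

# In-memory registry of running chat tasks (illustrative; the real code
# tracks tasks so the UI can look them up and stop them).
TASKS: dict[str, asyncio.Task] = {}

async def process_chat_payload(payload: dict) -> dict:
    # Stand-in for the slow preprocessing step (web search, tools, ...).
    await asyncio.sleep(0.05)
    return {**payload, "preprocessed": True}

async def process_chat_response(payload: dict, emit) -> None:
    # Stand-in for response generation; reports the result as an event.
    await emit({"type": "chat-completion", "data": payload})

async def process_chat_payload_and_response(payload: dict, emit) -> None:
    processed = await process_chat_payload(payload)
    await process_chat_response(processed, emit)

async def chat_completions(payload: dict, emit) -> dict:
    # Key change: spawn the heavy work as a background task and hand the
    # task_id back right away, instead of awaiting preprocessing first.
    task_id = str(uuid.uuid4())
    TASKS[task_id] = asyncio.create_task(
        process_chat_payload_and_response(payload, emit)
    )
    return {"task_id": task_id}
```

With this shape, the browser receives the `task_id` before the (possibly slow) preprocessing finishes, and can cancel `TASKS[task_id]` at any point.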
Tests
For sync requests, I used curl to test the chat completion endpoints, and they behave the same as in the original version.

For async requests, I tested the endpoint via the browser Web UI. The endpoint returns the `task_id` very quickly, before the web search process has finished. Upon receiving the `task_id`, I am able to cancel the generation task at any time.

Error handling was verified for:

- Errors during the `process_chat_payload` phase
- Errors during the `process_chat_response` phase
- The `task-cancelled` error when the user stops a request early

More test cases / discussions are welcomed!
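The error paths exercised above can be sketched with a small wrapper that converts background-job failures and cancellations into events for the browser. The wrapper name and event payload shapes here are illustrative assumptions, not the PR's exact code.

```python
import asyncio

async def run_with_error_events(job, emit) -> None:
    """Await a background chat job, reporting failures as events.

    Since the HTTP response already went out with the task_id, errors can
    no longer travel in the response body; they are emitted over the
    event channel instead (hypothetical event shapes).
    """
    try:
        await job
    except asyncio.CancelledError:
        # User pressed stop: let the UI mark the current message as done
        # and re-fetch its task list.
        await emit({"type": "task-cancelled", "data": {}})
    except Exception as exc:
        # A failure inside the background job (e.g. during preprocessing)
        # is also surfaced as a task-cancelled event, with details.
        await emit({"type": "task-cancelled", "data": {"error": str(exc)}})
```

Note that `asyncio.CancelledError` is caught separately because, on modern Python, it derives from `BaseException` and would slip past a plain `except Exception`.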
🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.