[PR #24302] [CLOSED] fix: prevent STT from blocking the uvicorn event loop #66447

opened 2026-05-06 12:48:53 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/24302
Author: @gaurav0107
Created: 5/1/2026
Status: Closed

Base: dev ← Head: fix/24169-stt-blocks-event-loop


📝 Commits (1)

  • 49fc6fd fix: prevent STT from blocking the uvicorn event loop

📊 Changes

2 files changed (+4 additions, -3 deletions)


📝 backend/open_webui/routers/audio.py (+3 -2)
📝 backend/open_webui/routers/files.py (+1 -1)

📄 Description

Fixes #24169.

When a user uploads audio to POST /api/v1/audio/transcriptions, the whole Open WebUI server becomes unresponsive until transcription finishes. Other users see ECONNREFUSED errors and dropped Socket.IO WebSocket connections.

Root cause (already diagnosed in detail by @Classic298 on #24169):

  • backend/open_webui/routers/audio.py::transcription is async def but
    • reads the upload synchronously via file.file.read(), and
    • calls the sync helper transcribe(...) directly.
  • transcribe(...) uses its own ThreadPoolExecutor and blocks the calling thread on future.result() until every chunk is transcribed by faster-whisper.

With the default UVICORN_WORKERS=1, one event loop serves both HTTP and Socket.IO, so a multi-minute CPU transcription stalls every other user.
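The failure mode can be sketched with a minimal, self-contained example. This is illustrative only, not the actual router code: `transcribe` here is a stand-in that sleeps instead of calling faster-whisper, and the handler skips FastAPI entirely.

```python
# Minimal sketch of the anti-pattern: an `async def` handler that does
# blocking work directly on the event loop thread. Illustrative names;
# the real code lives in backend/open_webui/routers/audio.py.
import asyncio
import time


def transcribe(contents: bytes) -> dict:
    # Stand-in for the sync faster-whisper helper: it blocks the
    # calling thread until "inference" finishes.
    time.sleep(0.1)
    return {"text": "transcript"}


async def transcription(upload: bytes) -> dict:
    contents = upload            # real code: file.file.read(), a sync read
    return transcribe(contents)  # sync call — the event loop is stuck here


result = asyncio.run(transcription(b"fake-audio-bytes"))
print(result)  # {'text': 'transcript'}
```

While `transcribe(contents)` runs, no other coroutine on that loop can be scheduled, which is exactly the stall described above.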

The exact same anti-pattern exists in backend/open_webui/routers/files.py::process_uploaded_file._process_handler, which also calls transcribe(...) synchronously from an async function; this PR fixes that call site as well.

What this PR does

backend/open_webui/routers/audio.py:

  • Add import asyncio.
  • Replace contents = file.file.read() with contents = await file.read() (FastAPI's async UploadFile.read()).
  • Wrap transcribe(...) in await asyncio.to_thread(...).
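The shape of the change can be sketched as follows. This is a hedged simplification: `transcribe` is again a sleeping stand-in, and the real handler reads from a FastAPI `UploadFile` rather than taking bytes directly.

```python
# Simplified before/after of the handler change (stand-in helper, no
# FastAPI dependency):
import asyncio
import time


def transcribe(contents: bytes) -> dict:
    time.sleep(0.1)  # CPU-bound inference stand-in
    return {"text": "transcript"}


async def transcription(upload: bytes) -> dict:
    # Before: contents = file.file.read()   — sync read on the loop
    # After:  contents = await file.read()  — FastAPI's async UploadFile.read()
    contents = upload
    # Before: result = transcribe(contents) — blocks the loop
    # After: run the unchanged sync helper on a worker thread:
    return await asyncio.to_thread(transcribe, contents)


result = asyncio.run(transcription(b"fake-audio-bytes"))
print(result)  # {'text': 'transcript'}
```

The return value is identical either way; only the thread the blocking work runs on changes.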

backend/open_webui/routers/files.py:

  • Wrap the transcribe(...) call in await asyncio.to_thread(...). asyncio is already imported, and the preceding line already uses the same asyncio.to_thread pattern for Storage.get_file.

transcribe() itself is untouched — it is a sync function that correctly uses a ThreadPoolExecutor for chunk parallelism. The fix is just to stop running it on the event loop.
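Why leaving `transcribe()` sync is correct can be sketched as below. This is a hypothetical simplification of the chunking logic (the real function's chunking and inference differ): the key point is that `future.result()` blocks whichever thread calls it, which is harmless on a worker thread but disastrous on the event loop thread.

```python
# Hypothetical simplification of a sync transcribe() that parallelises
# chunks on its own ThreadPoolExecutor and blocks joining the futures.
import asyncio
from concurrent.futures import ThreadPoolExecutor


def transcribe_chunk(chunk: bytes) -> str:
    return chunk.decode()  # per-chunk inference stand-in


def transcribe(chunks: list[bytes]) -> str:
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(transcribe_chunk, c) for c in chunks]
        # f.result() blocks *this* thread until each chunk is done —
        # fine on a worker thread, a stall on the event loop thread.
        return " ".join(f.result() for f in futures)


# The fix keeps transcribe() unchanged and just moves the blocking
# join off the loop:
text = asyncio.run(asyncio.to_thread(transcribe, [b"hello", b"world"]))
print(text)  # hello world
```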

Net diff: +4 / -3 across 2 files, no new tests, no new dependencies, no env-var changes, no public-API changes.

Before / after (manual verification reasoning)

Before: a single STT request holds the sole uvicorn event loop until faster-whisper finishes; a concurrent GET / or Socket.IO frame is queued behind the transcription and times out. Reporter's log shows ~18s of inference for 5s of audio; a 30s clip produces ~2 minutes of server freeze.

After: await file.read() releases the loop during upload I/O; await asyncio.to_thread(transcribe, ...) schedules the CPU-bound work on the default thread pool so the event loop stays free for chat streaming, WebSocket heartbeats, and other users. ENABLE_WEBSOCKET_SUPPORT=false is no longer needed as a workaround.
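The before/after difference is easy to measure with stand-in sleeps (no real STT involved). The sketch below times how long a concurrent coroutine — playing the role of another user's request or a WebSocket heartbeat — waits for its first turn on the loop; all names and durations are illustrative.

```python
# Rough before/after measurement: latency of a concurrent coroutine's
# first turn on the loop while a "transcription" is in flight.
import asyncio
import time


def fake_transcribe() -> None:
    time.sleep(0.3)  # CPU-bound inference stand-in


async def first_response_latency(blocking: bool) -> float:
    start = time.monotonic()
    seen: list[float] = []

    async def other_user() -> None:
        await asyncio.sleep(0.02)
        seen.append(time.monotonic() - start)

    task = asyncio.create_task(other_user())
    if blocking:
        fake_transcribe()  # before: handler blocks the loop directly
    else:
        await asyncio.to_thread(fake_transcribe)  # after: loop stays free
    await task
    return seen[0]


before = asyncio.run(first_response_latency(blocking=True))
after = asyncio.run(first_response_latency(blocking=False))
print(f"before={before:.2f}s  after={after:.2f}s")
```

In the blocking case the other coroutine's first response arrives only after the full "inference" sleep; with `asyncio.to_thread` it arrives on schedule.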

Testing

What I have verified:

  1. ruff format --check passes on both modified files (the repo's only blocking backend CI check).
  2. Python AST parse succeeds on both files.
  3. Traced every caller of transcribe() in the backend: exactly two call sites exist (audio.py::transcription and files.py::process_uploaded_file._process_handler), both fixed here.
  4. Behavior-preserving swap confirmed by reading the source of each replaced call: UploadFile.read() returns the same bytes as the raw SpooledTemporaryFile.read() underneath file.file, and await asyncio.to_thread(fn, *args) invokes fn(*args) on the default thread pool and returns its return value unchanged.
  5. The pattern await asyncio.to_thread(...) is already used on the preceding line of files.py::_process_handler (for Storage.get_file), so this fix follows an idiom already in the file.
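Point 4 above is easy to spot-check in isolation: `asyncio.to_thread(fn, *args, **kwargs)` invokes `fn` on a worker thread and hands back its return value unchanged (names below are purely illustrative).

```python
# Spot-check: asyncio.to_thread returns the wrapped function's result
# unchanged, positional and keyword arguments included.
import asyncio


def fn(a: int, b: int, *, scale: int = 1) -> int:
    return (a + b) * scale


sync_result = fn(2, 3, scale=10)
async_result = asyncio.run(asyncio.to_thread(fn, 2, 3, scale=10))
assert sync_result == async_result == 50
print(async_result)  # 50
```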

What I have NOT personally verified (please gate on this):

  • I have not set up a faster-whisper model and run a live STT request with concurrent traffic. @Classic298 correctly pointed out that the comment in #24169 was a "could be a fix", not a tested patch, and that it needs verification. I am keeping this PR in DRAFT until someone with an STT setup (reporter @Mastersomy or @Classic298) can confirm the concurrent-request behavior under a real transcription. Happy to iterate on findings.

Changelog

  • Fixed: Speech-to-text (STT) transcription no longer blocks the server event loop. Other users can continue using chat and Socket.IO while a transcription is in flight. (#24169)

Checklist

  • Target branch: dev.
  • Description and changelog included above.
  • Dependencies: none added. asyncio.to_thread has been available since Python 3.9; the project requires >=3.11.
  • Testing: static/CI checks pass locally. I have NOT run a real faster-whisper transcription; keeping as DRAFT pending maintainer or reporter verification of the concurrent-request behavior.
  • Code review: self-reviewed; follows the asyncio.to_thread pattern already used on the preceding line of files.py.
  • Design & Architecture: minimal, local fix; no new settings, no refactor of transcribe().
  • Git Hygiene: one atomic commit on top of dev.
  • Title prefix: fix:.

Contributor License Agreement

  • By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.

Note

Deleting the CLA section will lead to immediate closure of your PR and it will not be merged in.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-05-06 12:48:53 -05:00

Reference: github-starred/open-webui#66447