mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-07 03:18:23 -05:00
[GH-ISSUE #24371] issue: Worker dies (SIGKILL) on Knowledge Base upload of large MP3 files — pydub/ffmpeg pre-conversion ignores configured remote STT engine #58949
Originally created by @CallSohail on GitHub (May 5, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/24371
Check Existing Issues
Installation Method
Git Clone
Open WebUI Version
v0.9.2-cuda
Ollama Version (if applicable)
N/A — issue is independent of Ollama. Ollama is used only for embeddings (snowflake-arctic-embed2:latest), not for transcription.
Operating System
Debian 12 (host) — Open WebUI runs in Docker (ghcr.io/open-webui/open-webui:v0.9.2-cuda) on Linux 6.1.0-41-amd64
Browser (if applicable)
No response
Confirmation
I have read and followed all the instructions in the README.md.
Expected Behavior
When the Speech-to-Text engine is configured to a remote OpenAI-compatible endpoint (in my case a self-hosted WhisperX proxy at http://whisperx-proxy:8767/v1 with the large-v3 model), uploading an audio file to a Knowledge Base should send that file to the configured endpoint for transcription. The size of the audio file should not matter as long as the configured remote endpoint can handle it. My WhisperX endpoint handles a 400 MB MP3 with no issue when called directly via curl.
Actual Behavior
For small audio files (≤ ~10 MB), the flow works as expected and the request reaches http://whisperx-proxy:8767/v1/audio/transcriptions successfully (HTTP 200, transcript returned and indexed).
For large audio files (tested with a 346 MB MP3, ~3 hours of audio), the Uvicorn worker is killed mid-request. The request never reaches the configured remote STT endpoint.
Root cause from the logs:
In open_webui/routers/audio.py, the transcribe() function unconditionally calls convert_audio_to_mp3(), which in turn invokes pydub.AudioSegment.from_file(...). pydub spawns ffmpeg to fully decode the MP3 into raw pcm_s16le and pipes the entire decoded PCM back into Python memory as an AudioSegment object.

For a 3-hour MP3 this is approximately 3.5 GB of decoded PCM held in Python memory inside a single synchronous request handler. The Uvicorn worker is killed during this decode (Child process [PID] died) and the container restarts.

The same in-process pydub pre-decode runs even though the configured STT engine is a remote OpenAI-compatible endpoint that has no 25 MB limit and could ingest the original MP3 directly. The pydub conversion and the chunking logic that follows it appear to be hardcoded for OpenAI's hosted Whisper API limit (25 MB) and are incorrectly applied to all openai-flavoured engines, including local/remote OpenAI-compatible servers.

Net result: any audio file large enough that the decoded PCM exceeds available worker memory crashes the worker — even when the configured remote engine could have handled it natively.
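The memory footprint is easy to sanity-check. A rough sketch of the decoded-PCM size for a 3-hour file, assuming CD-quality parameters (44.1 kHz stereo; pcm_s16le implies 2 bytes per sample):

```python
# Rough size of the raw PCM that pydub's AudioSegment must hold in memory
# once ffmpeg has fully decoded a 3-hour MP3 to pcm_s16le.
duration_s = 3 * 60 * 60   # ~3 hours of audio
sample_rate = 44_100       # assumed sample rate
channels = 2               # assumed stereo
bytes_per_sample = 2       # pcm_s16le = 16-bit signed little-endian

pcm_bytes = duration_s * sample_rate * channels * bytes_per_sample
print(f"{pcm_bytes / 1e9:.2f} GB")  # ≈ 1.91 GB of raw samples alone
```

Raw samples alone come to roughly 1.9 GB at these assumed parameters; higher sample rates and the intermediate copies pydub makes while re-exporting the AudioSegment push the peak footprint into the multi-GB range described above.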
Steps to Reproduce
Environment
client_max_body_size 1024M (reverse proxy)
whisperx-proxy:8767 (remote STT server)
1. Run a remote OpenAI-compatible STT server
Any OpenAI-compatible Whisper server works. I use WhisperX behind a small proxy that exposes /v1/audio/transcriptions. Verify it works:
2. Deploy Open WebUI v0.9.2-cuda via docker-compose
docker-compose.yml (relevant parts):
3. Configure Speech-to-Text
In Open WebUI (https://<your-host>/admin/settings):
Speech-to-Text Engine: OpenAI
API Base URL: http://whisperx-proxy:8767/v1
STT Model: large-v3
4. Verify small audio works (control test)
Upload a small MP3 to a Knowledge Base; the logs show the POST to whisperx-proxy:8767/v1/audio/transcriptions returning HTTP 200.
✅ Expected: works.
✅ Actual: works.
5. Reproduce the bug with a large audio file
Upload a large MP3 (346 MB in my test). The log shows INFO: Child process [PID] died and the container restarts.
❌ Expected: file is transcribed via the configured WhisperX endpoint.
❌ Actual: Uvicorn worker dies mid-decode; the configured STT endpoint is never called.
6. Verify the remote endpoint is not at fault
From the same Docker network, send the exact same MP3 directly to the WhisperX endpoint, bypassing Open WebUI:
✅ This returns a complete transcript with HTTP 200 in normal time. The remote endpoint is healthy and capable. The bug is purely in Open WebUI's pre-conversion path.
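For reference, the direct call in step 6 boils down to a multipart/form-data POST of the original MP3 bytes to /v1/audio/transcriptions. A minimal standard-library sketch of how that request is assembled (the file and model field names follow the OpenAI audio API convention; the helper name is mine, and the proxy URL is an assumption — adjust for your setup):

```python
import io
import urllib.request
import uuid

def build_transcription_request(url: str, audio: bytes, filename: str, model: str):
    """Build (but do not send) a multipart/form-data POST equivalent to the curl call."""
    boundary = uuid.uuid4().hex
    body = io.BytesIO()
    # "file" part: the original, un-recoded MP3 bytes
    body.write(f'--{boundary}\r\nContent-Disposition: form-data; name="file"; filename="{filename}"\r\nContent-Type: audio/mpeg\r\n\r\n'.encode())
    body.write(audio)
    # "model" part: which Whisper model the server should run
    body.write(f'\r\n--{boundary}\r\nContent-Disposition: form-data; name="model"\r\n\r\n'.encode())
    body.write(model.encode())
    body.write(f"\r\n--{boundary}--\r\n".encode())
    return urllib.request.Request(
        url,
        data=body.getvalue(),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Sending it is then just urllib.request.urlopen(build_transcription_request("http://whisperx-proxy:8767/v1/audio/transcriptions", mp3_bytes, "episode.mp3", "large-v3")). Note that the file bytes are passed through untouched — no decode, no re-encode — which is exactly what Open WebUI's pre-conversion path fails to do.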
Logs & Screenshots
A) Successful small file (2 MB MP3) — full happy path
B) Failed large file (346 MB MP3) — worker killed mid-decode
The second ffmpeg invocation (re-encode WAV → MP3), the convert_audio_to_mp3 completion log, and the urllib3 POST to whisperx-proxy:8767 never appear — the worker dies during the first decode pass.
C) Direct call to WhisperX with the same file — succeeds
D) System state at time of failure
System has 53 GB available, the container is not OOMKilled at the cgroup level, and the host has no memory pressure. The kill is happening at the Uvicorn worker level during the pydub decode of a multi-GB in-Python AudioSegment.
E) Confirmed STT routing for small file
The successful 2 MB run shows the request does reach the configured WhisperX endpoint:
So the engine config is correct. The bug is the unconditional pydub pre-decode that runs before dispatch.
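A plausible fix direction, sketched below (this is not the project's actual code — the function and constant names are hypothetical): gate the convert/chunk path so it only runs when the target is OpenAI's hosted API with its documented 25 MB cap, and pass the original file straight through to self-hosted OpenAI-compatible endpoints.

```python
OPENAI_HOSTED_MAX_BYTES = 25 * 1024 * 1024  # OpenAI's documented hosted-Whisper upload limit

def needs_preconversion(engine: str, base_url: str, file_size: int) -> bool:
    """Hypothetical gate for the convert/chunk path in routers/audio.py.

    Only OpenAI's hosted API enforces the 25 MB limit; a self-hosted
    OpenAI-compatible server (e.g. a WhisperX proxy) should receive the
    original file as-is, with no pydub decode at all.
    """
    if engine != "openai":
        return False
    hosted = base_url.rstrip("/").startswith("https://api.openai.com")
    return hosted and file_size > OPENAI_HOSTED_MAX_BYTES
```

With a gate like this, needs_preconversion("openai", "http://whisperx-proxy:8767/v1", 346 * 1024 * 1024) is False and the 346 MB MP3 goes straight to the remote endpoint. Even on the hosted path, streaming the conversion through an ffmpeg subprocess to disk would avoid materializing the decoded PCM as an in-memory AudioSegment.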
Additional Information
No response
@owui-terminator[bot] commented on GitHub (May 5, 2026):
🔍 Similar Issues Found
I found some existing issues that might be related. Please check if any of these are duplicates or contain helpful solutions:
#23014 issue: file upload as knowledge base to agent fails to respond and results in object not iterable
by sanchitbhavsar · bug
#15535 issue: Plain text file upload to knowledge fails with 400: 'NoneType' object has no attribute 'encode'
by GanizaniSitara · bug
#15702 issue: Failed uploading large markdown files to Knowledge
by raymondhs · bug
#15828 issue: Unable to upload document in chat / 0.6.16
by GlisseManTV · bug
#14336 issue: Memory Leak when uploading files to Knowledge
by FringeNet · bug
💡 If this is a duplicate, consider closing it and adding details to the existing issue.
This comment was generated automatically. React with 👍 if helpful, 👎 if not.