[PR #23661] [CLOSED] feat(audio): add AUDIO_STT_SKIP_PREPROCESSING to skip pydub preprocessing #42937

GiteaMirror · 2026-04-25T14:41:53-05:00

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/23661
Author: @runixer
Created: 4/13/2026
Status: ❌ Closed

Base: dev ← Head: feat/skip-audio-preprocessing

📝 Commits (1)

b802da4 feat: add AUDIO_STT_SKIP_PREPROCESSING to bypass audio conversion/compression/splitting

📊 Changes

3 files changed (+41 additions, -21 deletions)

📝 backend/open_webui/config.py (+6 -0)
📝 backend/open_webui/main.py (+2 -0)
📝 backend/open_webui/routers/audio.py (+33 -21)

📄 Description

Description

When uploading large audio files, pydub loads the entire file into RAM (3-5× expansion for AAC → PCM decoding), causing OOM in containers with normal memory limits. See #21515.
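As a rough illustration of why decoding is the expensive step: the size of uncompressed PCM depends only on duration, sample rate, channel count, and sample width, not on how small the compressed upload is. The helper below is a back-of-the-envelope sketch. The function name and the one-hour, 44.1 kHz, stereo, 16-bit figures are assumptions chosen for illustration, not measurements from this PR.

```python
def estimate_pcm_bytes(
    duration_s: float,
    sample_rate: int = 44_100,
    channels: int = 2,
    sample_width: int = 2,
) -> int:
    """Return the size in bytes of raw PCM audio with the given parameters.

    sample_width is in bytes per sample (2 bytes = 16-bit audio).
    """
    return int(duration_s * sample_rate * channels * sample_width)


if __name__ == "__main__":
    # Hypothetical one-hour recording; the real files' durations are not stated.
    one_hour = estimate_pcm_bytes(3600)
    print(f"{one_hour} bytes (~{one_hour / 2**20:.0f} MiB)")
```

Any intermediate copy made during conversion, compression, or splitting adds to this figure, so peak memory can be a multiple of the decoded size.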

Self-hosted STT backends (vLLM Whisper with [audio] extras, faster-whisper servers, etc.) handle all formats natively via ffmpeg/PyAV and have no file size limit — preprocessing is unnecessary overhead.

This PR adds an env var AUDIO_STT_SKIP_PREPROCESSING (default: false) to skip convert_audio_to_mp3 / compress_audio / split_audio and send the file as-is to the STT backend. Fully backward-compatible. pydub imports are now lazy (inside the functions that use them).
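The gating described above could look roughly like the sketch below. It is not the PR's diff: `env_flag`, `preprocess`, and `prepare_for_stt` are hypothetical names, the string comparison against `"true"` is an assumed parsing rule, and `preprocess` stands in for the real `convert_audio_to_mp3` / `compress_audio` / `split_audio` steps. Only `AUDIO_STT_SKIP_PREPROCESSING` and the lazy `pydub` import come from the description.

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Parse a boolean env var; only the string 'true' (any case) enables it."""
    return os.environ.get(name, str(default)).strip().lower() == "true"


def preprocess(file_path: str) -> str:
    """Hypothetical stand-in for the conversion/compression/splitting steps."""
    # Lazy import: pydub is loaded only when preprocessing actually runs,
    # so the skip path never touches it.
    from pydub import AudioSegment

    audio = AudioSegment.from_file(file_path)  # decodes the whole file into RAM
    out_path = os.path.splitext(file_path)[0] + ".mp3"
    audio.export(out_path, format="mp3")
    return out_path


def prepare_for_stt(file_path: str) -> str:
    """Return the path of the file that should be sent to the STT backend."""
    if env_flag("AUDIO_STT_SKIP_PREPROCESSING"):
        return file_path  # send the upload as-is
    return preprocess(file_path)
```

With `AUDIO_STT_SKIP_PREPROCESSING=true` in the environment, `prepare_for_stt("talk.m4a")` returns the original path without importing `pydub`. With the variable unset, the default `false` applies and the file goes through the existing path.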

Added

AUDIO_STT_SKIP_PREPROCESSING env var / admin config option

Fixed

OOM when uploading large audio files (#21515)

Breaking Changes

None. Default false preserves current behavior.

Testing

Kubernetes deployment with vLLM 0.19.0 Whisper:

| File | Before | After |
|------|--------|-------|
| 73 MB .m4a | OOMKill | OK, ~60s |
| 81 MB .m4a | OOMKill | OK, ~65s |
| Both simultaneously | OOMKill in 10s | OK, no issues |

Pod memory during processing: ~640 MiB (vs 4-6 GB spike → OOMKill before).

Running in production for ~5 days. No regressions, no OOMs, users uploading large audio files daily.

Contributor License Agreement

By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.


GiteaMirror added the pull-request label 2026-04-25 14:41:53 -05:00

GiteaMirror closed this issue

2026-04-25 14:41:55 -05:00

Reference: github-starred/open-webui#42937