mirror of
https://github.com/open-webui/open-webui.git
synced 2026-03-22 14:13:08 -05:00
[PR #6894] [CLOSED] Removing Silence from Audio Files for Local OpenAI/Whisper Models to Prevent Hallucinations #8770
📋 Pull Request Information
Original PR: https://github.com/open-webui/open-webui/pull/6894
Author: @AliveDedSec
Created: 11/13/2024
Status: ❌ Closed
Base: main ← Head: main

📝 Commits (4)
- fd1df97 Add files via upload
- 6d077ec Delete backend/open_webui/apps/audio/main.py
- 7016bf4 Rename main1.py to main.py
- 100d11f Update main.py

📊 Changes
1 file changed (+47 additions, -12 deletions)
View changed files
📝 backend/open_webui/apps/audio/main.py (+47 -12)

📄 Description
Proposal for Enhancement in Open WebUI: Silence Removal in Audio Files Before Processing with Whisper Model to Improve Speech Recognition Quality
Dear Developers,
In the current version of Open WebUI (v0.3.35), using local Whisper models for continuous real-time communication can cause problems. When Call mode (the headphones icon to the right of the microphone) is active and the user steps away from the computer, the model often transcribes extended periods of silence. When interaction resumes, Whisper then produces random, nonsensical output instead of a meaningful response.
To address this issue, I developed code that removes silence from audio files before they are passed to the model. This avoids "hallucinations" and greatly improves speech recognition quality, allowing smooth, meaningful interaction with the Open WebUI voice assistant without unwanted noise or random text. It finally let me use voice assistant mode without Whisper hallucinations!
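The PR's actual diff is not reproduced here, but the idea it describes can be sketched as a simple frame-energy gate: split the audio into short frames, measure each frame's RMS energy, and drop frames below a silence threshold before handing the result to Whisper. This is a minimal stdlib-only illustration (the function names, frame size, and threshold are my own assumptions, not the PR's code):

```python
import math

def frame_rms(frame):
    """Root-mean-square energy of one frame of mono float samples in [-1, 1]."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def strip_silence(samples, frame_size=400, threshold=0.01):
    """Drop frames whose RMS energy falls below `threshold`.

    `samples` is a flat list of mono float samples. A `frame_size` of 400
    corresponds to 25 ms at 16 kHz, the sample rate Whisper expects.
    Both defaults are illustrative; a real implementation would tune them.
    """
    kept = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        if frame and frame_rms(frame) >= threshold:
            kept.extend(frame)
    return kept
```

In practice a library such as pydub (`split_on_silence`) or an energy-based VAD would be used on the decoded waveform before transcription; the gate above only shows the core filtering step.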
Solution Overview:
My code was developed specifically for Open WebUI v0.3.35, and I cannot directly submit it to the main development branch, as there are significant differences in code structure that would require adaptation. However, implementing a similar solution in the latest version of WebUI would be extremely beneficial.
Optimization Recommendation: Silence removal could be accelerated with GPU processing, which would offer a significant boost for real-time use. Computing frame energies in parallel on the GPU would speed up audio filtering and keep the interaction responsive.
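As a hedged illustration of that recommendation: once the energy gate is expressed as whole-array operations, the same code runs unchanged on a GPU array library (e.g. CuPy or PyTorch tensors) and the per-frame energies are computed in parallel on the device. A NumPy sketch of the vectorized form (names and defaults are assumptions, not the PR's code):

```python
import numpy as np

def strip_silence_vectorized(samples, frame_size=400, threshold=0.01):
    """Vectorized frame-energy gate.

    Reshapes the signal into (n_frames, frame_size), computes per-frame RMS
    in one batched operation, and keeps only frames above the threshold.
    Swapping `np` for a GPU array module would run the same math on device.
    """
    n = len(samples) - len(samples) % frame_size  # drop the ragged tail
    frames = np.asarray(samples[:n], dtype=np.float32).reshape(-1, frame_size)
    energy = np.sqrt((frames ** 2).mean(axis=1))  # RMS of every frame at once
    return frames[energy >= threshold].ravel()
```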
Thank you for your hard work!
🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.