mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-07 03:18:23 -05:00
feat: audio transcription playground #500
Originally created by @g4challenge on GitHub (Mar 19, 2024).
Originally assigned to: @tjbck on GitHub.
Is your feature request related to a problem? Please describe.
I find it challenging when I need to manually transcribe audio content. Whether it’s interviews, meetings, or recorded conversations, having an automated audio transcription feature would significantly improve my workflow.
Describe the solution you’d like
I would like OpenWebUI to include an audio transcription feature. Ideally, it should accept audio files (such as MP3, WAV, or other common formats) and convert them into accurate text transcripts. The transcripts should be time-stamped and easily accessible within the interface.
Describe alternatives you’ve considered
As an alternative, I've explored third-party Whisper-based transcription services with a UI (e.g. https://github.com/chidiwilliams/buzz or https://github.com/jhj0517/Whisper-WebUI), but they often come with limitations around installation, sharing, and privacy, plus additional cost and effort. Having an integrated solution within OpenWebUI would streamline the process and enhance the overall user experience.
Additional context
Sometimes, I participate in remote interviews or attend virtual meetings where audio recordings are essential. Having a built-in transcription feature would save time and effort, allowing me to focus on the content rather than manual transcription tasks. When finished, I would love to be able to feed the result to an LLM with predefined prompts, e.g. "use the following transcript to create a short, precise summary in bullet points".
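The follow-up step described above (wrapping a finished transcript in a predefined prompt before sending it to an LLM) could be sketched like this; the template text mirrors the example in the request, and the function name is illustrative:

```python
# Illustrative sketch: combine a finished transcript with a predefined
# prompt template before sending it to an LLM. The template wording
# follows the example given in the feature request.

SUMMARY_PROMPT = (
    "Use the following transcript to create a short, precise "
    "summary in bullet points:\n\n{transcript}"
)

def build_summary_prompt(transcript: str) -> str:
    """Fill the predefined prompt template with the transcript text."""
    return SUMMARY_PROMPT.format(transcript=transcript)
```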
@arjunkrishna commented on GitHub (Apr 26, 2024):
Yes, having audio and video transcription would be a very useful feature.
@arjunkrishna commented on GitHub (Apr 26, 2024):
https://github.com/the-crypt-keeper/tldw
@rexkani commented on GitHub (Oct 21, 2024):
This is one of the main features I was looking for when I installed Open WebUI.
@flefevre commented on GitHub (Oct 23, 2024):
In scientific research, it would be a very good feature to be able to record a meeting, then summarize it and keep it in the workspace. Perhaps it should be compatible with Milvus to store the audio and the notes?
I have used https://github.com/JigsawStack/insanely-fast-whisper-api and
https://github.com/Vaibhavs10/insanely-fast-whisper
@Trapper4888 commented on GitHub (Oct 29, 2024):
To add my 2 cents:
Since Open WebUI already has an integrated Whisper running (with an API option), it really feels like a wasted opportunity not to be able to use it directly. The same goes for TTS. I imagine a lot of the code is already there, since both are used behind the scenes.
But I have to acknowledge that Open WebUI is supposed to be a text-to-text UI, and starting to do STT and TTS may be out of scope and increase complexity. In a perfect world, I would host my own OpenAI-API-compatible Whisper Docker container, connect it to the Open WebUI container, and for direct Whisper usage run another container with a proper OpenAI-API-compatible TTS web UI.
Still, it would be very cool to have basic STT and TTS using the microphone and files in Open WebUI.
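The "self-hosted OpenAI-compatible Whisper" setup described in this comment boils down to POSTing an audio file to an `/audio/transcriptions` endpoint. A minimal stdlib-only sketch, assuming a local endpoint at port 9000 and the model name `whisper-1` (both are assumptions; adjust to your deployment):

```python
# Sketch of calling an OpenAI-compatible /audio/transcriptions endpoint,
# the API a self-hosted Whisper container typically exposes. The base URL
# and model name are assumptions, not a confirmed Open WebUI config.
import json
import urllib.request
import uuid

def build_multipart(fields: dict, file_field: str, filename: str,
                    file_bytes: bytes) -> tuple[bytes, str]:
    """Encode form fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; '
        f'name="{file_field}"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode()
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"

def transcribe(path: str, base_url: str = "http://localhost:9000/v1") -> str:
    """Upload an audio file and return the transcript text."""
    with open(path, "rb") as f:
        body, content_type = build_multipart(
            {"model": "whisper-1"}, "file", path.rsplit("/", 1)[-1], f.read()
        )
    req = urllib.request.Request(
        f"{base_url}/audio/transcriptions",
        data=body,
        headers={"Content-Type": content_type},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["text"]
```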
@hongbo-miao commented on GitHub (Nov 19, 2024):
It would be great to support some common video formats as well, thanks! ☺️
@flefevre commented on GitHub (Dec 6, 2024):
By searching the web I have found this project: https://github.com/misbahsy/meetingmind
I wanted to highlight it because they have thought through the user interface to ease the interaction.
Automatic extraction of key information:
Tasks
Decisions
Questions
Insights
Deadlines
Attendees
Follow-ups
Risks
Agenda
The different screenshots are very inspiring.
Hope these elements can help Open WebUI find some key ideas.
@rjmalagon commented on GitHub (Dec 8, 2024):
This is a highly valuable feature.
Tika document text extraction and YouTube transcript extraction already allow working with text from very diverse origins. The latter is effectively an indirect, free Google STT.
A simple audio file upload for direct STT would be a good start, but I admit that a more powerful audio transcription toolset, via internal or external tool integration, is a worthy milestone to wait on developer time for.
@flefevre commented on GitHub (Dec 10, 2024):
The diarization feature could be important in order to produce a summary with a list of actions assigned to a specific user.
I am just aggregating ideas to help define the perimeter of this highly valuable feature.
Hope it makes sense for the Open WebUI team.
@lollylan commented on GitHub (Dec 11, 2024):
I would love live transcription from the microphone; this would speed up my workload (transcribing doctor-patient interactions for hands-free documentation) so much. Right now I rely on the Windows 11 voice assistant, but that solution is not good. There are apparently ways of having Whisper listen and transcribe in short intervals (I am a doctor, not a programmer, so I cannot implement that myself); this is much better than waiting for the entire interaction to finish before sending it to Whisper. Additionally, if something interrupts the transfer of the transcript file to the server, everything is lost.
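The interval idea in this comment, cutting the recording into short fixed-length chunks so each can be transcribed while recording continues (and a dropped connection loses at most one chunk), can be sketched as a small buffering helper. All names here are illustrative:

```python
# Illustrative sketch of interval-based transcription: split a flat stream
# of audio samples into fixed-length chunks (e.g. 30 seconds each) so every
# chunk can be sent to Whisper independently while recording continues.

def chunk_samples(samples: list, sample_rate: int, seconds: int = 30) -> list:
    """Split a flat list of audio samples into fixed-length chunks.

    The final chunk may be shorter than `seconds` if the stream ends
    mid-interval.
    """
    size = sample_rate * seconds
    return [samples[i:i + size] for i in range(0, len(samples), size)]
```

Each returned chunk could then be written to a temporary WAV file and posted to a Whisper endpoint, so a failure mid-session only loses the chunk in flight rather than the whole recording.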
@T-Herrmann-WI commented on GitHub (Feb 18, 2025):
I would also like to see the audio file transcript feature.
@ALIENvsROBOT commented on GitHub (Feb 20, 2025):
It would also be good if we could make it run in the background: upload the file and have the transcription run in the background. Selecting the Whisper model in the playground, and maybe also selecting the language, would be a cool feature too.
@flefevre commented on GitHub (Feb 21, 2025):
Recently the project https://github.com/ahmetoner/whisper-asr-webservice integrated WhisperX with a diarization feature as a web-service endpoint.
@T-Herrmann-WI commented on GitHub (Apr 2, 2025):
Dear @tjbck, in what version could the audio file transcript feature be implemented?
@lollylan commented on GitHub (Apr 2, 2025):
Live transcription (or 30-second intervals) from the microphone would be a life-changer for my use case. I have a project where I enable doctors to transcribe and summarize patient interactions, and this would be invaluable. I use a script I (or ChatGPT) wrote (it can be found here: https://github.com/lollylan/asklaion), but having it in OWUI would be loads better.
@gusman80 commented on GitHub (Apr 2, 2025):
Wouldn't it be possible to upload a file via the Open WebUI API/UI and pass the file data to a custom model that uses a Filter/Functions Python script to call the local Whisper transcription implementation (`def inlet`)?
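The inlet-filter idea in this comment could look roughly like the sketch below: a filter that detects an uploaded audio file in the request body and swaps it for a transcript before the message reaches the model. The body layout, field names, and the `transcribe` callable are assumptions for illustration, not the actual Open WebUI filter schema:

```python
# Rough sketch of the proposal above: an inlet-style filter that replaces
# uploaded audio files with their transcripts. The body structure and the
# injected `transcribe` callable are hypothetical, not Open WebUI's real API.

AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg"}

class AudioTranscriptFilter:
    def __init__(self, transcribe):
        # `transcribe` is any callable mapping a file path to text,
        # e.g. a wrapper around a local Whisper installation.
        self.transcribe = transcribe

    def inlet(self, body: dict) -> dict:
        """Replace audio attachments in the request with transcript text."""
        for attachment in body.get("files", []):
            path = attachment.get("path", "")
            if any(path.lower().endswith(ext) for ext in AUDIO_EXTENSIONS):
                attachment["content"] = self.transcribe(path)
        return body
```

Used this way, the model would only ever see text, so no changes to the chat pipeline itself would be needed.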
@flefevre commented on GitHub (Apr 2, 2025):
Just to be sure: when you upload an MP3 file, it is picked up by faster-whisper and transcribed automatically.
So for me it is already implemented.
@morbificagent commented on GitHub (Apr 3, 2025):
Not here... if I upload an MP3 with a podcast and ask a question about it, it doesn't know anything about it...
@ALIENvsROBOT commented on GitHub (Apr 3, 2025):
At the least, audio transcription in the background would also be beneficial for people like researchers.
@tjbck commented on GitHub (May 1, 2025):
Merging this with #5990