feat: audio transcription playground #500

Closed
opened 2025-11-11 14:23:06 -06:00 by GiteaMirror · 20 comments
Owner

Originally created by @g4challenge on GitHub (Mar 19, 2024).

Originally assigned to: @tjbck on GitHub.

Is your feature request related to a problem? Please describe.
I find it challenging when I need to manually transcribe audio content. Whether it’s interviews, meetings, or recorded conversations, having an automated audio transcription feature would significantly improve my workflow.

Describe the solution you’d like
I would like OpenWebUI to include an audio transcription feature. Ideally, it should accept audio files (such as MP3, WAV, or other common formats) and convert them into accurate text transcripts. The transcripts should be time-stamped and easily accessible within the interface.
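To illustrate the time-stamped transcripts requested above, here is a minimal sketch assuming the (start_seconds, end_seconds, text) segment shape that Whisper-style tools commonly return — the segment format is an assumption for illustration, not an Open WebUI API:

```python
def format_timestamp(seconds: float) -> str:
    """Render a position in seconds as HH:MM:SS."""
    s = int(seconds)
    return f"{s // 3600:02d}:{(s % 3600) // 60:02d}:{s % 60:02d}"

def format_transcript(segments):
    """Turn (start, end, text) segments into time-stamped transcript lines."""
    return "\n".join(
        f"[{format_timestamp(start)} -> {format_timestamp(end)}] {text.strip()}"
        for start, end, text in segments
    )

segments = [
    (0.0, 4.2, "Welcome to the meeting."),
    (4.2, 9.8, "First item: the quarterly report."),
]
print(format_transcript(segments))
```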

Describe alternatives you’ve considered
As an alternative, I've explored third-party Whisper-based transcription services with a UI (https://github.com/chidiwilliams/buzz or https://github.com/jhj0517/Whisper-WebUI), but they often come with installation hurdles, sharing limitations, privacy concerns, and additional cost and effort. Having an integrated solution within OpenWebUI would streamline the process and enhance the overall user experience.

Additional context
Sometimes I participate in remote interviews or attend virtual meetings where audio recordings are essential. Having a built-in transcription feature would save time and effort, allowing me to focus on the content rather than manual transcription tasks. When finished, I would love to have the ability to feed the transcript to an LLM with predefined prompts, e.g. "use the following transcript to create a short, precise summary in bullet points".
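The "predefined prompt" idea could be sketched as wrapping a finished transcript in the standard chat-completions message shape; the prompt text follows the example in this issue, and the message format is the common OpenAI-compatible shape, not a confirmed Open WebUI interface:

```python
# Prompt text taken from the example given in this issue.
SUMMARY_PROMPT = (
    "Use the following transcript to create a short, precise summary "
    "in bullet points."
)

def build_summary_messages(transcript: str):
    """Build a chat-completions message list for summarizing a transcript."""
    return [
        {"role": "system", "content": SUMMARY_PROMPT},
        {"role": "user", "content": transcript},
    ]

messages = build_summary_messages("[00:00:01] Hello everyone ...")
```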

@arjunkrishna commented on GitHub (Apr 26, 2024):

Yes, having audio and video transcription would be a very useful feature.

@arjunkrishna commented on GitHub (Apr 26, 2024):

https://github.com/the-crypt-keeper/tldw

@rexkani commented on GitHub (Oct 21, 2024):

This is one of the main features I was looking for when I installed Open WebUI.

@flefevre commented on GitHub (Oct 23, 2024):

In scientific research, it would be a very good feature to be able to record a meeting, then summarize it and keep it in the workspace. Perhaps it should be compatible with Milvus to store the audio and the notes?

I have used https://github.com/JigsawStack/insanely-fast-whisper-api and
https://github.com/Vaibhavs10/insanely-fast-whisper

@Trapper4888 commented on GitHub (Oct 29, 2024):

To add my 2 cents:
Since Open WebUI already has an integrated Whisper running (with an API option), it feels like a wasted opportunity not to be able to use it directly. The same goes for TTS. I imagine a lot of the code is already there, since both are used behind the scenes.

But I have to acknowledge that Open WebUI is supposed to be a t2t UI, and starting to do STT and TTS may be out of scope and increase complexity. In a perfect world, I would host my own OpenAI-API Whisper Docker container, connect it to the Open WebUI container, and for direct Whisper usage run another container with a proper OpenAI-API-compatible TTS web UI.

Still, it would be very cool to have basic STT and TTS using the microphone and files in Open WebUI.

@hongbo-miao commented on GitHub (Nov 19, 2024):

It would be great to support some common video formats as well, thanks! ☺️

@flefevre commented on GitHub (Dec 6, 2024):

While searching the web I found this project: https://github.com/misbahsy/meetingmind

I wanted to highlight it because they have thought through the user interface to ease the interaction.

Automatic extraction of key information:
Tasks
Decisions
Questions
Insights
Deadlines
Attendees
Follow-ups
Risks
Agenda

The different screenshots are very inspiring.

Hope these elements can help Open WebUI find some key ideas.

@rjmalagon commented on GitHub (Dec 8, 2024):

This is a highly valuable feature.
Tika document text extraction and YouTube transcript extraction already allow working with text from very diverse origins; the latter is more or less an indirect, free Google STT.
A simple audio file upload for direct STT is a good start, but I admit that a more powerful audio transcription tool set, via internal or external tool integration, is a milestone worth waiting on developer time for.

@flefevre commented on GitHub (Dec 10, 2024):

The diarization feature could be important in order to produce summaries with a list of actions assigned to a specific user.
I am just aggregating ideas to help define the perimeter of this highly valuable feature.
Hope it makes sense for the Open WebUI team.
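The diarization idea above could work roughly like this: merge speaker turns (speaker, start, end) from a diarization model with transcript segments by picking the speaker whose turn overlaps each segment the most. The data shapes here are assumptions for illustration; pyannote/WhisperX output differs in detail:

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length of the overlap between two time intervals, in seconds."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def assign_speakers(segments, turns):
    """Label each (start, end, text) segment with the best-overlapping speaker."""
    labeled = []
    for start, end, text in segments:
        best = max(turns, key=lambda t: overlap(start, end, t[1], t[2]), default=None)
        has_overlap = best is not None and overlap(start, end, best[1], best[2]) > 0
        labeled.append((best[0] if has_overlap else "UNKNOWN", text))
    return labeled

turns = [("SPEAKER_00", 0.0, 5.0), ("SPEAKER_01", 5.0, 10.0)]
segments = [(0.5, 4.0, "Let's review the action items."),
            (6.0, 9.0, "I'll take the first one.")]
labeled = assign_speakers(segments, turns)
```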

@lollylan commented on GitHub (Dec 11, 2024):

I would love live transcription from the microphone; this would speed up my workload (transcribing doctor-patient interactions for hands-free documentation) so much. Right now I rely on the Windows 11 voice assistant, but this solution is not good. There are apparently ways of having Whisper listen and transcribe in short intervals (I am a doctor, though, not a programmer, so I cannot implement that myself); this is much better than waiting for the entire interaction to finish before sending it to Whisper. Additionally, if something interrupts the transfer of the transcript file to the server, everything is lost.
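The "short intervals" approach described above amounts to chunking the recording so each piece can be sent to a Whisper endpoint as soon as it is captured, instead of waiting for the whole session. A minimal sketch, with illustrative sample rate and chunk length:

```python
SAMPLE_RATE = 16_000   # Hz; the rate Whisper models typically expect
CHUNK_SECONDS = 30

def chunk_samples(samples, sample_rate=SAMPLE_RATE, chunk_seconds=CHUNK_SECONDS):
    """Yield successive fixed-length windows of a mono PCM sample buffer."""
    step = sample_rate * chunk_seconds
    for start in range(0, len(samples), step):
        yield samples[start:start + step]

# 65 seconds of audio -> three chunks: 30 s, 30 s, and a final 5 s remainder
chunks = list(chunk_samples([0] * (SAMPLE_RATE * 65)))
```

Sending each chunk as it completes also limits what is lost if a later transfer fails.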

@T-Herrmann-WI commented on GitHub (Feb 18, 2025):

I also would like to see the audio file transcript feature.

@ALIENvsROBOT commented on GitHub (Feb 20, 2025):

It would also be good if we could make it run in the background: upload the file and the transcription runs in the background. Also, selecting the Whisper model in the playground, and maybe also the language, would be a cool feature.

@flefevre commented on GitHub (Feb 21, 2025):

Recently the project https://github.com/ahmetoner/whisper-asr-webservice integrated WhisperX with a diarization feature as a webservice endpoint.

@T-Herrmann-WI commented on GitHub (Apr 2, 2025):

Dear @tjbck, in what version could the audio file transcript feature be implemented?

@lollylan commented on GitHub (Apr 2, 2025):

Live transcription (or 30-second intervals) from the microphone would be a lifechanger for my use case. I have a project where I enable doctors to transcribe and summarize patient interactions, and this would be invaluable. I use a script I (or ChatGPT) wrote (it can be found here: https://github.com/lollylan/asklaion), but having it in OWUI would be loads better.

@gusman80 commented on GitHub (Apr 2, 2025):

Wouldn't it be possible to upload a file via the Open WebUI API/UI and pass the file data to a custom model that uses a Filter/Functions Python script to call the local Whisper transcription implementation (def inlet)?
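A rough sketch of that Filter idea: an Open WebUI Filter's inlet() runs before the request reaches the model, so it could replace an attached audio file with its transcript. Here `fake_transcribe` is a stand-in for a real call to a local Whisper endpoint, and the exact shape of `body["files"]` is an assumption, not the documented schema:

```python
AUDIO_EXTENSIONS = (".mp3", ".wav", ".m4a", ".ogg")

def fake_transcribe(path: str) -> str:
    # Placeholder for e.g. a faster-whisper or whisper-asr-webservice call.
    return f"<transcript of {path}>"

class Filter:
    def inlet(self, body: dict) -> dict:
        """Append a transcript message for each attached audio file."""
        for f in body.get("files", []):
            name = f.get("name", "")
            if name.lower().endswith(AUDIO_EXTENSIONS):
                body["messages"].append(
                    {"role": "user", "content": fake_transcribe(name)}
                )
        return body

body = {"files": [{"name": "meeting.mp3"}], "messages": []}
result = Filter().inlet(body)
```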

@flefevre commented on GitHub (Apr 2, 2025):

Just to be sure: when you upload an MP3 file, it is picked up by faster-whisper and transcribed automatically.
So for me it is already implemented.

@morbificagent commented on GitHub (Apr 3, 2025):

Not here... if I upload an MP3 with a podcast and ask a question about it, it doesn't know anything about it...

@ALIENvsROBOT commented on GitHub (Apr 3, 2025):

At least audio transcription in the background would also be beneficial for people like researchers.

@tjbck commented on GitHub (May 1, 2025):

Merging this with #5990


Reference: github-starred/open-webui#500