mirror of
https://github.com/open-webui/open-webui.git
synced 2026-03-12 01:54:38 -05:00
feat: whisper integration #145
Originally created by @adan89lion on GitHub (Jan 3, 2024).
Is your feature request related to a problem? Please describe.
The current voice input implementation is only supported by Safari. It is also highly unreliable: it makes constant typos and lacks bilingual or multilingual support. For instance, I'm studying foreign languages with the help of LLMs, and I'm unable to ask questions that mix English and a second language. OpenAI can do that with its Whisper model, and it has been extremely helpful.
Describe the solution you'd like
Describe alternatives you've considered
N/A
Additional context
Whisper definitely outperforms OS's built-in voice input (with automatic punctuation, seamless multi-language support, great coverage of rare words).
Currently, only Safari integrates speech-to-text at the OS level, making the voice input button useless on non-Apple devices and on Chromium-based or Gecko-based browsers.
This is an extension to the feature request #49
@tjbck commented on GitHub (Jan 4, 2024):
Hi, thanks for the feature request! FYI, the voice input feature should work with Chrome on non-Apple devices as well, so if you're facing issues with Chromium-based browsers, please let us know! As for the feature request, I'll take a look in the near future and assess its usability/feasibility. Thanks!
@ThatOneCalculator commented on GitHub (Jan 4, 2024):
For me it works on Chromium and Firefox on Linux. No clue what OP is talking about with the whole "only safari" nonsense.
@adan89lion commented on GitHub (Jan 4, 2024):
@ThatOneCalculator I just confirmed that voice input works on Chrome now. It wasn't working because I lacked a valid SSL certificate (I'm running on localhost all the time). Switching to an HTTPS connection let Chrome prompt me for microphone permission.
As for Firefox, I'm not sure why it still doesn't work. The browser didn't prompt me to enable microphone access. I've:
@tarbard commented on GitHub (Jan 4, 2024):
Chrome has a default security policy that doesn't allow microphone usage on non-HTTPS sites. Another workaround is to whitelist insecure sites within Chrome settings.
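Browsers generally expose microphone capture only in secure contexts, which is why the permission prompt never appears on a plain HTTP page. A rough Python approximation of that rule can help check which URLs should be able to trigger the prompt. The function name and the host list are illustrative assumptions, and real browsers apply more detailed rules that vary by browser and version:

```python
from urllib.parse import urlsplit

# Hosts that browsers commonly treat as trustworthy even over plain HTTP.
# This set is a simplification of the real secure-context rules.
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def allows_microphone(page_url: str) -> bool:
    """Approximate whether a page URL counts as a secure context."""
    parts = urlsplit(page_url)
    host = (parts.hostname or "").lower()
    if parts.scheme == "https":
        return True
    if parts.scheme == "http" and (host in LOOPBACK_HOSTS or host.endswith(".localhost")):
        return True
    return False


for url in ("https://chat.example.org", "http://localhost:3000", "http://192.168.1.20:3000"):
    print(url, allows_microphone(url))
```

Under this approximation, a LAN address such as `http://192.168.1.20:3000` is not a secure context, so it needs HTTPS or an explicit browser exception.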
@coder543 commented on GitHub (Jan 9, 2024):
I would love for this option to exist just because Whisper is so much more accurate.
@justinh-rahb commented on GitHub (Jan 18, 2024):
Related: #126
@tjbck commented on GitHub (Jan 21, 2024):
My personal major blocker for this at the moment is how convoluted the Whisper installation process can be when running it locally, so I'm hoping the Ollama team will include whisper.cpp in their project. It would streamline the whole installation process for everyone and would align better with our project ethos of a local-first approach.
@Collected5353 commented on GitHub (Jan 24, 2024):
https://github.com/oobabooga/text-generation-webui/blob/main/extensions/whisper_stt/script.py
We would also need to add some packages to the Docker image or to your local system. I still think this is the way to go for audio input: webkitSpeechRecognition
@tjbck Looks like whisper.cpp has a WASM build available: https://github.com/ggerganov/whisper.cpp/tree/master/examples/whisper.wasm
We might need to add a line to the documentation saying that it requires SSL because of browser security policies, as others have mentioned here. Something like the ngrok free plan can work.
https://github.com/oobabooga/text-generation-webui/blob/main/extensions/coqui_tts/script.py
This one is using Coqui. Does anyone have experience using Coqui_tts or silero_tts?
We could also pull down a container in the compose file and just use a TTS from the API.
https://docs.coqui.ai/en/latest/marytts.html
@tjbck commented on GitHub (Feb 11, 2024):
Whisper support has been added with #707! FYI, whisper STT will only work over https and it'll take some time to download the whisper model when you first use it so please give it a minute or two to finish downloading! Let me know if you guys encounter any issues, thanks!
@tino926 commented on GitHub (Mar 4, 2024):
@tjbck
How can I use Open WebUI over https? I followed the instructions in this repo to install the Ollama and open-webui Docker containers on a computer. If I connect to open-webui from another computer with https, it always shows a message like:
Secure Connection Failed
An error occurred during a connection to xxx.xxx.xxx.xxx:3000. SSL received a record that exceeded the maximum permissible length.
Error code: SSL_ERROR_RX_RECORD_TOO_LONG
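A common cause of this error is a browser speaking TLS to a port that only serves plain HTTP, so the first bytes of an HTTP reply get parsed as a TLS record header. The sketch below shows the arithmetic; the sample reply line and the helper name are illustrative assumptions, not output captured from Open WebUI:

```python
import struct

# Upper bound on the length field of a TLS 1.2 ciphertext record (2^14 + 2048).
MAX_TLS_RECORD_LEN = 2**14 + 2048


def parse_as_tls_header(first_bytes: bytes):
    """Interpret the first 5 bytes as a TLS record header.

    Layout: content type (1 byte), protocol version (2 bytes), length (2 bytes),
    all in network byte order.
    """
    content_type, version, length = struct.unpack("!BHH", first_bytes[:5])
    return content_type, version, length


# Hypothetical plain-HTTP reply sent by a server that does not speak TLS.
reply = b"HTTP/1.1 400 Bad Request\r\n"
content_type, version, length = parse_as_tls_header(reply)

# The bytes "P/" (0x50 0x2F) land in the length field.
print(length, length > MAX_TLS_RECORD_LEN)
```

If that is what is happening here, connecting with http:// on that port, or placing a TLS-terminating reverse proxy in front of it, avoids the mismatch.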
@justinh-rahb commented on GitHub (Mar 4, 2024):
The truth is that there is no simple way to set up HTTPS for a web application that does not require careful attention and expertise of the user. "Easy" HTTPS requires exposure to the Internet in most cases. Running a web application exposed to the internet is a serious responsibility, and it is essential to take all necessary precautions to ensure the integrity and confidentiality of your data and systems. Providing simplified instructions for setting up HTTPS can be irresponsible in the hands of those that don't fully understand the implications of such.
@phyzical commented on GitHub (May 19, 2024):
Just curious whether support for using an externally hosted Whisper container will be added one day? I already run one for other stuff, so it feels redundant to also have the model in webui. Or maybe, inversely, webui could expose Whisper to be used by other services?
@coder543 commented on GitHub (May 19, 2024):
@phyzical out of curiosity, which whisper container do you use (to be clear, I have not contributed to open-webui, but I am curious about a whisper server)
@malteneuss commented on GitHub (May 19, 2024):
@justinh-rahb Just wanted to further promote Let's Encrypt "DNS-01" challenges as a way to get browser-trusted https certificates for local self-hosted services without internet exposure. A nice overview can be found on "Wolfgangs channel" at https://www.youtube.com/watch?v=qlcVx-k-02E.
I would also like to promote NixOS, which is gaining a lot of traction and has one of the easiest ways I've found so far to set up Let's Encrypt using an Nginx reverse proxy. The rough amount of config code, once you have some familiarity, is:
See "Minimal Private Local LAN Server Example" at https://wiki.nixos.org/wiki/Nginx for more details.
Furthermore, NixOS already has built-in support for Ollama as a service. The amount of code to enable an Ollama service is:
@phyzical commented on GitHub (May 20, 2024):
@coder543 https://github.com/ahmetoner/whisper-asr-webservice
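For anyone curious what talking to such an external container could look like, here is a minimal Python sketch. The `/asr` endpoint, the `audio_file` form field, the `task`/`language`/`output` query parameters, and port 9000 are assumptions based on that project's documentation and should be verified against the version you run; the helper names are hypothetical, and `requests` is a third-party package:

```python
from urllib.parse import urlencode


def build_asr_url(base_url, task="transcribe", language=None, output="json"):
    """Build the request URL for the (assumed) /asr endpoint."""
    params = {"task": task, "output": output}
    if language:
        # Omit the parameter to let the service auto-detect the language.
        params["language"] = language
    return f"{base_url.rstrip('/')}/asr?{urlencode(params)}"


def transcribe_file(path, base_url="http://localhost:9000", **options):
    """Upload an audio file and return the transcribed text."""
    import requests  # third-party; imported lazily so the URL helper works without it

    with open(path, "rb") as audio:
        response = requests.post(
            build_asr_url(base_url, **options),
            files={"audio_file": audio},  # assumed form field name
            timeout=300,
        )
    response.raise_for_status()
    return response.json()["text"]


print(build_asr_url("http://localhost:9000", language="en"))
```

Keeping the URL construction separate from the upload makes it easy to check the request shape without a running server.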
@mikael1234 commented on GitHub (Sep 25, 2024):
Are there any settings for Whisper? It's producing random garbage in random languages in Chrome. Same with the Web API.
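Output in unexpected languages is often associated with automatic language detection on short or noisy audio, and pinning the language is one thing to try. The sketch below assumes the third-party faster-whisper package is used directly; it does not show Open WebUI's own settings, and the helper names, model size, and option values are illustrative:

```python
def transcribe_options(language=None):
    """Keyword arguments for faster-whisper's WhisperModel.transcribe."""
    options = {"beam_size": 5, "vad_filter": True}  # vad_filter skips non-speech audio
    if language is not None:
        # Pinning the language disables automatic language detection.
        options["language"] = language
    return options


def transcribe(path, language="en", model_size="base"):
    """Transcribe an audio file with a fixed language (assumes faster-whisper is installed)."""
    from faster_whisper import WhisperModel  # third-party, imported lazily

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(path, **transcribe_options(language))
    return " ".join(segment.text.strip() for segment in segments)


print(transcribe_options("en"))
```

Calling `transcribe("clip.wav", language="en")` with a hypothetical audio file would then decode as English only.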