feat: whisper integration #145

Closed
opened 2025-11-11 14:08:04 -06:00 by GiteaMirror · 16 comments

Originally created by @adan89lion on GitHub (Jan 3, 2024).

Is your feature request related to a problem? Please describe.
The current voice input implementation only works in Safari. It is also highly unreliable: it makes constant typos and lacks bilingual or multilingual support. For instance, I'm studying foreign languages with the help of LLMs, and I can't ask questions that mix English and a second language. OpenAI's Whisper model can handle that, and it has been extremely helpful.

Describe the solution you'd like

  1. The ability to download and run Whisper models (different sizes, e.g. base, medium, large)
    • or separate Whisper as another Docker container
  2. Access microphone with browser's built-in API
  3. Transcribe the voice via Whisper
  4. Submit the transcription to the input box, or directly send the text to ollama for inference
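
Steps 1, 3 and 4 above could look roughly like the following on the server side. This is only a sketch under stated assumptions, not how the project implements it: it assumes the `openai-whisper` Python package is installed, that Ollama listens on its default `http://localhost:11434`, and that a model called `llama2` has been pulled. The helper names `transcribe_file`, `build_ollama_payload` and `ask_ollama` are hypothetical. Microphone capture (step 2) happens in the browser and is not covered here.

```python
import json
import urllib.request

# Assumed default address of a local Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def transcribe_file(audio_path: str, model_size: str = "base") -> str:
    """Transcribe a recorded audio file with a local Whisper model."""
    # openai-whisper is imported lazily so the other helpers work without it.
    import whisper

    model = whisper.load_model(model_size)  # downloads the weights on first use
    result = model.transcribe(audio_path)
    return result["text"].strip()


def build_ollama_payload(prompt: str, model: str = "llama2") -> dict:
    """Build the JSON body for a non-streaming /api/generate request."""
    return {"model": model, "prompt": prompt, "stream": False}


def ask_ollama(prompt: str, model: str = "llama2") -> str:
    """Send the transcription to Ollama and return the generated text."""
    body = json.dumps(build_ollama_payload(prompt, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


# Example (needs a recording, openai-whisper and a running Ollama):
#   print(ask_ollama(transcribe_file("question.wav")))
```

Placing the transcription in the input box instead of sending it directly would only need `transcribe_file`; the Ollama call is the optional second half of step 4.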

Describe alternatives you've considered
N/A

Additional context
Whisper definitely outperforms the OS's built-in voice input (with automatic punctuation, seamless multi-language support, and great coverage of rare words).

Currently, only Safari integrates speech-to-text at the OS level, making the voice input button useless on non-Apple devices and on Chromium-based or Gecko-based browsers.

This is an extension of feature request #49

@tjbck commented on GitHub (Jan 4, 2024):

Hi, thanks for the feature request! FYI, the voice input feature should work with Chrome on non-Apple devices as well, so if you're facing issues with Chromium-based browsers, please let us know! As for the feature request, I'll take a look in the near future and assess its usability/feasibility. Thanks!

@ThatOneCalculator commented on GitHub (Jan 4, 2024):

For me it works on Chromium and Firefox on Linux. No clue what OP is talking about with the whole "only Safari" nonsense.

@adan89lion commented on GitHub (Jan 4, 2024):

@ThatOneCalculator I just confirmed that voice input works on Chrome now. It wasn't working because I lacked a valid SSL certificate (I run on localhost all the time). Switching to an HTTPS connection let Chrome prompt me for microphone permission.

As for Firefox, I'm not sure why it still doesn't work. The browser didn't prompt me to enable microphone access. I've:

  1. Checked that Firefox has microphone access in macOS Settings
  2. Verified that other websites (like Google Voice Search) can invoke Firefox's microphone permission dialog

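![Screenshot 2024-01-04 at 12 05 33@2x](https://github.com/ollama-webui/ollama-webui/assets/6585644/e959f57c-955a-41db-b0b4-b846ed85fba4)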

@tarbard commented on GitHub (Jan 4, 2024):

Chrome's default security policy doesn't allow microphone usage on non-HTTPS sites. Another workaround is to whitelist insecure sites in Chrome's settings.
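
For reference, the rule browsers apply here is roughly the "secure context" check: HTTPS origins qualify, and so do loopback hosts such as `localhost` and `127.0.0.1`, while a plain-HTTP LAN address does not. The sketch below is a simplified approximation of that check, not Chrome's actual code; `is_potentially_trustworthy` and its `allowlist` parameter (standing in for the whitelist workaround) are hypothetical names.

```python
import ipaddress
from urllib.parse import urlsplit


def is_potentially_trustworthy(url: str, allowlist: frozenset = frozenset()) -> bool:
    """Approximate whether a browser would allow microphone access on this origin."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    origin = f"{parts.scheme}://{parts.netloc}"

    # HTTPS is always fine; so is an origin the user explicitly whitelisted.
    if parts.scheme == "https" or origin in allowlist:
        return True

    # Loopback names are treated as secure even over plain HTTP.
    if host == "localhost" or host.endswith(".localhost"):
        return True

    # Loopback IP addresses (127.0.0.0/8 and ::1) are treated the same way.
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # an ordinary hostname over plain HTTP
```

Under this approximation `http://192.168.1.10:3000` is rejected unless it is passed in `allowlist`, which matches the behaviour described in this thread when the UI is opened from another machine.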

@coder543 commented on GitHub (Jan 9, 2024):

I would love for this option to exist just because Whisper is so much more accurate.

@justinh-rahb commented on GitHub (Jan 18, 2024):

Related: #126

@tjbck commented on GitHub (Jan 21, 2024):

My main personal blocker at the moment is how convoluted the Whisper installation process can be when running it locally, so I'm hoping the Ollama team will include whisper.cpp in their project. That would streamline the installation process for everyone and align better with our project's local-first ethos.

@Collected5353 commented on GitHub (Jan 24, 2024):

https://github.com/oobabooga/text-generation-webui/blob/main/extensions/whisper_stt/script.py

We would need to add some packages to the Docker image, or to your local system. I still think this is the way to go for audio input: webkitSpeechRecognition

@tjbck Looks like whisper.cpp has a WASM build available: https://github.com/ggerganov/whisper.cpp/tree/master/examples/whisper.wasm

We might need to add a line to the documentation noting that most browsers require SSL for microphone access, as others have mentioned here. Something like the ngrok free plan can work.

https://github.com/oobabooga/text-generation-webui/blob/main/extensions/coqui_tts/script.py

That one uses Coqui. Does anyone have experience with coqui_tts or silero_tts?

We could also pull down a container in the compose file and just use a TTS through its API.
https://docs.coqui.ai/en/latest/marytts.html

@tjbck commented on GitHub (Feb 11, 2024):

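![image](https://github.com/ollama-webui/ollama-webui/assets/25473318/385149e2-6492-4716-9bc0-e50a78b451df)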

Whisper support has been added with #707! FYI, Whisper STT only works over https, and it takes some time to download the Whisper model the first time you use it, so please give it a minute or two to finish downloading! Let me know if you encounter any issues, thanks!

@tino926 commented on GitHub (Mar 4, 2024):

@tjbck
How can I use Open WebUI over https? I followed the instructions in this repo to install the Ollama and open-webui Docker containers on a computer. If I connect to open-webui from another computer over https, it always shows a message like:

Secure Connection Failed

An error occurred during a connection to xxx.xxx.xxx.xxx:3000. SSL received a record that exceeded the maximum permissible length.

Error code: SSL_ERROR_RX_RECORD_TOO_LONG

The page you are trying to view cannot be shown because the authenticity of the received data could not be verified.
Please contact the website owners to inform them of this problem.
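
A note on what this error usually means: the browser spoke TLS to a port that answered in plain HTTP. That this is what happened here is an assumption (the thread does not confirm that port 3000 serves plain HTTP), but the sketch below shows why such a reply produces exactly this message. It reads the first five bytes of a reply as a TLS record header, the way a TLS client would; `parse_record_header` and `explain_first_bytes` are hypothetical helper names.

```python
import struct

# Largest ciphertext record a TLS 1.2 peer may send (2^14 + 2048 bytes).
MAX_TLS_RECORD_LENGTH = 2**14 + 2048


def parse_record_header(data: bytes) -> tuple:
    """Read bytes 0-4 as a TLS record header: (content type, version, length)."""
    return struct.unpack("!BHH", data[:5])


def explain_first_bytes(data: bytes) -> str:
    """Describe what the first bytes of a server reply look like to a TLS client."""
    if len(data) < 5:
        return "too short to tell"
    content_type, version, length = parse_record_header(data)
    if data.startswith(b"HTTP/"):
        # The ASCII bytes "P/" land in the length field of the header.
        return f"plain HTTP reply, read as a TLS record of {length} bytes"
    valid_type = content_type in (0x14, 0x15, 0x16, 0x17)
    if valid_type and version >> 8 == 0x03 and length <= MAX_TLS_RECORD_LENGTH:
        return "looks like a TLS record"
    return "unrecognised"


# "HTTP/1.1 ..." misread as a record header claims 20527 bytes,
# which is above MAX_TLS_RECORD_LENGTH, hence "record too long".
```

If that is the cause, opening the same address with `http://` instead of `https://` should load the page, and HTTPS would need to be terminated by something in front of the container, such as a reverse proxy.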

@justinh-rahb commented on GitHub (Mar 4, 2024):

The truth is that there is no simple way to set up HTTPS for a web application that does not demand careful attention and expertise from the user. "Easy" HTTPS requires exposure to the Internet in most cases. Running a web application exposed to the Internet is a serious responsibility, and it is essential to take all necessary precautions to ensure the integrity and confidentiality of your data and systems. Providing simplified instructions for setting up HTTPS can be irresponsible when they end up in the hands of people who don't fully understand the implications.

@phyzical commented on GitHub (May 19, 2024):

Just curious whether support for using an externally hosted Whisper container will be added one day? I already run one for other stuff, so it feels redundant to also have the model in the web UI. Or maybe, inversely, the web UI could expose its Whisper so other services can use it?

@coder543 commented on GitHub (May 19, 2024):

@phyzical Out of curiosity, which Whisper container do you use? (To be clear, I have not contributed to open-webui, but I am curious about a Whisper server.)

@malteneuss commented on GitHub (May 19, 2024):

@justinh-rahb Just wanted to further promote Let's Encrypt "DNS-01" challenges as a way to get browser-trusted https certificates for local self-hosted services without internet exposure. A nice overview by "Wolfgang's Channel" can be found at https://www.youtube.com/watch?v=qlcVx-k-02E.
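
For anyone wondering why DNS-01 needs no internet exposure: the certificate authority only looks up a TXT record in your public DNS zone, so the server itself never has to be reachable. A minimal sketch of what an ACME client publishes, following the ACME specification (RFC 8555); the function names are hypothetical, and in practice a client does all of this for you.

```python
import base64
import hashlib


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as ACME requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def dns01_record_name(domain: str) -> str:
    """Name of the TXT record the CA will query for this domain."""
    # A wildcard certificate is validated against the base domain.
    if domain.startswith("*."):
        domain = domain[2:]
    return f"_acme-challenge.{domain}"


def dns01_txt_value(token: str, account_key_thumbprint: str) -> str:
    """Value of the TXT record: a digest of the key authorization."""
    # The key authorization joins the CA's token with the thumbprint
    # of the ACME account key.
    key_authorization = f"{token}.{account_key_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return b64url(digest)
```

The token and thumbprint come from the ACME server and the account key respectively; the only thing the DNS provider's API has to do is create and later delete that one TXT record.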

I would also like to promote NixOS, which is gaining a lot of traction and offers one of the easiest ways I have found so far to set up Let's Encrypt behind an Nginx reverse proxy. Once you have some familiarity with it, the rough amount of config code is:

{
  services.nginx = {
    enable = true;

    virtualHosts."subdomain.example.org" = {
      # use the Let's Encrypt https certificate defined below
      useACMEHost = "example.org";
      forceSSL = true;
      locations."/" = {
         # usual nginx forwarding.
      };
    };
  };

  security.acme.acceptTerms = true;
  security.acme.defaults.email = "info@example.org";

  # Let NixOS generate a Let's Encrypt certificate that we can reuse
  # for several virtual hosts above.
  security.acme.certs."example.org" = {
    domain = "example.org";
    extraDomainNames = [ "subdomain.example.org" ];
    # NixOS uses the Go library lego to perform the DNS challenge
    dnsProvider = "<your-dns-provider>";
    dnsPropagationCheck = true;
  };
}

See "Minimal Private Local LAN Server Example" at https://wiki.nixos.org/wiki/Nginx for more details.

Furthermore, NixOS already has built-in support for running Ollama as a service. The amount of code needed to enable an Ollama service is:

  services.ollama = {
    enable = true;
    # Make it available to the local network.
    listenAddress = "0.0.0.0:11434";
  };

@phyzical commented on GitHub (May 20, 2024):

@coder543 https://github.com/ahmetoner/whisper-asr-webservice
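
For the curious, calling such a container from another service might look like the sketch below. Everything about the endpoint is an assumption to verify against that project's README: that the service exposes `POST /asr`, accepts `task`, `language` and `output` query parameters, takes the upload in a form field named `audio_file`, and returns a `text` field in its JSON output. The helper names and the `http://localhost:9000` address are hypothetical, and `requests` is a third-party package.

```python
from urllib.parse import urlencode


def build_asr_url(base_url: str, language: str = None,
                  task: str = "transcribe", output: str = "json") -> str:
    """Build the request URL for the (assumed) /asr endpoint."""
    params = {"task": task, "output": output}
    if language:
        # Naming the language skips automatic language detection.
        params["language"] = language
    return f"{base_url.rstrip('/')}/asr?{urlencode(params)}"


def transcribe_remote(base_url: str, audio_path: str, language: str = None) -> str:
    """Upload an audio file to the container and return the transcription."""
    # Imported lazily so build_asr_url works without the dependency.
    import requests

    with open(audio_path, "rb") as audio:
        response = requests.post(
            build_asr_url(base_url, language),
            files={"audio_file": audio},  # assumed form field name
            timeout=300,
        )
    response.raise_for_status()
    return response.json()["text"]


# Example (needs a running container and a recording):
#   print(transcribe_remote("http://localhost:9000", "question.wav", language="en"))
```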

@mikael1234 commented on GitHub (Sep 25, 2024):

Are there any settings for Whisper? It's producing random garbage in random languages in Chrome. Same with the Web API.


Reference: github-starred/open-webui#145