feat: voice input #19


GiteaMirror · 2025-11-11T14:02:13-06:00

GiteaMirror commented

2025-11-11 14:02:13 -06:00

Originally created by @honeyspoon on GitHub (Nov 2, 2023).

I would like to be able to use my voice as input.
I don't really need text-to-speech from the AI,
just the ability to talk to it.

The use case is language learning.

Interface with a local Whisper model.
Add a microphone button next to the input box.
When clicked, it would play a sound to signal that recording has started.
It would then live-transcribe your speech into the chat box.
After 2 seconds of silence, it would send the prompt to Ollama.
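A minimal sketch of the "send after 2 seconds of silence" behavior described above, written in Python for brevity. It assumes the recognizer (the browser's speech API or a local Whisper model) reports the latest transcript of the current utterance together with a timestamp in seconds, and that the UI polls the helper periodically. The class `SilenceAutoSender` and its methods are hypothetical, not part of Ollama or the web UI.

```python
from typing import Optional


class SilenceAutoSender:
    """Hypothetical helper: buffers live transcript text and releases it
    as a prompt once no new speech has arrived for `silence_seconds`."""

    def __init__(self, silence_seconds: float = 2.0) -> None:
        self.silence_seconds = silence_seconds
        self._buffer = ""
        self._last_speech_at: Optional[float] = None

    def on_transcript(self, text: str, now: float) -> None:
        # Assumption: the recognizer sends the full current hypothesis,
        # so it replaces the buffer; any new speech resets the silence timer.
        self._buffer = text
        self._last_speech_at = now

    def poll(self, now: float) -> Optional[str]:
        # Nothing spoken yet (or only whitespace): nothing to send.
        if self._last_speech_at is None or not self._buffer.strip():
            return None
        # Enough silence has passed: hand back the prompt and reset state.
        if now - self._last_speech_at >= self.silence_seconds:
            prompt = self._buffer.strip()
            self._buffer = ""
            self._last_speech_at = None
            return prompt
        return None
```

In a real UI, `now` would come from a monotonic clock and the returned prompt would be passed to whatever code submits the chat message; keeping the timing logic free of audio APIs makes it easy to test on its own.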

This project is able to interface with a local Whisper model to do voice-to-text in a web app:
https://github.com/mayeaux/generate-subtitles


GiteaMirror closed this issue

2025-11-11 14:02:13 -06:00

GiteaMirror commented

2025-11-11 14:02:14 -06:00

@tjbck commented on GitHub (Nov 3, 2023):

Looks interesting, I'll think of ways to incorporate into the web UI when I have more time as it seems like it might take some time to get the implementation right. Thanks for the idea.


GiteaMirror commented

2025-11-11 14:02:15 -06:00

@tjbck commented on GitHub (Nov 11, 2023):

Hi, just merged #90 to main; voice recognition support should now be turned on by default.

For your specific use case, you can enable the speech auto-send function by going to Settings > Addons and clicking the button next to the 'Speech Auto-Send' label to toggle it.

Let me know if you encounter any issues with the feature. Thanks!


GiteaMirror commented

2025-11-11 14:02:15 -06:00

@honeyspoon commented on GitHub (Nov 20, 2023):

Did not know the browser had an integrated speech API.
How does it compare to Whisper?
Now that this feature is in, I might try to look at running it against a local server running Whisper.
I wonder if something like Ollama exists for Whisper.


GiteaMirror commented

2025-11-11 14:02:15 -06:00

@0x07CB commented on GitHub (Apr 27, 2024):

> Did not know the browser had a an integrat

`Whisper` is great, and `Vosk` can help too (Vosk can also return SRT data, which can be used to generate subtitles for audio/video media files).
Whisper and Vosk can both be used from Python (I have not read the code yet, but I see the repo `open-webui/open-webui` is partially written in Python).

So, if the backend uses Python, you have good options for building an STT feature. (I may be wrong; I have not checked this repo in detail and have only just started trying it.)
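Since SRT subtitles came up: an SRT file is just numbered blocks, each with an `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line followed by the text and a blank line. The sketch below assumes the STT engine yields `(start, end, text)` segments in seconds (for example, built from the `start`, `end` and `text` fields of the segments Whisper returns; the exact shape depends on the engine). `format_timestamp` and `to_srt` are hypothetical helpers, not part of Whisper, Vosk, or this repo.

```python
from typing import Iterable, Tuple


def format_timestamp(seconds: float) -> str:
    # SRT timestamps are HH:MM:SS,mmm, with a comma before the milliseconds.
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Tuple[float, float, str]]) -> str:
    # Assumed input: (start_seconds, end_seconds, text) per recognized segment.
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # Blocks are separated by a blank line.
    return "\n".join(blocks)
```

For example, `to_srt([(0.0, 1.5, "Hello"), (1.5, 3.25, "world")])` produces two numbered cues, the second timed `00:00:01,500 --> 00:00:03,250`.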
