feat: voice input #19

Closed
opened 2025-11-11 14:02:13 -06:00 by GiteaMirror · 4 comments

Originally created by @honeyspoon on GitHub (Nov 2, 2023).

I would like to be able to use my voice as an input.
I don't really need text-to-speech from the AI.
Just being able to talk to it.

Use case is language learning.

Interface with a local Whisper model.
Add a microphone button next to the input box.
When clicked, you would hear a sound to start recording.
It would then live transcribe your text in the chatbox.
After 2 seconds of silence it would send the prompt to ollama.
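
The "send after 2 seconds of silence" step above could be sketched roughly as follows. This is only an illustration of the requested behavior, not how the web UI implements it: `SilenceAutoSender`, its `send` callback, and the explicit `now` timestamps are hypothetical names chosen for the example, and the transcribed fragments are assumed to come from whatever speech-to-text engine is in use.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SilenceAutoSender:
    """Buffers transcribed fragments and sends them after a pause in speech."""

    send: Callable[[str], None]      # hypothetical callback that submits the prompt
    silence_seconds: float = 2.0     # pause length taken from the request above
    _fragments: List[str] = field(default_factory=list)
    _last_speech_at: Optional[float] = None

    def on_transcript(self, text: str, now: float) -> None:
        """Record a newly transcribed fragment and the time it arrived."""
        text = text.strip()
        if text:
            self._fragments.append(text)
            self._last_speech_at = now

    def tick(self, now: float) -> bool:
        """Call periodically; sends the buffered prompt once the pause is long enough."""
        if self._last_speech_at is None:
            return False  # nothing has been said since the last send
        if now - self._last_speech_at < self.silence_seconds:
            return False  # the speaker may still be mid-sentence
        self.send(" ".join(self._fragments))
        self._fragments.clear()
        self._last_speech_at = None
        return True


if __name__ == "__main__":
    sent: List[str] = []
    sender = SilenceAutoSender(send=sent.append)
    sender.on_transcript("hello", now=0.0)
    sender.on_transcript("world", now=0.5)
    sender.tick(now=1.0)  # still inside the pause window, nothing is sent
    sender.tick(now=2.6)  # more than 2 s since the last fragment, prompt is sent
    print(sent)
```

Passing the clock in as `now` keeps the logic independent of any particular audio or UI framework; in a real app the timestamps would come from a timer.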

This project interfaces with a local Whisper model to do voice-to-text in a web app.
https://github.com/mayeaux/generate-subtitles


@tjbck commented on GitHub (Nov 3, 2023):

Looks interesting. I'll think of ways to incorporate it into the web UI when I have more time, as it seems it might take a while to get the implementation right. Thanks for the idea.


@tjbck commented on GitHub (Nov 11, 2023):

Hi, I just merged #90 to main, so you should have voice recognition support turned on by default now.

For your specific use case, you can enable the speech auto-send function by going to Settings > Addons and clicking on the button right next to the 'Speech Auto-Send' label to toggle.

image

Let me know if you encounter any issues with the feature. Thanks!


@honeyspoon commented on GitHub (Nov 20, 2023):

I did not know the browser had an integrated speech API.
How does it compare to Whisper?
Now that this feature is in, I might try to look at running it against a local server running Whisper.
I wonder if something like Ollama exists for Whisper.


@0x07CB commented on GitHub (Apr 27, 2024):

> Did not know the browser had a an integrat

Whisper is great, and Vosk can help too (Vosk can also return SRT data for generating subtitles for audio/video media files).
Whisper and Vosk can both be used from Python (I have not read the code yet, but I see the open-webui/open-webui repo is partially written in Python).

So, if the backend uses Python, you have options for building a good STT feature. (I may be wrong; I have not checked this repo and have only just started trying it.)
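
As a rough illustration of the Python route mentioned above, the sketch below transcribes a file with the `openai-whisper` package and formats the resulting segments as SRT subtitles. It is not code from this repo: the helper names (`format_timestamp`, `segments_to_srt`, `transcribe_to_srt`) and the `"base"` model choice are assumptions made for the example, and it uses Whisper rather than Vosk's own SRT output. It relies only on each Whisper segment carrying `start`, `end`, and `text` fields.

```python
from typing import Dict, Iterable, Union

# One transcribed segment: start/end in seconds plus the recognized text.
Segment = Dict[str, Union[float, str]]


def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: Iterable[Segment]) -> str:
    """Render segments as numbered SRT blocks separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(float(seg["start"]))
        end = format_timestamp(float(seg["end"]))
        text = str(seg["text"]).strip()
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)


def transcribe_to_srt(audio_path: str, model_name: str = "base") -> str:
    """Transcribe a local audio file with Whisper and return SRT text."""
    # Imported lazily so the formatting helpers work without the package installed.
    import whisper  # provided by the openai-whisper package

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return segments_to_srt(result["segments"])
```

Calling `transcribe_to_srt("lesson.wav")` (a hypothetical file name) would run the model locally; the two formatting helpers can be reused with any engine that reports per-segment timings.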

Reference: github-starred/open-webui#19