[GH-ISSUE #2241] enh: better voice interactions #12808

Closed
opened 2026-04-19 19:40:40 -05:00 by GiteaMirror · 4 comments

Originally created by @tjbck on GitHub (May 13, 2024).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/2241

Originally assigned to: @tjbck on GitHub.

  • [x] voice message recording like iMessage
  • [x] Siri-esque real-time voice interaction
GiteaMirror added the enhancement, core labels 2026-04-19 19:40:40 -05:00

@IIPedro commented on GitHub (May 31, 2024):

Hello! I'm not sure whether work has already been done on this topic, but it would be interesting to see better voice interactions as part of pipelines. Of course, it should be the front-end's job to provide an easy-to-use and comfortable interface, but I believe both Siri-esque interaction and voice calls can be tackled in one go with a new pipeline function. Just as there's a pipe def in pipelines, it would be very useful to have a voice def that takes an audio buffer as input at a predetermined sample rate and returns another audio buffer, which would be the assistant's voice in the call. That would make voice interactions as versatile as current chat pipelines and would allow various libraries and APIs to be supported in a standardized manner. Thanks!
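
For illustration only, the suggested voice def might look roughly like the sketch below. Everything in it is an assumption: the `VoicePipeline` protocol, the `voice` method name and signature, the `sample_rate` attribute, and the 16 kHz 16-bit mono PCM format are hypothetical and not part of the existing Pipelines API. The toy implementation just returns the caller's audio at half volume to show the buffer-in, buffer-out shape.

```python
import array
from typing import Protocol


class VoicePipeline(Protocol):
    """Hypothetical interface: audio buffer in, audio buffer out."""

    sample_rate: int  # agreed-upon rate for both input and output buffers

    def voice(self, audio_in: bytes) -> bytes:
        ...


class EchoVoicePipeline:
    """Toy implementation that echoes the input at half volume."""

    sample_rate = 16_000  # assumed format: 16 kHz, 16-bit signed mono PCM

    def voice(self, audio_in: bytes) -> bytes:
        # Interpret the raw bytes as 16-bit signed samples.
        samples = array.array("h")
        samples.frombytes(audio_in)
        # A real pipeline would run STT -> LLM -> TTS here instead.
        quieter = array.array("h", (s // 2 for s in samples))
        return quieter.tobytes()
```

A front end could then stream fixed-size chunks into `voice` and play back whatever bytes come out, regardless of which speech libraries or APIs sit behind it.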


@tjbck commented on GitHub (Jun 8, 2024):

Implemented on dev.

@IIPedro Great suggestions, I'll see what can be done!


@darkvertex commented on GitHub (Jun 18, 2024):

@tjbck Is it normal that the microphone stays in listening mode permanently until the tab is closed, even after exiting Call mode? (Should I file a bug issue for this?)


@justinh-rahb commented on GitHub (Jun 18, 2024):

I've noted this as well on Chrome/Mac and Chrome/Android.


Reference: github-starred/open-webui#12808