[GH-ISSUE #21403] feat: stream TTS as soon as text generation starts in voice call #19464


GiteaMirror · 2026-04-20T01:56:22-05:00

GiteaMirror commented

2026-04-20 01:56:22 -05:00

Originally created by @iChristGit on GitHub (Feb 14, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/21403

Check Existing Issues

I have searched for all existing open AND closed issues and discussions for similar requests. I have found none that is comparable to my request.

Verify Feature Scope

I have read through and understood the scope definition for feature requests in the Issues section. I believe my feature request meets the definition and belongs in the Issues section instead of the Discussions.

Problem Description

- Pressing the TTS button in a regular chat is very fast, even on very long responses.
- In voice chat mode, open-webui waits until the response is complete and only then starts TTS.

Playback could be almost instant if TTS were allowed to start at the first sentence. The difference is very noticeable on large pieces of text.
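
To illustrate the idea of starting at the first sentence, here is a minimal sketch, not open-webui's actual implementation. It assumes the LLM response arrives as an iterable of text chunks, and the name `stream_sentences` and the sentence-boundary rule (`.`, `!` or `?` followed by whitespace) are hypothetical choices made for the example.

```python
import re
from typing import Iterable, Iterator

# Assumed boundary rule: sentence-ending punctuation followed by whitespace.
# Requiring the whitespace avoids splitting inside tokens such as "3.14".
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def stream_sentences(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a chunked text stream."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        # The last part may still be growing, so keep it buffered.
        buffer = parts[-1]
    # Flush whatever remains once the stream ends.
    if buffer.strip():
        yield buffer.strip()
```

Each yielded sentence could be handed to the TTS engine immediately, so audio for the first sentence can start while the rest of the response is still being generated.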

Another closed issue on the same topic that got no attention:
https://github.com/open-webui/open-webui/issues/14278

Desired Solution you'd like

Allow streaming of TTS (for example Kokoro TTS, which is in the docs and works really well).

Alternatives Considered

No response

Additional Context

No response


GiteaMirror closed this issue

2026-04-20 01:56:22 -05:00

GiteaMirror commented

2026-04-20 01:56:23 -05:00

@iChristGit commented on GitHub (Feb 14, 2026):

This was the case in all previous versions. It works with the TTS button in regular chat, but voice mode always waits for the full response, however long it is.

Still an issue in v0.8.1.

We won't get the benefit of starting TTS early in open-webui: in chat, the button only shows after the full response (although once pressed it can split on punctuation or on paragraphs), and voice mode defaults to waiting until the end of the LLM response before starting TTS.


GiteaMirror commented

2026-04-20 01:56:23 -05:00

@iChristGit commented on GitHub (Feb 14, 2026):

It only happens with Voice Call Emoji. Solved as far as I am concerned.
