mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-07 03:18:23 -05:00
[GH-ISSUE #16451] issue: audio playback does not begin until the assistant finishes generating the entire message #17908
Originally created by @QuantumFlux21 on GitHub (Aug 10, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/16451
Check Existing Issues
Installation Method
Docker
Open WebUI Version
0.6.21
Ollama Version (if applicable)
No response
Operating System
Ubuntu 24.04
Browser (if applicable)
Chrome version 138.0.7204.184
Confirmation
Expected Behavior
When Response splitting is set to Punctuation or Paragraphs and Autoplay is On, audio should start during generation.
Completed chunks (for example, sentences) should be sent to the TTS provider as they become available so playback can begin earlier.
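To illustrate what "completed chunks as they become available" could mean, here is a minimal sketch of punctuation-based splitting over a token stream. It is not Open WebUI's actual splitting code: the function name `iter_sentences` and the boundary rule (sentence-ending punctuation followed by whitespace) are assumptions made for this example.

```python
import re
from typing import Iterable, Iterator

# Assumed boundary rule: '.', '!' or '?' followed by whitespace ends a chunk.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def iter_sentences(tokens: Iterable[str]) -> Iterator[str]:
    """Yield each sentence as soon as the stream has completed it.

    `tokens` is any iterable of text fragments, such as streamed model output.
    """
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _BOUNDARY.split(buffer)
        # Every part except the last is followed by a boundary, so it is complete.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        # The last part may still be growing; keep it for the next token.
        buffer = parts[-1]
    # Flush whatever remains once generation has finished.
    if buffer.strip():
        yield buffer.strip()
```

Because this is a generator, a caller can hand the first sentence to a TTS provider while later tokens are still arriving.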
Actual Behavior
ElevenLabs TTS requests are sent only after the final message text is available.
Audio playback starts only after text generation completes.
The Response splitting setting does not lead to sentence‑by‑sentence playback during generation.
Steps to Reproduce
Start Open WebUI with ElevenLabs configured and Autoplay enabled.
In Settings > Audio/TTS select Response splitting: Punctuation.
Ask the assistant for a multi‑sentence response.
Observe that the text streams in the chat UI.
Watch the Network panel in browser devtools to see when ElevenLabs TTS requests are sent.
Logs & Screenshots
Additional Information
Suggested fix direction:
When Response splitting is enabled, send partial chunks to ElevenLabs as they become available, or use the ElevenLabs streaming API so audio can begin mid‑generation.
If external providers cannot support this, clarify in the UI that mid‑generation playback is not available for the selected provider.
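The first suggestion could be sketched roughly as below, assuming sentences arrive as an async stream. `speak_while_generating`, `synthesize`, and `play` are hypothetical names for this example only; they are not Open WebUI or ElevenLabs APIs, and a real implementation would need error handling and cancellation.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable


async def speak_while_generating(
    sentences: AsyncIterator[str],
    synthesize: Callable[[str], Awaitable[bytes]],  # hypothetical TTS request
    play: Callable[[bytes], Awaitable[None]],       # hypothetical audio output
) -> None:
    """Request audio for each sentence as it completes and play clips in order."""
    queue: asyncio.Queue = asyncio.Queue()

    async def producer() -> None:
        async for sentence in sentences:
            # Start the TTS request immediately; it runs while generation continues.
            queue.put_nowait(asyncio.create_task(synthesize(sentence)))
        queue.put_nowait(None)  # sentinel: no more sentences

    async def consumer() -> None:
        while True:
            task = await queue.get()
            if task is None:
                break
            # Awaiting tasks in queue order keeps playback in sentence order.
            await play(await task)

    await asyncio.gather(producer(), consumer())
```

With this shape, playback of the first sentence can begin as soon as its audio is ready, instead of waiting for the final message text.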
If you need any logs please let me know and I'll provide them.
@tjbck commented on GitHub (Aug 10, 2025):
Intended behaviour: response splitting is only utilized during voice chat.