issue: audio playback does not begin until the assistant finishes generating the entire message #6021

Closed
opened 2025-11-11 16:42:34 -06:00 by GiteaMirror · 1 comment
Owner

Originally created by @QuantumFlux21 on GitHub (Aug 10, 2025).

Check Existing Issues

  • I have searched the existing issues and discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

0.6.21

Ollama Version (if applicable)

No response

Operating System

Ubuntu 24.04

Browser (if applicable)

Chrome 138.0.7204.184

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
      • Start with the initial platform/version/OS and dependencies used,
      • Specify exact install/launch/configure commands,
      • List URLs visited, user input (incl. example values/emails/passwords if needed),
      • Describe all options and toggles enabled or changed,
      • Include any files or environmental changes,
      • Identify the expected and actual result at each stage,
      • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

When Response splitting is set to Punctuation or Paragraphs and Autoplay is On, audio should start during generation.
Completed chunks (for example, sentences) should be sent to the TTS provider as they become available so playback can begin earlier.
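To make the expected behavior concrete: "completed chunks" here means text split at sentence-final punctuation as the response streams in. A minimal sketch of such a splitter (a hypothetical helper for illustration, not Open WebUI's actual implementation of the Punctuation mode):

```python
import re

# Sentence-final punctuation used as chunk boundaries (an assumption for
# illustration; the real "Punctuation" mode may use a different rule).
BOUNDARY = re.compile(r"(?<=[.!?])\s+")

def split_stream(tokens):
    """Yield completed sentences as soon as they appear in a token stream."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = BOUNDARY.split(buffer)
        # Everything except the last part is a finished sentence.
        for sentence in parts[:-1]:
            yield sentence
        buffer = parts[-1]
    if buffer.strip():
        yield buffer  # flush whatever remains at end of generation

# Tokens arrive incrementally; sentences come out before the stream ends.
print(list(split_stream(["Hello ", "world. ", "Second ", "sentence! ", "Tail"])))
```

Each yielded sentence could be handed to the TTS provider immediately, instead of waiting for the full message.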

Actual Behavior

ElevenLabs TTS requests are sent only after the final message text is available.
Audio playback starts only after text generation completes.
The Response splitting setting does not lead to sentence‑by‑sentence playback during generation.

Steps to Reproduce

1. Start Open WebUI with ElevenLabs configured and Autoplay enabled.
2. In Settings > Audio/TTS, select Response splitting: Punctuation.
3. Ask the assistant for a multi‑sentence response.
4. Observe that the text streams in the chat UI.
5. Watch the Network panel in browser devtools to see when ElevenLabs TTS requests are sent.

Logs & Screenshots

Screenshot: https://github.com/user-attachments/assets/aa86e20c-0aa4-461e-8c1e-17c0a67559ef

Additional Information

Suggested fix direction:

When Response splitting is enabled, send partial chunks to ElevenLabs as they become available, or use the ElevenLabs streaming API so audio can begin mid‑generation.
If external providers cannot support this, clarify in the UI that mid‑generation playback is not available for the selected provider.
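The first option above amounts to pipelining: request audio for chunk N while later chunks are still being generated, then play the clips in order. A rough sketch under that assumption (`fake_tts` is a stand-in for a provider request, not a real ElevenLabs call):

```python
import queue
import threading
import time

def fake_tts(sentence):
    """Stand-in for a TTS provider request (e.g. to ElevenLabs); returns a
    fake audio clip after a simulated network delay."""
    time.sleep(0.01)
    return f"<audio:{sentence}>"

def speak_while_generating(sentences):
    """Synthesize each completed chunk in the background and consume clips
    in order, so playback can begin before generation finishes."""
    clips = queue.Queue()

    def producer():
        for s in sentences:
            clips.put(fake_tts(s))  # request audio as each chunk completes
        clips.put(None)             # sentinel: generation finished

    threading.Thread(target=producer, daemon=True).start()

    played = []
    while (clip := clips.get()) is not None:
        played.append(clip)         # a real client would enqueue playback here
    return played

print(speak_while_generating(["First sentence.", "Second sentence."]))
```

The ordered queue keeps playback sequential even though synthesis of later chunks can overlap with generation.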

If you need any logs please let me know and I'll provide them.

GiteaMirror added the bug label 2025-11-11 16:42:34 -06:00
Author
Owner

@tjbck commented on GitHub (Aug 10, 2025):

Intended behaviour: response splitting is only utilized during voice chat.

Reference: github-starred/open-webui#6021