issue: Partial Audio Playback in Voice Call Feature #4404

Closed
opened 2025-11-11 15:53:15 -06:00 by GiteaMirror · 0 comments
Owner

Originally created by @Shadowslayer1321 on GitHub (Mar 12, 2025).

Check Existing Issues

  • I have searched the existing issues and discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v0.5.20

Ollama Version (if applicable)

No response

Operating System

Ubuntu 22.04

Browser (if applicable)

No response

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have listed steps to reproduce the bug in detail.

Expected Behavior

The entire AI response should be spoken by the TTS engine from beginning to end during voice calls.

Actual Behavior

Only the latter portion of the AI's response is played back via TTS during voice calls; the beginning is consistently cut off. For example, out of a five-sentence response, only the last two sentences are read aloud.

Steps to Reproduce

  1. Set up Open WebUI to use an external TTS service.
  2. Configure TTS URL in Open WebUI settings.
  3. Initiate a voice call.
  4. Ask a question that elicits a multi-sentence response.
  5. Observe audio playback - only the latter part is heard.

Logs & Screenshots

Logs from the browser console:
CallOverlay.svelte:322 🔊 Sound detected
CallOverlay.svelte:232 Recording started
CallOverlay.svelte:322 🔊 Sound detected
CallOverlay.svelte:341 🔇 Silence detected
CallOverlay.svelte:243 Recording stopped MediaStream {id: 'a880c1fc-dd93-4a4a-9600-24917bc94888', active: true, onaddtrack: null, onremovetrack: null, onactive: null, …} Event {isTrusted: true, type: 'stop', target: MediaRecorder, currentTarget: MediaRecorder, eventPhase: 2, …}
CallOverlay.svelte:173 🚨 stopRecordingCallback 🚨
CallOverlay.svelte:295 🔊 Sound detection started 1741812304288 false
CallOverlay.svelte:162 doing this for a test. Testing, one, two, three.
Chat.svelte:1227 submitPrompt doing this for a test. Testing, one, two, three. 974bd547-1995-4348-996d-737d675a5552
Chat.svelte:183 saveSessionSelectedModels ['maya'] ["maya"]
ResponseMessage.svelte:523 <div class="flex justify-start overflow-x-auto buttons text-gray-600 dark:text-gray-500 mt-0.5 svelte-1u5gq5j"></div> flex
Chat.svelte:1396 modelId maya
CallOverlay.svelte:567 Received chat start event for message ID fa5089c7-b279-4179-82a9-5aa12b284d39
+layout.svelte:100 usage {models: Array(1)}
Chat.svelte:1637 {status: true, task_id: 'dc779552-9637-4e89-904b-a84f9745be9c'}
CallOverlay.svelte:166 undefined
+layout.svelte:100 usage {models: Array(1)}
Chat.svelte:238 {chat_id: '974bd547-1995-4348-996d-737d675a5552', message_id: 'fa5089c7-b279-4179-82a9-5aa12b284d39', data: {…}}
CallOverlay.svelte:589 Received chat event for message ID fa5089c7-b279-4179-82a9-5aa12b284d39: well, consider me still playing along! What's the next phase of the experiment?
CallOverlay.svelte:598 well, consider me still playing along! What's the next phase of the experiment?
Chat.svelte:1216 {content: '"Testing, one, two, three... Testing, one, two, th… we officially done with the sound check now? 😉'}
Chat.svelte:238 {chat_id: '974bd547-1995-4348-996d-737d675a5552', message_id: 'fa5089c7-b279-4179-82a9-5aa12b284d39', data: {…}}
Chat.svelte:1216 {id: 'google.gemini-2.0-flash-thinking-exp-01-21-ecfe2a01-d11c-41b1-9c70-4437472e5102', created: 1741812309, model: 'google.gemini-2.0-flash-thinking-exp-01-21', choices: Array(1), object: 'chat.completion.chunk'}
Chat.svelte:238 {chat_id: '974bd547-1995-4348-996d-737d675a5552', message_id: 'fa5089c7-b279-4179-82a9-5aa12b284d39', data: {…}}
CallOverlay.svelte:589 Received chat event for message ID fa5089c7-b279-4179-82a9-5aa12b284d39: Or are we officially done with the sound check now?
CallOverlay.svelte:598 Or are we officially done with the sound check now?
Chat.svelte:1216 {done: true, content: '"Testing, one, two, three... Testing, one, two, th… we officially done with the sound check now? 😉', title: 'Audio Check 👍 Clear'}
CallOverlay.svelte:546 Audio for "well, consider me still playing along! What's the next phase of the experiment?" not yet available in the cache, re-queued...
CallOverlay.svelte:527 Playing audio for content: well, consider me still playing along! What's the next phase of the experiment?
CallOverlay.svelte:535 Played audio for content: well, consider me still playing along! What's the next phase of the experiment?
CallOverlay.svelte:527 Playing audio for content: Or are we officially done with the sound check now?
CallOverlay.svelte:535 Played audio for content: Or are we officially done with the sound check now?
CallOverlay.svelte:558 Audio monitoring and playing stopped for message ID fa5089c7-b279-4179-82a9-5aa12b284d39
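The "not yet available in the cache, re-queued..." line at CallOverlay.svelte:546, followed by "Playing audio" / "Played audio" pairs, implies a playback loop that polls a sentence-to-audio cache and pushes unready sentences back onto the queue. A minimal sketch of that interplay (all names hypothetical, not the actual CallOverlay.svelte code):

```javascript
// Hypothetical sketch of the queue/cache behavior implied by the console
// logs above -- NOT the actual Open WebUI implementation.
const audioCache = new Map(); // sentence -> decoded audio (absent while TTS runs)

async function drainQueue(queue, playAudio) {
  const played = [];
  while (queue.length > 0) {
    const sentence = queue.shift();
    if (!audioCache.has(sentence)) {
      // Mirrors "Audio for ... not yet available in the cache, re-queued..."
      queue.push(sentence);
      await new Promise((resolve) => setTimeout(resolve, 100)); // back off, retry
      continue;
    }
    // Mirrors "Playing audio for content: ..." / "Played audio for content: ..."
    await playAudio(audioCache.get(sentence));
    played.push(sentence);
  }
  return played;
}
```

If the TTS request for an early sentence resolves only after later sentences have already been played and the loop has moved on, the earlier audio would be skipped, which matches the "beginning cut off" symptom.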

Docker Logs for open-webui container

"2025-03-12 16:52:55 2025-03-12 20:52:55.458 | INFO | open_webui.routers.audio:transcription:626 - file.content_type: audio/wav - {}
2025-03-12 16:52:55 2025-03-12 20:52:55.459 | INFO | open_webui.routers.audio:transcribe:470 - transcribe: /app/backend/data/cache/audio/transcriptions/2c2fb3cb-0f3a-4d93-ad4d-6e23218e784d.wav - {}
2025-03-12 16:52:57 2025-03-12 20:52:57.498 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.20.0.1:0 - "POST /api/v1/audio/transcriptions HTTP/1.1" 200 - {}
2025-03-12 16:52:58 2025-03-12 20:52:58.205 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.20.0.1:0 - "POST /api/v1/chats/974bd547-1995-4348-996d-737d675a5552 HTTP/1.1" 200 - {}
2025-03-12 16:52:58 2025-03-12 20:52:58.754 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.20.0.1:0 - "GET /api/v1/chats/?page=1 HTTP/1.1" 200 - {}
2025-03-12 16:52:59 2025-03-12 20:52:59.365 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.20.0.1:0 - "POST /api/v1/memories/query HTTP/1.1" 200 - {}
2025-03-12 16:52:59 2025-03-12 20:52:59.928 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.20.0.1:0 - "POST /api/v1/users/user/info/update HTTP/1.1" 200 - {}
2025-03-12 16:53:00 /usr/local/lib/python3.11/site-packages/pydantic/main.py:1630: RuntimeWarning: fields may not start with an underscore, ignoring "__event_emitter__"
2025-03-12 16:53:00 warnings.warn(f'fields may not start with an underscore, ignoring "{f_name}"', RuntimeWarning)
2025-03-12 16:53:00 2025-03-12 20:53:00.330 | INFO | open_webui.utils.middleware:process_chat_payload:785 - tools={'get_video_transcript': {'toolkit_id': 'youtube_video_transcript', 'callable': functools.partial(<bound method Tools.get_video_transcript of <tool_youtube_video_transcript.Tools object at 0x7f703cf31cd0>>, __event_emitter__=<function get_event_emitter.<locals>.__event_emitter__ at 0x7f70807477e0>), 'spec': {'name': 'get_video_transcript', 'description': '\n Retrieves the transcript for a YouTube video given the video URL.\n ', 'parameters': {'properties': {'url': {'description': 'The URL of the YouTube video.', 'type': 'string'}}, 'required': ['url'], 'type': 'object'}}, 'pydantic_model': <class 'open_webui.utils.tools.get_video_transcript'>, 'file_handler': False, 'citation': False}} - {}
2025-03-12 16:53:00 2025-03-12 20:53:00.330 | INFO | open_webui.utils.middleware:chat_completion_tools_handler:159 - tools_function_calling_prompt='Available Tools: [{"name": "get_video_transcript", "description": "\n Retrieves the transcript for a YouTube video given the video URL.\n ", "parameters": {"properties": {"url": {"description": "The URL of the YouTube video.", "type": "string"}}, "required": ["url"], "type": "object"}}]\n\nYour task is to choose and return the correct tool(s) from the list of available tools based on the query. Follow these guidelines:\n\n- Return only the JSON object, without any additional text or explanation.\n\n- If no tools match the query, return an empty array: \n {\n "tool_calls": []\n }\n\n- If one or more tools match the query, construct a JSON response containing a "tool_calls" array with objects that include:\n - "name": The tool's name.\n - "parameters": A dictionary of required parameters and their corresponding values.\n\nThe format for the JSON response is strictly:\n{\n "tool_calls": [\n {"name": "toolName1", "parameters": {"key1": "value1"}},\n {"name": "toolName2", "parameters": {"key2": "value2"}}\n ]\n}' - {}
2025-03-12 16:53:01 filter_functions=[]
2025-03-12 16:53:01 2025-03-12 20:53:01.396 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.20.0.1:0 - "POST /api/chat/completions HTTP/1.1" 200 - {}
2025-03-12 16:53:01 2025-03-12 20:53:01.412 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.20.0.1:0 - "GET /api/v1/chats/?page=1 HTTP/1.1" 200 - {}
2025-03-12 16:53:03 2025-03-12 20:53:03.123 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.20.0.1:0 - "POST /api/chat/completed HTTP/1.1" 200 - {}
2025-03-12 16:53:03 2025-03-12 20:53:03.143 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.20.0.1:0 - "POST /api/v1/chats/974bd547-1995-4348-996d-737d675a5552 HTTP/1.1" 200 - {}
2025-03-12 16:53:03 2025-03-12 20:53:03.155 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.20.0.1:0 - "GET /api/v1/chats/?page=1 HTTP/1.1" 200 - {}
2025-03-12 16:53:04 2025-03-12 20:53:04.075 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.20.0.1:0 - "POST /api/v1/audio/speech HTTP/1.1" 200 - {}
2025-03-12 16:53:04 2025-03-12 20:53:04.390 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.20.0.1:0 - "POST /api/v1/audio/speech HTTP/1.1" 200 - {}
"

Additional Information

Hypothesis:
Take this with a grain of salt, but based on the observed behavior and logs, the issue appears to be a client-side timing or buffering problem within Open WebUI's JavaScript code, specifically in the CallOverlay.svelte component (or related audio playback logic).

The hypothesis is that Open WebUI might be attempting to start audio playback (audio.play()) too early, before the browser has fully buffered the audio data from the external TTS service response. This could lead to the browser starting playback from a point within the audio file, resulting in the partial audio playback.

The "Audio for ... not yet available in the cache, re-queued..." console message in Open WebUI further supports this timing/buffering hypothesis.
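If the buffering hypothesis is right, one possible mitigation is to defer `audio.play()` until the browser reports enough data is buffered to play through, rather than starting playback as soon as a source is assigned. A hedged sketch of that idea (`playWhenBuffered` is a hypothetical helper, not a patch against CallOverlay.svelte):

```javascript
// Hypothetical mitigation sketch: start playback only once the element
// reports it can play through without stalling.
function playWhenBuffered(audio) {
  return new Promise((resolve, reject) => {
    const start = () => audio.play().then(resolve, reject);
    // readyState 4 === HAVE_ENOUGH_DATA: enough is buffered to play through.
    if (audio.readyState === 4) {
      start();
    } else {
      audio.addEventListener("canplaythrough", start, { once: true });
      audio.addEventListener("error", reject, { once: true });
    }
  });
}
```

In a browser this would wrap each cached audio element, e.g. `await playWhenBuffered(new Audio(url))`, so playback cannot begin mid-file while earlier bytes are still arriving.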

GiteaMirror added the bug label 2025-11-11 15:53:15 -06:00

Reference: github-starred/open-webui#4404