Mirror of https://github.com/open-webui/open-webui.git
issue: Partial Audio Playback in Voice Call Feature #4404
Originally created by @Shadowslayer1321 on GitHub (Mar 12, 2025).
Check Existing Issues
Installation Method
Docker
Open WebUI Version
v0.5.20
Ollama Version (if applicable)
No response
Operating System
Ubuntu 22.04
Browser (if applicable)
No response
Confirmation
Expected Behavior
The entire AI response should be spoken by the TTS engine from beginning to end during voice calls.
Actual Behavior
Only the latter portion of the AI's response is played back via TTS during voice calls; the beginning is consistently cut off. For example, out of a five-sentence response, the AI only reads the last two sentences aloud.
Steps to Reproduce
Logs & Screenshots
Logs from the browser console:
CallOverlay.svelte:322 🔊 Sound detected
CallOverlay.svelte:232 Recording started
CallOverlay.svelte:322 🔊 Sound detected
CallOverlay.svelte:341 🔇 Silence detected
CallOverlay.svelte:243 Recording stopped MediaStream {id: 'a880c1fc-dd93-4a4a-9600-24917bc94888', active: true, onaddtrack: null, onremovetrack: null, onactive: null, …} Event {isTrusted: true, type: 'stop', target: MediaRecorder, currentTarget: MediaRecorder, eventPhase: 2, …}
CallOverlay.svelte:173 🚨 stopRecordingCallback 🚨
CallOverlay.svelte:295 🔊 Sound detection started 1741812304288 false
CallOverlay.svelte:162 doing this for a test. Testing, one, two, three.
Chat.svelte:1227 submitPrompt doing this for a test. Testing, one, two, three. 974bd547-1995-4348-996d-737d675a5552
Chat.svelte:183 saveSessionSelectedModels ['maya'] ["maya"]
ResponseMessage.svelte:523 <div class="flex justify-start overflow-x-auto buttons text-gray-600 dark:text-gray-500 mt-0.5 svelte-1u5gq5j">flex
Chat.svelte:1396 modelId maya
CallOverlay.svelte:567 Received chat start event for message ID fa5089c7-b279-4179-82a9-5aa12b284d39
+layout.svelte:100 usage {models: Array(1)}
Chat.svelte:1637 {status: true, task_id: 'dc779552-9637-4e89-904b-a84f9745be9c'}
CallOverlay.svelte:166 undefined
+layout.svelte:100 usage {models: Array(1)}
Chat.svelte:238 {chat_id: '974bd547-1995-4348-996d-737d675a5552', message_id: 'fa5089c7-b279-4179-82a9-5aa12b284d39', data: {…}}
CallOverlay.svelte:589 Received chat event for message ID fa5089c7-b279-4179-82a9-5aa12b284d39: well, consider me still playing along! What's the next phase of the experiment?
CallOverlay.svelte:598 well, consider me still playing along! What's the next phase of the experiment?
Chat.svelte:1216 {content: '"Testing, one, two, three... Testing, one, two, th… we officially done with the sound check now? 😉'}
Chat.svelte:238 {chat_id: '974bd547-1995-4348-996d-737d675a5552', message_id: 'fa5089c7-b279-4179-82a9-5aa12b284d39', data: {…}}
Chat.svelte:1216 {id: 'google.gemini-2.0-flash-thinking-exp-01-21-ecfe2a01-d11c-41b1-9c70-4437472e5102', created: 1741812309, model: 'google.gemini-2.0-flash-thinking-exp-01-21', choices: Array(1), object: 'chat.completion.chunk'}
Chat.svelte:238 {chat_id: '974bd547-1995-4348-996d-737d675a5552', message_id: 'fa5089c7-b279-4179-82a9-5aa12b284d39', data: {…}}
CallOverlay.svelte:589 Received chat event for message ID fa5089c7-b279-4179-82a9-5aa12b284d39: Or are we officially done with the sound check now?
CallOverlay.svelte:598 Or are we officially done with the sound check now?
Chat.svelte:1216 {done: true, content: '"Testing, one, two, three... Testing, one, two, th… we officially done with the sound check now? 😉', title: 'Audio Check 👍 Clear'}
CallOverlay.svelte:546 Audio for "well, consider me still playing along! What's the next phase of the experiment?" not yet available in the cache, re-queued...
CallOverlay.svelte:527 Playing audio for content: well, consider me still playing along! What's the next phase of the experiment?
CallOverlay.svelte:535 Played audio for content: well, consider me still playing along! What's the next phase of the experiment?
CallOverlay.svelte:527 Playing audio for content: Or are we officially done with the sound check now?
CallOverlay.svelte:535 Played audio for content: Or are we officially done with the sound check now?
CallOverlay.svelte:558 Audio monitoring and playing stopped for message ID fa5089c7-b279-4179-82a9-5aa12b284d39
Docker Logs for open-webui container
2025-03-12 16:52:55 2025-03-12 20:52:55.458 | INFO | open_webui.routers.audio:transcription:626 - file.content_type: audio/wav - {}
2025-03-12 16:52:55 2025-03-12 20:52:55.459 | INFO | open_webui.routers.audio:transcribe:470 - transcribe: /app/backend/data/cache/audio/transcriptions/2c2fb3cb-0f3a-4d93-ad4d-6e23218e784d.wav - {}
2025-03-12 16:52:57 2025-03-12 20:52:57.498 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.20.0.1:0 - "POST /api/v1/audio/transcriptions HTTP/1.1" 200 - {}
2025-03-12 16:52:58 2025-03-12 20:52:58.205 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.20.0.1:0 - "POST /api/v1/chats/974bd547-1995-4348-996d-737d675a5552 HTTP/1.1" 200 - {}
2025-03-12 16:52:58 2025-03-12 20:52:58.754 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.20.0.1:0 - "GET /api/v1/chats/?page=1 HTTP/1.1" 200 - {}
2025-03-12 16:52:59 2025-03-12 20:52:59.365 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.20.0.1:0 - "POST /api/v1/memories/query HTTP/1.1" 200 - {}
2025-03-12 16:52:59 2025-03-12 20:52:59.928 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.20.0.1:0 - "POST /api/v1/users/user/info/update HTTP/1.1" 200 - {}
2025-03-12 16:53:00 /usr/local/lib/python3.11/site-packages/pydantic/main.py:1630: RuntimeWarning: fields may not start with an underscore, ignoring "event_emitter"
2025-03-12 16:53:00 warnings.warn(f'fields may not start with an underscore, ignoring "{f_name}"', RuntimeWarning)
2025-03-12 16:53:00 2025-03-12 20:53:00.330 | INFO | open_webui.utils.middleware:process_chat_payload:785 - tools={'get_video_transcript': {'toolkit_id': 'youtube_video_transcript', 'callable': functools.partial(<bound method Tools.get_video_transcript of <tool_youtube_video_transcript.Tools object at 0x7f703cf31cd0>>, event_emitter=<function get_event_emitter..event_emitter at 0x7f70807477e0>), 'spec': {'name': 'get_video_transcript', 'description': '\n Retrieves the transcript for a YouTube video given the video URL.\n ', 'parameters': {'properties': {'url': {'description': 'The URL of the YouTube video.', 'type': 'string'}}, 'required': ['url'], 'type': 'object'}}, 'pydantic_model': <class 'open_webui.utils.tools.get_video_transcript'>, 'file_handler': False, 'citation': False}} - {}
2025-03-12 16:53:00 2025-03-12 20:53:00.330 | INFO | open_webui.utils.middleware:chat_completion_tools_handler:159 - tools_function_calling_prompt='Available Tools: [{"name": "get_video_transcript", "description": "\n Retrieves the transcript for a YouTube video given the video URL.\n ", "parameters": {"properties": {"url": {"description": "The URL of the YouTube video.", "type": "string"}}, "required": ["url"], "type": "object"}}]\n\nYour task is to choose and return the correct tool(s) from the list of available tools based on the query. Follow these guidelines:\n\n- Return only the JSON object, without any additional text or explanation.\n\n- If no tools match the query, return an empty array: \n {\n "tool_calls": []\n }\n\n- If one or more tools match the query, construct a JSON response containing a "tool_calls" array with objects that include:\n - "name": The tool's name.\n - "parameters": A dictionary of required parameters and their corresponding values.\n\nThe format for the JSON response is strictly:\n{\n "tool_calls": [\n {"name": "toolName1", "parameters": {"key1": "value1"}},\n {"name": "toolName2", "parameters": {"key2": "value2"}}\n ]\n}' - {}
2025-03-12 16:53:01 filter_functions=[]
2025-03-12 16:53:01 2025-03-12 20:53:01.396 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.20.0.1:0 - "POST /api/chat/completions HTTP/1.1" 200 - {}
2025-03-12 16:53:01 2025-03-12 20:53:01.412 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.20.0.1:0 - "GET /api/v1/chats/?page=1 HTTP/1.1" 200 - {}
2025-03-12 16:53:03 2025-03-12 20:53:03.123 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.20.0.1:0 - "POST /api/chat/completed HTTP/1.1" 200 - {}
2025-03-12 16:53:03 2025-03-12 20:53:03.143 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.20.0.1:0 - "POST /api/v1/chats/974bd547-1995-4348-996d-737d675a5552 HTTP/1.1" 200 - {}
2025-03-12 16:53:03 2025-03-12 20:53:03.155 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.20.0.1:0 - "GET /api/v1/chats/?page=1 HTTP/1.1" 200 - {}
2025-03-12 16:53:04 2025-03-12 20:53:04.075 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.20.0.1:0 - "POST /api/v1/audio/speech HTTP/1.1" 200 - {}
2025-03-12 16:53:04 2025-03-12 20:53:04.390 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.20.0.1:0 - "POST /api/v1/audio/speech HTTP/1.1" 200 - {}
Additional Information
Hypothesis:
Take this with a grain of salt, but based on the observed behavior and logs, the issue appears to be a client-side timing or buffering problem within Open WebUI's JavaScript code, specifically in the CallOverlay.svelte component (or related audio playback logic).
The hypothesis is that Open WebUI may be starting audio playback (audio.play()) too early, before the browser has fully buffered the audio data from the external TTS service response. This could cause playback to begin partway through the audio file, producing the partial playback described above.
The "Audio for ... not yet available in the cache, re-queued..." console message in Open WebUI further supports this timing/buffering hypothesis.
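If this hypothesis is right, one possible mitigation is to gate playback on the browser signaling that enough data is buffered (readyState of HAVE_ENOUGH_DATA, or the canplaythrough event) before calling play(). The sketch below is purely illustrative and is not the actual CallOverlay.svelte code; the names waitUntilBuffered and playWhenReady are hypothetical, and the structural BufferableAudio interface exists only so the logic can run without a real DOM element (a real HTMLAudioElement satisfies it).

```typescript
// Hypothetical sketch: never call play() until the browser reports it has
// buffered enough of the clip to play it through without stalling.

// Structural type standing in for HTMLAudioElement, so the logic is
// testable outside a browser.
interface BufferableAudio {
  readyState: number; // 4 === HAVE_ENOUGH_DATA
  addEventListener(type: string, cb: () => void, opts?: { once: boolean }): void;
  play(): Promise<void>;
}

function waitUntilBuffered(audio: BufferableAudio): Promise<void> {
  return new Promise((resolve) => {
    if (audio.readyState >= 4) {
      // Already buffered enough to play through from the start.
      resolve();
      return;
    }
    // Otherwise wait for the browser to say so.
    audio.addEventListener("canplaythrough", () => resolve(), { once: true });
  });
}

async function playWhenReady(audio: BufferableAudio): Promise<void> {
  await waitUntilBuffered(audio);
  await audio.play(); // starts from position 0 with the full clip buffered
}
```

If the playback loop awaited playWhenReady() instead of calling play() directly, a clip that is still downloading would simply delay the queue rather than start mid-file.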