[GH-ISSUE #24143] issue: TTS treats PCM audio responses as MP3 #58874

Open
opened 2026-05-06 00:19:09 -05:00 by GiteaMirror · 0 comments

Originally created by @daradib on GitHub (Apr 26, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/24143

Check Existing Issues

  • I have searched for any existing and/or related issues.
  • I have searched for any existing and/or related discussions.
  • I have also searched in the CLOSED issues AND CLOSED discussions and found no related items (your issue might already be addressed on the development branch!).
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v0.9.2

Ollama Version (if applicable)

No response

Operating System

Ubuntu 24.04

Browser (if applicable)

No response

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
      • Start with the initial platform/version/OS and dependencies used,
      • Specify exact install/launch/configure commands,
      • List URLs visited, user input (incl. example values/emails/passwords if needed),
      • Describe all options and toggles enabled or changed,
      • Include any files or environmental changes,
      • Identify the expected and actual result at each stage,
      • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

I can use the OpenAI-compatible API from OpenRouter or LiteLLM to generate speech using Gemini-TTS. PCM audio from the Text-to-Speech Engine is transcoded to MP3 for the client.
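
As a minimal sketch of the kind of conversion described here, raw PCM can be wrapped in a WAV container using only Python's standard `wave` module. This is a stand-in technique, not the MP3 transcoding the issue proposes (MP3 encoding needs an external encoder), and it is not Open WebUI's code. The function name `pcm_to_wav` and the defaults of 24 kHz, mono, 16-bit little-endian samples are assumptions; the real parameters depend on what the TTS provider returns.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 24000, channels: int = 1,
               sample_width: int = 2) -> bytes:
    """Wrap raw little-endian PCM samples in a WAV (RIFF) container.

    The defaults (24 kHz, mono, 16-bit) are assumptions for illustration.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


# Half a second of silence as placeholder input (12000 frames of 2 bytes each).
wav_bytes = pcm_to_wav(b"\x00\x00" * 12000)
```

Unlike headerless PCM, the resulting bytes describe their own sample rate and channel count, so they could be served with an audio/wav content type.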

Actual Behavior

OpenRouter and LiteLLM respond with PCM and do not support any other format for Gemini-TTS. Open WebUI passes the raw PCM audio to the client with an MP3 content type, so no audio is heard.
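
The mismatch described above can be spotted from the bytes themselves. The following is a rough heuristic sketch, not part of Open WebUI: MP3 data typically begins with an `ID3` tag or an MPEG frame sync (eleven set bits), whereas raw PCM has no header at all. The function name `looks_like_mp3` is hypothetical, and a PCM stream could by chance start with bytes that pass this check.

```python
def looks_like_mp3(data: bytes) -> bool:
    """Heuristic: True if data starts with an ID3v2 tag or an MPEG audio frame sync."""
    if data[:3] == b"ID3":
        return True
    # Frame sync: 0xFF followed by a byte whose top three bits are set.
    return len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0


print(looks_like_mp3(b"ID3\x04\x00"))       # True
print(looks_like_mp3(b"\xff\xfb\x90\x00"))  # True
print(looks_like_mp3(b"\x00\x00\x01\x00"))  # False
```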

Steps to Reproduce

Configure Text-to-Speech in Admin Panel:

  • Text to Speech Engine: OpenAI
  • API Base URL: https://openrouter.ai/api/v1
  • TTS Voice: Zephyr
  • Model: google/gemini-3.1-flash-tts-preview

Logs & Screenshots

No errors. The browser network log shows an audio/mpeg response even though the file is actually PCM.

Additional Information

I will open a PR to transcode when the TTS Engine returns Content-Type audio/pcm.
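
The condition named above, transcoding only when the upstream response declares PCM, could be checked roughly as follows. This is a sketch under assumptions, not the planned PR: the helper name `parse_pcm_content_type`, the accepted media types (`audio/pcm` and `audio/L16`), and the 24000 Hz fallback are illustrative choices, and a provider may label PCM differently or omit the `rate` parameter.

```python
from email.message import Message
from typing import Optional


def parse_pcm_content_type(content_type: str, default_rate: int = 24000) -> Optional[int]:
    """Return the sample rate if the header declares raw PCM, else None.

    The accepted media types and the fallback rate are assumptions.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_type() lowercases the media type and drops parameters.
    if msg.get_content_type() not in ("audio/pcm", "audio/l16"):
        return None
    rate = msg.get_param("rate")
    if isinstance(rate, str) and rate.isdigit():
        return int(rate)
    return default_rate


print(parse_pcm_content_type("audio/L16; rate=16000; channels=1"))  # 16000
print(parse_pcm_content_type("audio/pcm"))                          # 24000
print(parse_pcm_content_type("audio/mpeg"))                         # None
```

A `None` result means the response can be passed through unchanged; an integer is the sample rate to use when converting the PCM payload.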
GiteaMirror added the bug label 2026-05-06 00:19:09 -05:00
Reference: github-starred/open-webui#58874