[GH-ISSUE #24143] issue: TTS treats PCM audio responses as MP3

GiteaMirror commented

2026-05-06 00:19:09 -05:00

Originally created by @daradib on GitHub (Apr 26, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/24143

Check Existing Issues

I have searched for any existing and/or related issues.
I have searched for any existing and/or related discussions.
I have also searched in the CLOSED issues AND CLOSED discussions and found no related items (your issue might already be addressed on the development branch!).
I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v0.9.2

Ollama Version (if applicable)

No response

Operating System

Ubuntu 24.04

Browser (if applicable)

No response

Confirmation

I have read and followed all instructions in README.md.
I am using the latest version of both Open WebUI and Ollama.
I have included the browser console logs.
I have included the Docker container logs.
I have provided every relevant configuration, setting, and environment variable used in my setup.
I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
Start with the initial platform/version/OS and dependencies used,
Specify exact install/launch/configure commands,
List URLs visited, user input (incl. example values/emails/passwords if needed),
Describe all options and toggles enabled or changed,
Include any files or environmental changes,
Identify the expected and actual result at each stage,
Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

I can use the OpenAI-compatible API from OpenRouter or LiteLLM to generate speech using Gemini-TTS. PCM audio from the Text-to-Speech Engine is transcoded to MP3 for the client.

Actual Behavior

OpenRouter and LiteLLM respond with PCM and do not support any other format for Gemini-TTS. Open WebUI passes the raw PCM audio to the client with an MP3 content type, so no audio is heard.

Steps to Reproduce

Configure Text-to-Speech in Admin Panel

Text to Speech Engine: OpenAI
API Base URL: https://openrouter.ai/api/v1
TTS Voice: Zephyr
Model: google/gemini-3.1-flash-tts-preview
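
To check the upstream response independently of Open WebUI, the settings above can be turned into a direct request. This is a minimal sketch, assuming the engine is reached through the OpenAI-style `/audio/speech` path. The helper name `build_speech_request`, the placeholder API key, and the sample input text are hypothetical and not part of the report. The snippet only builds the request; sending it needs a real key.

```python
import json
import urllib.request


def build_speech_request(
    base_url: str, api_key: str, model: str, voice: str, text: str
) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style text-to-speech request."""
    payload = {"model": model, "voice": voice, "input": text}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/audio/speech",  # assumed endpoint path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_speech_request(
    "https://openrouter.ai/api/v1",
    "YOUR_API_KEY",  # placeholder, not a real key
    "google/gemini-3.1-flash-tts-preview",
    "Zephyr",
    "Hello from Open WebUI",  # hypothetical sample text
)
# To send: urllib.request.urlopen(req), then inspect the Content-Type
# header of the response to see which format the engine returned.
```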

Logs & Screenshots

No errors are logged. The browser network log shows an audio/mpeg response even though the payload is actually PCM.
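
One way to confirm the mismatch is to save the response body and check whether it begins like an MP3 file. The following is a rough heuristic sketch, not part of Open WebUI; the function name `looks_like_mp3` is hypothetical. It only inspects the first bytes, so raw PCM that happens to start with `0xFF 0xFF` (a 16-bit sample of -1) would be misreported as MP3.

```python
def looks_like_mp3(data: bytes) -> bool:
    """Heuristic: MP3 data starts with an ID3v2 tag or an MPEG frame sync."""
    if data[:3] == b"ID3":
        return True
    # MPEG audio frame sync: 11 set bits, i.e. 0xFF followed by a byte
    # whose top three bits are all set.
    return len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0
```

A body labelled audio/mpeg for which this returns `False` is a strong hint that the content type does not match the data.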

Additional Information

I will open a PR to transcode the audio when the TTS Engine returns Content-Type audio/pcm.
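
The idea of branching on the upstream Content-Type can be sketched as follows. This is not the proposed PR: it wraps the PCM in a WAV container using Python's standard `wave` module instead of transcoding to MP3, because that needs no external encoder. The function names, and the assumption that the PCM is 16-bit signed little-endian mono at 24000 Hz, are illustrative and should be checked against what the engine actually returns.

```python
import io
import wave


def pcm_to_wav(
    pcm: bytes, sample_rate: int = 24000, channels: int = 1, sample_width: int = 2
) -> bytes:
    """Wrap raw PCM samples in a WAV container (format values are assumptions)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample, 2 = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


def normalize_tts_audio(body: bytes, upstream_content_type: str) -> tuple[bytes, str]:
    """Return (audio bytes, content type) that a browser can play."""
    media_type = upstream_content_type.split(";")[0].strip().lower()
    if media_type == "audio/pcm":
        # Raw PCM has no header, so it must be wrapped or transcoded.
        return pcm_to_wav(body), "audio/wav"
    # Anything else is passed through under the type the engine reported,
    # falling back to audio/mpeg when no type was given.
    return body, media_type or "audio/mpeg"
```

The key point is that the content type sent to the client is derived from what the engine returned rather than being fixed to audio/mpeg.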

GiteaMirror added the bug label 2026-05-06 00:19:09 -05:00
