[GH-ISSUE #14101] issue: Speech-To-Text failure when using OpenAPI Compatible Endpoint #32670

GiteaMirror · 2026-04-25T06:34:21-05:00

GiteaMirror commented

2026-04-25 06:34:21 -05:00

Originally created by @andrefecto on GitHub (May 20, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/14101

Check Existing Issues

I have searched the existing issues and discussions.
I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v0.6.10

Ollama Version (if applicable)

N/A

Operating System

Red Hat Enterprise Linux 9

Browser (if applicable)

Edge, latest

Confirmation

I have read and followed all instructions in README.md.
I am using the latest version of both Open WebUI and Ollama.
I have included the browser console logs.
I have included the Docker container logs.
I have listed steps to reproduce the bug in detail.

Expected Behavior

Speech-To-Text, when clicking the record button, should generate a file that is compatible with OpenAPI/OpenAI, but it does not.

Actual Behavior

When you use the record button, it generates a file in the webm format. It says it converts it to a wav/mp3, but it seems to keep it as webm (or it converts it to an mp3 but puts the wrong file extension on it), which then causes gpt-4o-mini-transcribe to report that the file is corrupted.

Steps to Reproduce

1. In a new chat, click the record button.
2. Speak into it and then click the check box.
3. Let it try to send the audio to gpt-4o-mini-transcribe, and it will fail.

Logs & Screenshots

Docker logs:

2025-05-20 18:30:52.128 | DEBUG    | python_multipart.multipart:callback:628 - Calling on_part_begin with no data - {}
2025-05-20 18:30:52.128 | DEBUG    | python_multipart.multipart:callback:625 - Calling on_header_field with data[42:61] - {}
2025-05-20 18:30:52.129 | DEBUG    | python_multipart.multipart:callback:625 - Calling on_header_value with data[63:132] - {}
2025-05-20 18:30:52.129 | DEBUG    | python_multipart.multipart:callback:628 - Calling on_header_end with no data - {}
2025-05-20 18:30:52.129 | DEBUG    | python_multipart.multipart:callback:625 - Calling on_header_field with data[134:146] - {}
2025-05-20 18:30:52.129 | DEBUG    | python_multipart.multipart:callback:625 - Calling on_header_value with data[148:170] - {}
2025-05-20 18:30:52.129 | DEBUG    | python_multipart.multipart:callback:628 - Calling on_header_end with no data - {}
2025-05-20 18:30:52.129 | DEBUG    | python_multipart.multipart:callback:628 - Calling on_headers_finished with no data - {}
2025-05-20 18:30:52.129 | DEBUG    | python_multipart.multipart:callback:625 - Calling on_part_data with data[174:31977] - {}
2025-05-20 18:30:52.129 | DEBUG    | python_multipart.multipart:callback:628 - Calling on_part_end with no data - {}
2025-05-20 18:30:52.129 | DEBUG    | python_multipart.multipart:callback:628 - Calling on_end with no data - {}
2025-05-20 18:30:52.138 | INFO     | open_webui.routers.audio:transcription:791 - file.content_type: audio/webm;codecs=opus - {}
2025-05-20 18:30:52.138 | INFO     | open_webui.routers.audio:transcribe:507 - transcribe: <internal_path>/<uuid>.webm - {}
2025-05-20 18:30:52.344 | DEBUG    | pydub.logging_utils:log_conversion:9 - subprocess.call(['ffmpeg', '-y', '-f', 'webm', '-i', '<internal_path>/<uuid>.webm', '-acodec', 'pcm_s32le', '-vn', '-f', 'wav', '-']) - {}
2025-05-20 18:30:52.417 | INFO     | open_webui.routers.audio:convert_audio_to_wav:98 - Converted <internal_path>/<uuid>.webm to <internal_path>/<uuid>.webm - {}
2025-05-20 18:30:52.419 | DEBUG    | urllib3.connectionpool:_new_conn:1049 - Starting new HTTPS connection (1): <redacted-domain>:443 - {}
2025-05-20 18:31:04.248 | DEBUG    | urllib3.connectionpool:_make_request:544 - https://<redacted-domain>:443 "POST /v1/audio/transcriptions HTTP/1.1" 500 300 - {}
2025-05-20 18:31:04.249 | ERROR    | open_webui.routers.audio:transcribe:572 - 500 Server Error: Internal Server Error for url: https://<redacted-domain>/v1/audio/transcriptions - {}
Traceback (most recent call last):

  ...
    result = context.run(func, *args)
             ...
             └ functools.partial(<function transcription at 0x...>, user=UserModel(id='<user_id>', name='<user_name>'))
  ...

requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://<redacted-domain>/v1/audio/transcriptions
2025-05-20 18:31:04.252 | ERROR    | open_webui.routers.audio:transcription:836 - External: litellm.BadRequestError: AzureException BadRequestError - Audio file might be corrupted or unsupported. Received Model Group=gpt-4o-mini-transcribe
Available Model Group Fallbacks=None LiteLLM Retried: 4 times, LiteLLM Max Retries: 5 - {}

...

Exception: External: litellm.BadRequestError: AzureException BadRequestError - Audio file might be corrupted or unsupported. Received Model Group=gpt-4o-mini-transcribe
Available Model Group Fallbacks=None LiteLLM Retried: 4 times, LiteLLM Max Retries: 5

...

fastapi.exceptions.HTTPException: 400: [ERROR: External: litellm.BadRequestError: AzureException BadRequestError - Audio file might be corrupted or unsupported. Received Model Group=gpt-4o-mini-transcribe
Available Model Group Fallbacks=None LiteLLM Retried: 4 times, LiteLLM Max Retries: 5]
2025-05-20 18:31:04.256 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - <ip>:0 - "POST /api/v1/audio/transcriptions HTTP/1.1" 400 - {}

Browser response:

{
    "detail": "[ERROR: 400: [ERROR: External: litellm.BadRequestError: AzureException BadRequestError - Audio file might be corrupted or unsupported. Received Model Group=gpt-4o-mini-transcribe\nAvailable Model Group Fallbacks=None LiteLLM Retried: 4 times, LiteLLM Max Retries: 5]]"
}

Additional Information

1. I have tested sending an MP3 file directly through my LiteLLM proxy via the same API endpoint that Open WebUI uses, and it works, which to me points to a bug in the Open WebUI code.
2. The transcription works fine if I upload an MP3 file and ask it to transcribe it. This tells me:
2.1: Open WebUI can successfully use LiteLLM for audio transcription requests.
2.2: Open WebUI is somehow failing to take the front-end recorded audio file and pass it to the back-end properly. See the log line "open_webui.routers.audio:convert_audio_to_wav:98": it says it converted the file from a webm to a webm, yet the line right above it is the ffmpeg command that converts it to a wav.
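
One way to check which of the two suspicions holds (the file stays webm, or it is converted but keeps the wrong extension) is to look at the first bytes of the file that actually gets sent upstream. The snippet below is only a rough sketch and not part of Open WebUI: `sniff_audio_container` is a hypothetical helper name, and it only recognizes the three formats discussed in this issue.

```python
def sniff_audio_container(header: bytes) -> str:
    """Guess the audio container from the first bytes of a file.

    Hypothetical diagnostic helper, not an Open WebUI function.
    Pass at least the first 12 bytes of the file.
    """
    # WebM and Matroska both start with the EBML magic number.
    if header[:4] == b"\x1a\x45\xdf\xa3":
        return "webm/matroska"
    # WAV is a RIFF file whose form type (bytes 8..11) is "WAVE".
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    # MP3 starts with an ID3v2 tag or an MPEG frame sync (11 set bits).
    # The frame-sync check is a heuristic and can match other MPEG audio.
    if header[:3] == b"ID3":
        return "mp3"
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "mp3"
    return "unknown"
```

To use it on the file named in the `transcribe:` log line, read the first bytes with `open(path, "rb").read(12)` and pass them in. If the result is "wav" while the path still ends in `.webm`, the conversion ran and only the extension is wrong; if it is "webm/matroska", the file was never replaced.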

GiteaMirror added the bug label 2026-04-25 06:34:21 -05:00

GiteaMirror closed this issue

2026-04-25 06:34:21 -05:00

GiteaMirror commented

2026-04-25 06:34:23 -05:00

@andrefecto commented on GitHub (May 20, 2025):

I am closing this as it's resolved in v0.6.10. I thought I had updated my container, and I apparently was looking at the wrong server. (I have a few instances for dev/test/prod running.)

Reference: github-starred/open-webui#32670