issue: OpenAI whisper via API, Content-Type header is not sent #4587


GiteaMirror · 2025-11-11T15:57:38-06:00

GiteaMirror commented

2025-11-11 15:57:38 -06:00

Originally created by @AlexanderZhk on GitHub (Mar 27, 2025).

Check Existing Issues

I have searched the existing issues and discussions.
I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v0.5.20

Ollama Version (if applicable)

No response

Operating System

Ubuntu 24.04 LTS

Browser (if applicable)

No response

Confirmation

I have read and followed all instructions in README.md.
I am using the latest version of both Open WebUI and Ollama.
I have included the browser console logs.
I have included the Docker container logs.
I have listed steps to reproduce the bug in detail.

Expected Behavior

When using a self-hosted, OpenAI-compatible Whisper API, requests should look like this:

POST http://localhost:8082/v1/audio/transcriptions HTTP/1.1
User-Agent: PostmanRuntime/7.37.3
Accept: */*
Host: localhost:8082
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Content-Type: multipart/form-data; boundary=--------------------------664682140530690538634952
Content-Length: 728400

----------------------------664682140530690538634952
Content-Disposition: form-data; name="file"; filename="Recording (37) (1).wav"
Content-Type: audio/wave

[...]

Note the Content-Type: audio/wave header on the file part of the multipart body.

Actual Behavior

Requests sent from OWUI are missing the Content-Type: audio/wave header on the file part:

POST http://host.docker.internal:8082/v1/audio/transcriptions HTTP/1.1
Host: host.docker.internal:8082
User-Agent: python-requests/2.32.3
Accept-Encoding: gzip, deflate, zstd
Accept: */*
Connection: keep-alive
Authorization: Bearer 123
Content-Length: 30677
Content-Type: multipart/form-data; boundary=df57d76b425c4d856b30788f6804aec4

--df57d76b425c4d856b30788f6804aec4
Content-Disposition: form-data; name="model"

./cache\\cache\\huggingface\\models--Systran--faster-whisper-small\\snapshots\\536b0662742c02347bc0e980a01041f333bce120
--df57d76b425c4d856b30788f6804aec4
Content-Disposition: form-data; name="file"; filename="4e56d23f-7825-455c-ab27-47c3ed12aa08.wav"

[...]
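The difference between the two captures is the part-level Content-Type header, not the top-level one. As a rough diagnostic sketch (not part of Open WebUI or VoxBox), the standard-library email parser can list which parts of a captured multipart body carry their own Content-Type. The helper name `part_content_types` and the placeholder payloads are assumptions made for illustration; the boundary is the one from the capture above.

```python
from email.parser import BytesParser
from email.policy import HTTP


def part_content_types(content_type_header, body):
    """Map each form field name to its part-level Content-Type, or None if absent.

    Hypothetical helper for inspecting a captured multipart/form-data body.
    """
    # Prepend the top-level header so the parser can find the boundary.
    raw = b"Content-Type: " + content_type_header.encode("ascii") + b"\r\n\r\n" + body
    message = BytesParser(policy=HTTP).parsebytes(raw)

    found = {}
    for part in message.iter_parts():
        name = part.get_param("name", header="content-disposition")
        ctype = part["Content-Type"]  # None when the part has no such header
        found[name] = None if ctype is None else str(ctype)
    return found


# Boundary taken from the capture above; payloads are placeholders.
BOUNDARY = "df57d76b425c4d856b30788f6804aec4"
captured_body = (
    f"--{BOUNDARY}\r\n"
    'Content-Disposition: form-data; name="model"\r\n'
    "\r\n"
    "placeholder-model\r\n"
    f"--{BOUNDARY}\r\n"
    'Content-Disposition: form-data; name="file"; filename="sample.wav"\r\n'
    "\r\n"
    "placeholder-audio-bytes\r\n"
    f"--{BOUNDARY}--\r\n"
).encode("ascii")

report = part_content_types(f"multipart/form-data; boundary={BOUNDARY}", captured_body)
for field, ctype in report.items():
    print(field, "->", ctype)
```

A file part that reports no Content-Type here matches the behavior described in this issue.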

Steps to Reproduce

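1. Deploy VoxBox (https://github.com/gpustack/vox-box)
2. Add it as an OpenAI compatible STT model in the OWUI admin panel
3. Use the STT functionality via:
   https://github.com/open-webui/open-webui/blob/b03fc97e287f31ad07bda896143959bc4413f7d2/backend/open_webui/routers/audio.py#L511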

Logs & Screenshots

Additional Information

This is critical when using VoxBox.

Changing
https://github.com/open-webui/open-webui/blob/b03fc97e287f31ad07bda896143959bc4413f7d2/backend/open_webui/routers/audio.py#L511
to

files={"file": (filename, open(file_path, "rb"),"audio/wave")},

seems to fix it.
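A minimal sketch of why the third tuple element matters, assuming the `requests` library (the captured User-Agent is python-requests/2.32.3). The request is only prepared, never sent. The helper name `encoded_body`, the `placeholder-model` value and the audio bytes are illustrative assumptions; the URL is the one from the expected request above.

```python
import io

import requests


def encoded_body(files):
    """Prepare (but do not send) a multipart POST and return its body as text."""
    request = requests.Request(
        "POST",
        "http://localhost:8082/v1/audio/transcriptions",
        files=files,
        data={"model": "placeholder-model"},  # hypothetical model name
    )
    prepared = request.prepare()
    # The multipart body is bytes; latin-1 decodes any byte value losslessly.
    return prepared.body.decode("latin-1")


audio = b"placeholder-audio-bytes"  # stand-in for real WAV data

# Two-element tuple: (filename, file object). No content type is supplied.
without_type = encoded_body({"file": ("sample.wav", io.BytesIO(audio))})

# Three-element tuple: (filename, file object, content type), as in the proposed change.
with_type = encoded_body({"file": ("sample.wav", io.BytesIO(audio), "audio/wave")})

print("part Content-Type present (2-tuple):", "Content-Type: audio/wave" in without_type)
print("part Content-Type present (3-tuple):", "Content-Type: audio/wave" in with_type)
```

The top-level multipart/form-data header lives in the request headers rather than the body, so the substring check above only sees part-level headers.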


GiteaMirror added the bug label 2025-11-11 15:57:38 -06:00

GiteaMirror closed this issue

2025-11-11 15:57:39 -06:00

GiteaMirror commented

2025-11-11 15:57:40 -06:00

@tjbck commented on GitHub (Mar 28, 2025):

Content-Type is not required for OpenAI whisper.


GiteaMirror referenced this issue

2026-04-19 20:19:28 -05:00

[GH-ISSUE #4587] Choosing Ollama Embedding Model Non-Functional #13664
