[GH-ISSUE #23822] feat: update Speech-to-Text Engine MistralAI Chat Completions API to properly use input_audio spec #58746

Closed
opened 2026-05-05 23:50:17 -05:00 by GiteaMirror · 1 comment

Originally created by @pfn on GitHub (Apr 16, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/23822

Check Existing Issues

  • I have searched for all existing open AND closed issues and discussions for similar requests. I have found none that is comparable to my request.

Verify Feature Scope

  • I have read through and understood the scope definition for feature requests in the Issues section. I believe my feature request meets the definition and belongs in the Issues section instead of the Discussions.

Problem Description

MistralAI STT Chat Completions configuration does not use the correct input_audio payload per https://developers.openai.com/api/reference/resources/chat/subresources/completions/methods/create#(resource)%20chat.completions%20%3E%20(model)%20chat_completion_content_part_input_audio%20%3E%20(schema)

input_audio should be an object of the form {"data": "base64audio", "format": "wav|mp3"}. Currently it is sent as a bare string, "input_audio": "base64audio", which does not match the linked specification.
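To make the difference between the two shapes concrete, here is a minimal sketch of a content part in the object form. The helper name `build_input_audio_part` is hypothetical and is not taken from open-webui; the surrounding `"type": "input_audio"` wrapper follows my reading of the linked schema.

```python
import base64


def build_input_audio_part(audio_bytes: bytes, audio_format: str) -> dict:
    """Return a chat-completions content part carrying base64-encoded audio.

    Hypothetical helper for illustration only; not part of open-webui.
    """
    return {
        "type": "input_audio",
        # The spec expects an object here, not a bare base64 string.
        "input_audio": {
            "data": base64.b64encode(audio_bytes).decode("utf-8"),
            "format": audio_format,  # e.g. "wav" or "mp3"
        },
    }


part = build_input_audio_part(b"RIFF", "wav")
print(part)
```

The current behaviour corresponds to putting only the base64 string under the `input_audio` key, which is what a strict endpoint would reject.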

Desired Solution you'd like

Ideally, it would be nice to pull completions out as a whole separate new provider for custom endpoints. In lieu of that, I just need the following change to the MistralAI provider:

diff --git a/backend/open_webui/routers/audio.py b/backend/open_webui/routers/audio.py
index 8e14387a7..a6d26c810 100644
--- a/backend/open_webui/routers/audio.py
+++ b/backend/open_webui/routers/audio.py
@@ -893,7 +893,10 @@ def transcription_handler(request, file_path, metadata, user=None):

                 # Read and encode audio file as base64
                 with open(audio_file_to_use, 'rb') as audio_file:
-                    audio_base64 = base64.b64encode(audio_file.read()).decode('utf-8')
+                    audio_base64 = {
+                        'data':  base64.b64encode(audio_file.read()).decode('utf-8'),
+                        'format': mimetypes.guess_extension(mimetypes.guess_type(audio_file_to_use)[0]).lstrip('.'),
+                    }

                 # Prepare chat completions request
                 url = f'{api_base_url}/chat/completions'
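
One caveat with the one-liner in the diff: `mimetypes.guess_type` returns `None` as the type for unrecognized file extensions, and passing that on to `mimetypes.guess_extension` would raise. A more defensive variant might look like the sketch below. `guess_audio_format` and its `default` parameter are hypothetical names, and falling back to the file suffix (then to `wav`) is an assumption on my part, not necessarily what should be adopted upstream.

```python
import mimetypes
from pathlib import Path


def guess_audio_format(path: str, default: str = "wav") -> str:
    """Best-effort format string (e.g. 'wav', 'mp3') for an audio file path.

    Hypothetical helper for illustration only; not part of open-webui.
    """
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type and mime_type.startswith("audio/"):
        ext = mimetypes.guess_extension(mime_type)
        if ext:
            return ext.lstrip(".")
    # Assumption: if the MIME lookup fails, trust the file suffix,
    # and use the default when there is no suffix at all.
    suffix = Path(path).suffix.lstrip(".").lower()
    return suffix or default
```

With this, `'format': guess_audio_format(audio_file_to_use)` could replace the nested `mimetypes` calls in the diff without changing the result for common extensions.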

Alternatives Considered

None available.

Additional Context

No response


@tjbck commented on GitHub (Apr 17, 2026):

Addressed in dev.


Reference: github-starred/open-webui#58746