[PR #16038] [CLOSED] feat: support audio language selector for deepgram API, add fallback for both openai/deepgram #24012

Closed
opened 2026-04-20 05:11:04 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/16038
Author: @jfouret
Created: 7/26/2025
Status: Closed

Base: dev ← Head: feat/support_language_deepgram


📝 Commits (3)

  • 2f2066b Add retry logic and language parameter handling for Deepgram API
  • fca54bc Refactor language selection logic in transcription handler
  • 31e3a51 Implement language fallback for STT transcription requests

📊 Changes

1 file changed (+47 additions, -29 deletions)

📝 backend/open_webui/routers/audio.py (+47 -29)

📄 Description

Pull Request Checklist

Note to first-time contributors: Please open a discussion post in Discussions and describe your changes before submitting a pull request.

Before submitting, make sure you've checked the following:

  • Target branch: Please verify that the pull request targets the dev branch.
  • Description: Provide a concise description of the changes made in this pull request.
  • Changelog: Ensure a changelog entry following the format of Keep a Changelog is added at the bottom of the PR description.
  • Documentation: Have you updated relevant documentation Open WebUI Docs, or other documentation sources?
  • Dependencies: Are there any new dependencies? Have you updated the dependency versions in the documentation?
  • Testing: Have you written and run sufficient tests to validate the changes?
  • Code review: Have you performed a self-review of your code, addressing any coding standard issues and ensuring adherence to the project's coding standards?
  • Prefix: To clearly categorize this pull request, prefix the pull request title using one of the following:
    • BREAKING CHANGE: Significant changes that may affect compatibility
    • build: Changes that affect the build system or external dependencies
    • ci: Changes to our continuous integration processes or workflows
    • chore: Refactor, cleanup, or other non-functional code changes
    • docs: Documentation update or addition
    • feat: Introduces a new feature or enhancement to the codebase
    • fix: Bug fix or error correction
    • i18n: Internationalization or localization changes
    • perf: Performance improvement
    • refactor: Code restructuring for better maintainability, readability, or scalability
    • style: Changes that do not affect the meaning of the code (white space, formatting, missing semi-colons, etc.)
    • test: Adding missing tests or correcting existing tests
    • WIP: Work in progress, a temporary label for incomplete or ongoing work

Changelog Entry

Description

The language selector support mentioned in #13989 was not properly implemented for the Deepgram API.

Added

  • Support the audio language parameter for the Deepgram API
  • Add a fallback if the language parameter is not recognized by the OpenAI or Deepgram API
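As a rough illustration of the first item, the sketch below builds the query string for a Deepgram transcription request and includes `language` only when one is set. It is not the code from this PR: the helper names `build_deepgram_params` and `build_deepgram_url` are hypothetical, and the endpoint and the `model`/`language` query parameters are assumed from Deepgram's pre-recorded audio API.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

# Assumed endpoint for Deepgram pre-recorded transcription.
DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_deepgram_params(model: Optional[str], language: Optional[str]) -> Dict[str, str]:
    """Build the query parameters, leaving out any value that is not set."""
    params: Dict[str, str] = {}
    if model:
        params["model"] = model
    if language:
        # Omitting the key entirely lets the API fall back to its own default.
        params["language"] = language
    return params


def build_deepgram_url(model: Optional[str], language: Optional[str]) -> str:
    """Return the request URL with the encoded query string, if any."""
    query = urlencode(build_deepgram_params(model, language))
    return f"{DEEPGRAM_LISTEN_URL}?{query}" if query else DEEPGRAM_LISTEN_URL
```

Leaving the key out, instead of sending an empty value, is what makes a retry without a language possible.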

Additional Information

  • Implement a fallback mechanism for failed API calls caused by a bad language parameter
  • Handle 400 errors by removing language param and retrying with next language candidate
  • Add warning logs for language parameter removal attempts
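The retry behaviour described in the list above could look roughly like the following. This is a minimal sketch under assumptions, not the implementation in `audio.py`: `transcribe_with_fallback`, `SendFn`, and `fake_send` are hypothetical names, and the real HTTP call is replaced by a stand-in so the example runs without network access.

```python
import logging
from typing import Callable, Optional, Sequence, Tuple

log = logging.getLogger(__name__)

# Hypothetical callable: takes a language (or None) and returns (HTTP status, body).
SendFn = Callable[[Optional[str]], Tuple[int, dict]]


def transcribe_with_fallback(
    send: SendFn,
    candidates: Sequence[Optional[str]],
    engine: str = "deepgram",
) -> dict:
    """Try each language candidate in order; on HTTP 400, move on to the next one."""
    if not candidates:
        raise ValueError("at least one language candidate is required")
    for index, language in enumerate(candidates):
        status, body = send(language)
        if status == 400 and index + 1 < len(candidates):
            # Same wording as the warning lines in the logs of this PR.
            log.warning(
                'Failed %s transcribe with language="%s", Retrying with "%s"...',
                engine, language, candidates[index + 1],
            )
            continue
        if status >= 400:
            raise RuntimeError(f"{engine} transcription failed with HTTP {status}")
        return body
    raise RuntimeError("no candidate was attempted")  # not reachable for a non-empty list


def fake_send(language: Optional[str]) -> Tuple[int, dict]:
    """Stand-in for the real request: rejects one unknown language code."""
    if language == "iougiug":
        return 400, {"error": "bad language"}
    return 200, {"text": "bonjour", "language": language}
```

Calling `transcribe_with_fallback(fake_send, ["iougiug", None])` logs one warning and then succeeds on the second attempt, which sends no language at all. A 400 on the last candidate is raised instead of retried, so a persistent failure is still surfaced.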

Here are the logs from my tests (run on v0.6.18 in Docker).
I only updated the audio.py file in the backend.
OpenAI:

2025-07-26 13:51:31.709 | INFO     | open_webui.routers.audio:transcription:948 - file.content_type: audio/webm; codecs=opus - {}
2025-07-26 13:51:31.710 | INFO     | open_webui.routers.audio:transcribe:827 - transcribe: /app/backend/data/cache/audio/transcriptions/e0615551-4a01-4c1f-9643-c5bc0bd68367.webm {'language': 'iougiug'} - {}
2025-07-26 13:51:32.139 | INFO     | open_webui.routers.audio:convert_audio_to_mp3:116 - Converted /app/backend/data/cache/audio/transcriptions/e0615551-4a01-4c1f-9643-c5bc0bd68367.webm to /app/backend/data/cache/audio/transcriptions/e0615551-4a01-4c1f-9643-c5bc0bd68367.mp3 - {}
Chunk paths: ['/app/backend/data/cache/audio/transcriptions/e0615551-4a01-4c1f-9643-c5bc0bd68367.mp3']
2025-07-26 13:51:33.599 | WARNING  | open_webui.routers.audio:transcription_handler:602 - Failed openai transcribe with language="iougiug", Retrying with "None"... - {}
2025-07-26 13:51:34.537 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 176.144.33.197:50860:0 - "POST /api/v1/audio/transcriptions HTTP/1.1" 200 - {}
2025-07-26 13:51:38.169 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 176.144.33.197:50860:0 - "GET /_app/version.json HTTP/1.1" 304 - {}
2025-07-26 13:51:42.156 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 176.144.33.197:50860:0 - "GET /_app/version.json HTTP/1.1" 304 - {}
2025-07-26 13:51:44.852 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 176.144.33.197:50860:0 - "GET /api/v1/chats/99ffa00f-d9b0-4df3-9182-38b6d6a0df2a HTTP/1.1" 200 - {}
2025-07-26 13:51:44.871 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 176.144.33.197:50860:0 - "GET /api/v1/chats/99ffa00f-d9b0-4df3-9182-38b6d6a0df2a HTTP/1.1" 200 - {}
2025-07-26 13:51:44.904 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 176.144.33.197:50860:0 - "GET /api/v1/chats/99ffa00f-d9b0-4df3-9182-38b6d6a0df2a HTTP/1.1" 200 - {}
2025-07-26 13:52:09.633 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 176.144.33.197:50860:0 - "GET /_app/version.json HTTP/1.1" 304 - {}

Deepgram:

2025-07-26 13:57:55.715 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 176.144.33.197:50860:0 - "GET /api/v1/chats/all/tags HTTP/1.1" 200 - {}
2025-07-26 13:57:59.178 | INFO     | open_webui.routers.audio:transcription:949 - file.content_type: audio/webm; codecs=opus - {}
2025-07-26 13:57:59.179 | INFO     | open_webui.routers.audio:transcribe:828 - transcribe: /app/backend/data/cache/audio/transcriptions/9cff6b64-e751-4899-be3b-53b3052a0cb2.webm {'language': 'iougiug'} - {}
2025-07-26 13:57:59.607 | INFO     | open_webui.routers.audio:convert_audio_to_mp3:116 - Converted /app/backend/data/cache/audio/transcriptions/9cff6b64-e751-4899-be3b-53b3052a0cb2.webm to /app/backend/data/cache/audio/transcriptions/9cff6b64-e751-4899-be3b-53b3052a0cb2.mp3 - {}
Chunk paths: ['/app/backend/data/cache/audio/transcriptions/9cff6b64-e751-4899-be3b-53b3052a0cb2.mp3']
2025-07-26 13:57:59.995 | WARNING  | open_webui.routers.audio:transcription_handler:668 - Failed deepgram transcribe with language="iougiug", Retrying with "None"... - {}
2025-07-26 13:58:00.876 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 176.144.33.197:50860:0 - "POST /api/v1/audio/transcriptions HTTP/1.1" 200 - {}
2025-07-26 13:58:08.468 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 176.144.33.197:50860:0 - "GET /api/usage HTTP/1.1" 200 - {}
2025-07-26 13:58:10.111 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 176.144.33.197:50860:0 - "GET /_app/version.json HTTP/1.1" 200 - {}
2025-07-26 13:58:12.413 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 176.144.33.197:50860:0 - "GET /api/v1/audio/voices HTTP/1.1" 200 - {}
2025-07-26 13:58:18.467 | INFO     | open_webui.routers.openai:get_all_models:392 - get_all_models() - {}
2025-07-26 13:58:18.804 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 176.144.33.197:50860:0 - "GET /api/models HTTP/1.1" 200 - {}
2025-07-26 13:58:18.996 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 176.144.33.197:50860:0 - "POST /api/v1/users/user/settings/update HTTP/1.1" 200 - {}
2025-07-26 13:58:30.767 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 176.144.33.197:50860:0 - "GET /api/v1/chats/6a01aad5-8e89-4753-8a12-b2d74c6358c0 HTTP/1.1" 200 - {}
2025-07-26 13:58:31.099 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 176.144.33.197:50860:0 - "GET /api/v1/chats/9cb8b7ee-e459-42cf-8bd8-704231e8caf1 HTTP/1.1" 200 - {}
2025-07-26 13:58:31.100 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 176.144.33.197:50860:0 - "GET /api/v1/chats/0181d97a-5ac5-4d02-833a-b0e4fd02f596 HTTP/1.1" 200 - {}
2025-07-26 13:58:32.855 | INFO     | open_webui.routers.audio:transcription:949 - file.content_type: audio/webm; codecs=opus - {}
2025-07-26 13:58:32.856 | INFO     | open_webui.routers.audio:transcribe:828 - transcribe: /app/backend/data/cache/audio/transcriptions/9a17270d-5e2f-4b00-b9e0-29f822abc4d8.webm {'language': 'multi'} - {}
2025-07-26 13:58:33.434 | INFO     | open_webui.routers.audio:convert_audio_to_mp3:116 - Converted /app/backend/data/cache/audio/transcriptions/9a17270d-5e2f-4b00-b9e0-29f822abc4d8.webm to /app/backend/data/cache/audio/transcriptions/9a17270d-5e2f-4b00-b9e0-29f822abc4d8.mp3 - {}
Chunk paths: ['/app/backend/data/cache/audio/transcriptions/9a17270d-5e2f-4b00-b9e0-29f822abc4d8.mp3']
2025-07-26 13:58:34.675 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 176.144.33.197:50860:0 - "POST /api/v1/audio/transcriptions HTTP/1.1" 200 - {}

I tried with French and it works; below is the result after transcription with Deepgram nova-3 (multi).

image

This matches #13720, which was closed too quickly.

Note that I did not find any test suite related to audio.py.
Note that local Whisper does not raise any error when a random string is given as the language parameter, so I did not implement any fallback there. The Azure logic seems different, as we can already submit several locales.
I tried to integrate the comment from @rgaricano about #15935.

Contributor License Agreement

By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-20 05:11:05 -05:00
Reference: github-starred/open-webui#24012