[PR #13528] [MERGED] feat: Enhance YouTube Transcription Loader for multi-language support #46273

Closed
opened 2026-04-29 21:01:29 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/13528
Author: @Classic298
Created: 5/5/2025
Status: Merged
Merged: 5/6/2025
Merged by: @tjbck

Base: dev ← Head: dev


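📝 Commits (10+)

  • [7680ac2](https://github.com/open-webui/open-webui/commit/7680ac25179aed4d48815e178aa22ac8399c6381) Update youtube.py
  • [0a845db](https://github.com/open-webui/open-webui/commit/0a845db8eca7554d6310b7fad4d7360e2db66b91) Update youtube.py
  • [0a3817e](https://github.com/open-webui/open-webui/commit/0a3817ed860b2f1d1db190ec6a539b037d1f0701) Update youtube.py
  • [1a30b37](https://github.com/open-webui/open-webui/commit/1a30b3746ed05e9888b038e025075b6e1c17767a) Update youtube.py
  • [b0d74a5](https://github.com/open-webui/open-webui/commit/b0d74a59f14d8f9c8fbe6aa2676039523a45ef62) Update youtube.py
  • [9cf3381](https://github.com/open-webui/open-webui/commit/9cf33813813f92dc97ce33c4b89e79dcdc3f3a13) Update youtube.py
  • [791dd24](https://github.com/open-webui/open-webui/commit/791dd24ace6054d1822c4ad76f272c3228337d8c) Update youtube.py
  • [67a612f](https://github.com/open-webui/open-webui/commit/67a612fe2404edd7819717005981070339043932) Update youtube.py
  • [5e1cb76](https://github.com/open-webui/open-webui/commit/5e1cb76b93ea3b632ca0ddf9cbe308fd8ecd1d4d) Update youtube.py
  • [a129e09](https://github.com/open-webui/open-webui/commit/a129e0954ec7be642d57df816c98bd8a05c99d87) Update youtube.py
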
📊 Changes

1 file changed (+34 additions, -19 deletions)

📝 backend/open_webui/retrieval/loaders/youtube.py (+34 -19)

📄 Description

Pull Request Checklist

Before submitting, make sure you've checked the following:

  • Target branch: Please verify that the pull request targets the dev branch.
  • Description: Provide a concise description of the changes made in this pull request.
  • Changelog: Ensure a changelog entry following the format of [Keep a Changelog](https://keepachangelog.com/) is added at the bottom of the PR description.
  • Documentation: Have you updated relevant documentation [Open WebUI Docs](https://github.com/open-webui/docs), or other documentation sources?
  • Dependencies: Are there any new dependencies? Have you updated the dependency versions in the documentation?
  • Testing: Have you written and run sufficient tests to validate the changes?
  • Code review: Have you performed a self-review of your code, addressing any coding standard issues and ensuring adherence to the project's coding standards?
  • Prefix: To clearly categorize this pull request, prefix the pull request title using one of the following:
    • feat: Introduces a new feature or enhancement to the codebase

Changelog Entry

Description

Enhanced YouTube transcript loader to properly handle multiple language fallbacks. Previously, if a transcript wasn't available in the configured language, it would only fall back to English. Now, multiple languages can be specified in priority order, and the system will try each language in sequence before eventually falling back to English.

This is a useful feature: in some use cases, working with videos in different languages results in an unexpected error simply because no transcript is available in, e.g., either de or en.

With this change, you can specify a custom priority list, e.g. es,de,en, and the loader will attempt to fetch a transcript in each listed language in turn.

The behaviour of defaulting to English when none of the listed languages can be fetched remains unchanged, so a list of es,de gives the same result as es,de,en.
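
The fallback order described above can be sketched roughly as follows. This is an illustration only, not the code merged in this PR: fetch_with_fallback and its fetch callback are hypothetical names, and the callback stands in for whatever actually retrieves a transcript for one language.

```python
from typing import Callable, Iterable, Optional, Tuple


def fetch_with_fallback(
    languages: Iterable[str],
    fetch: Callable[[str], Optional[str]],
    default: str = "en",
) -> Optional[Tuple[str, str]]:
    """Return (language, transcript) for the first language that has one."""
    # Drop duplicate codes while keeping the configured priority order.
    order = list(dict.fromkeys(languages))
    # Append the default only if it is missing, so "es,de" and "es,de,en"
    # lead to exactly the same sequence of attempts.
    if default not in order:
        order.append(default)

    for lang in order:
        text = fetch(lang)  # hypothetical callback: None means "not available"
        if text is not None:
            return lang, text
    return None  # nothing found, not even in the default language


# Stand-in fetcher backed by a dict of available transcripts (assumed data).
available = {"de": "Hallo Welt", "en": "Hello world"}
print(fetch_with_fallback(["es", "de"], available.get))  # ('de', 'Hallo Welt')
```

Because the default language is appended only when it is absent, the sketch never tries English twice, which matches the statement that es,de and es,de,en behave identically.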

Added

  • Support for multiple language fallbacks when fetching YouTube transcriptions

Changed

  • Modified the load() method in YoutubeLoader class to attempt to fetch transcripts in each configured language in priority order
  • Updated documentation for YOUTUBE_LOADER_LANGUAGE config option to clarify the new behavior: https://github.com/open-webui/docs/pull/528
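
As a rough illustration of the configuration side, a comma-separated value such as es,de,en could be turned into an ordered priority list like this. The helper name parse_language_priority is hypothetical, and this description does not show how the loader itself parses YOUTUBE_LOADER_LANGUAGE, so treat the details (whitespace trimming, dropping empty and duplicate entries) as assumptions.

```python
from typing import List


def parse_language_priority(raw: str) -> List[str]:
    """Split a comma-separated value such as "es,de,en" into an ordered list."""
    seen = set()
    result = []
    for part in raw.split(","):
        code = part.strip()  # tolerate spaces around commas (assumption)
        # Skip empty entries (e.g. from "es,,de") and repeated codes.
        if code and code not in seen:
            seen.add(code)
            result.append(code)
    return result


print(parse_language_priority("es, de,en"))  # ['es', 'de', 'en']
```

A single-language value such as en yields a one-element list, which is consistent with existing single-language settings continuing to work.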

Fixed

  • Fixed issues #13309 and #1960, where YouTube transcription would fail if no transcript was available in the primary language, even if one was available in another supported language

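Additional Information

  • This enhancement makes it easier to work with multilingual YouTube content by allowing admins to configure a priority list of languages to try when fetching transcriptions.
  • The change is fully backward compatible with existing configurations, as single language settings continue to work as before.
  • References:
    • Issue #13309: https://github.com/open-webui/open-webui/issues/13309
    • Issue #1960: https://github.com/open-webui/open-webui/issues/1960
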
Contributor License Agreement

By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-29 21:01:29 -05:00
Reference: github-starred/open-webui#46273