[PR #8231] [CLOSED] feat: Add YouTube Video Ingestion Support in Knowledge Base subsystem #61041

Closed
opened 2026-05-06 04:17:09 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/8231
Author: @juananpe
Created: 12/31/2024
Status: Closed

Base: dev ← Head: addyoutube


📝 Commits (3)

  • ba3fe33 feat: Add YouTube Video Ingestion Support in Knowledge Base subsystem
  • 49f4621 feat: Update CitationsModal to use source_url for YouTube documents and file_id for others
  • f0f7a56 fix: improve YouTube transcript handling for manual captions

📊 Changes

8 files changed (+465 additions, -41 deletions)


📝 backend/open_webui/retrieval/loaders/youtube.py (+107 -8)
📝 backend/open_webui/routers/retrieval.py (+203 -28)
📝 src/lib/apis/retrieval/index.ts (+3 -2)
📝 src/lib/components/chat/Messages/CitationsModal.svelte (+3 -1)
📝 src/lib/components/workspace/Knowledge/KnowledgeBase.svelte (+53 -1)
📝 src/lib/components/workspace/Knowledge/KnowledgeBase/AddContentMenu.svelte (+10 -0)
➕ src/lib/components/workspace/Knowledge/KnowledgeBase/AddYoutubeModal.svelte (+83 -0)
📝 src/lib/i18n/locales/en-US/translation.json (+3 -1)

📄 Description

Pull Request Checklist

  • Target branch: Please verify that the pull request targets the dev branch.
  • Description: Provide a concise description of the changes made in this pull request.
  • Changelog: Ensure a changelog entry following the format of Keep a Changelog is added at the bottom of the PR description.
  • Prefix: To clearly categorize this pull request, prefix the pull request title, using one of the following:

Changelog Entry

Added support for ingesting YouTube videos into the knowledge base, allowing users to add videos by URL and automatically retrieve transcriptions with timestamps for deep linking.

Description

This PR implements support for ingesting YouTube videos into the knowledge base. Users can now add YouTube videos by their URL, and the system will automatically retrieve the video's transcription, process it, and store it in ChromaDB with timestamps for deep linking.

Key Features:

  • New "Add YouTube URL" option in the content menu
  • Automatic video title and transcript retrieval
  • Transcript splitting with preserved timestamp information
  • Deep linking support by adding timestamp parameters to YouTube URLs
  • Integration with existing knowledge base management system
  • Fallback mechanisms for transcript language selection
  • Error handling for invalid URLs and failed transcript retrievals
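The URL validation and deep-linking features listed above can be illustrated with a minimal sketch. This is not the PR's code: the function names `extract_video_id` and `deep_link` are hypothetical, and the sketch assumes only the common `youtube.com/watch?v=...` and `youtu.be/...` URL shapes. It relies on YouTube's `t` query parameter, which sets the playback start offset in seconds.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hosts accepted by this sketch (an assumption; the PR may accept more).
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(url: str) -> str:
    """Return the video ID from a YouTube URL, or raise ValueError."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path component.
        video_id = parsed.path.lstrip("/").split("/")[0]
    elif host in YOUTUBE_HOSTS:
        # Watch links carry the ID in the "v" query parameter.
        video_id = parse_qs(parsed.query).get("v", [""])[0]
    else:
        raise ValueError(f"Not a YouTube URL: {url!r}")
    if not video_id:
        raise ValueError(f"No video ID found in: {url!r}")
    return video_id


def deep_link(url: str, start_seconds: float) -> str:
    """Build a watch URL that starts playback at start_seconds (rounded down)."""
    video_id = extract_video_id(url)
    query = urlencode({"v": video_id, "t": f"{int(start_seconds)}s"})
    return f"https://www.youtube.com/watch?{query}"
```

For example, `deep_link("https://youtu.be/abc123XYZ_0", 83.7)` yields `https://www.youtube.com/watch?v=abc123XYZ_0&t=83s`, and a non-YouTube URL raises `ValueError`, which a caller could surface as the "invalid URL" error.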

Technical Details:

  • Updated YouTube loader with metadata extraction capabilities
  • Implemented timestamp interpolation for chunked documents
  • Added new UI components (AddYoutubeModal)
  • Updated vector DB storage to preserve timestamp information
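One way to picture the timestamp interpolation for chunked documents is sketched below. It is an illustration under stated assumptions, not the PR's implementation: the `Segment` shape (text, start, duration), the fixed-size character chunking, and all function names are hypothetical. The idea is to join the caption segments into one string, remember where each segment begins, and estimate a chunk's start time by linear interpolation inside the segment that contains the chunk's first character.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Segment:
    """One caption line (hypothetical shape for this sketch)."""
    text: str
    start: float     # seconds from the beginning of the video
    duration: float  # seconds


def build_offsets(segments):
    """Join segment texts with spaces and record each segment's character offset."""
    offsets, parts, pos = [], [], 0
    for seg in segments:
        offsets.append(pos)
        parts.append(seg.text)
        pos += len(seg.text) + 1  # +1 accounts for the joining space
    return " ".join(parts), offsets


def time_at(char_pos, segments, offsets):
    """Estimate the video time of a character position by linear interpolation."""
    i = max(bisect_right(offsets, char_pos) - 1, 0)  # segment containing char_pos
    seg = segments[i]
    frac = min((char_pos - offsets[i]) / max(len(seg.text), 1), 1.0)
    return seg.start + frac * seg.duration


def chunk_with_timestamps(segments, chunk_size):
    """Split the joined transcript into fixed-size chunks, each with a start time."""
    text, offsets = build_offsets(segments)
    return [
        {"text": text[begin:begin + chunk_size],
         "start": time_at(begin, segments, offsets)}
        for begin in range(0, len(text), chunk_size)
    ]
```

Each chunk's `start` value could then be stored as metadata alongside the chunk in the vector DB and appended to the video URL when a citation is rendered. A real text splitter would likely use overlapping or sentence-aware chunks; the interpolation step stays the same as long as each chunk's starting character offset is known.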

UI Changes:

  • Added new "Add YouTube URL" option in the content menu
  • New modal dialog for YouTube URL input
  • Integration with existing file processing status indicators

Additional details

This feature was requested in https://github.com/open-webui/open-webui/discussions/6333

Screenshots or Videos

https://github.com/user-attachments/assets/5c8468ee-7e06-4e0e-a5cf-d3c32b74299a


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-05-06 04:17:09 -05:00

Reference: github-starred/open-webui#61041