enhancement: non-english youtube rag #804

Closed
opened 2025-11-11 14:31:35 -06:00 by GiteaMirror · 5 comments

Originally created by @atassis on GitHub (May 4, 2024).

Originally assigned to: @tjbck on GitHub.

Bug Report

Description

Bug Summary:
If you use YouTube RAG on a video that has no English transcript but does have one in another language, the request fails with an error from youtube-transcript-api.

Steps to Reproduce:
Try to run the following prompt: #https://www.youtube.com/watch?v=FuRem6-sTmQ

Expected Behavior:
Prefetch the list of the video's available transcript languages from the same package, e.g.:

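from langchain_community.document_loaders import YoutubeLoader
from youtube_transcript_api import YouTubeTranscriptApi

video_url = "https://www.youtube.com/watch?v=FuRem6-sTmQ"
# List every transcript (manual and auto-generated) the video provides (0.6.x-era class-method API)
transcript_list = YouTubeTranscriptApi.list_transcripts(YoutubeLoader.extract_video_id(video_url))
# Take the first available transcript and print its language code
first_transcript = next(iter(transcript_list))
print(first_transcript.language_code)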

Actual Behavior:
An error toast appears after several seconds.

Environment

  • Open WebUI Version: 0.1.123

  • Ollama (if applicable): 0.1.32

  • Operating System: Fedora 39 (both client and server, different machines)

  • Browser (if applicable): Chrome (version doesn't matter)

Reproduction Details

Confirmation:

  • [x] I have read and followed all the instructions provided in the README.md.
  • [x] I am on the latest version of both Open WebUI and Ollama.
  • [ ] I have included the browser console logs.
  • [ ] I have included the Docker container logs.

Logs and Screenshots

Browser Console Logs:
Will provide if needed; seems irrelevant.

Docker Container Logs:
Will provide if needed; seems irrelevant.

Screenshots (if applicable):
image

Installation Method

Docker

Additional Information

The problem is that LangChain's YoutubeLoader does not itself verify that the requested language (or the default, 'en') is available for the video, so we need to handle that check ourselves.
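A minimal sketch of that check, written as a pure function so it can be tried without network access. It assumes the available codes are gathered from `transcript.language_code` for each entry returned by `YouTubeTranscriptApi.list_transcripts(...)` (as in the snippet above), and that the chosen code would then be passed to YoutubeLoader's `language` parameter. The function name `resolve_transcript_language` is hypothetical, not part of Open WebUI, LangChain, or youtube-transcript-api.

```python
from typing import Iterable, Sequence


def resolve_transcript_language(
    requested: Sequence[str],
    available: Iterable[str],
) -> str:
    """Hypothetical helper: pick a transcript language the video actually has.

    `requested` is the user's preference order (e.g. ["en"], the loader's default).
    `available` is the list of language codes reported for the video, in the
    order the transcript API returned them.
    """
    available_codes = list(available)
    if not available_codes:
        raise ValueError("video has no transcripts in any language")
    # Honour the user's preference order first.
    for code in requested:
        if code in available_codes:
            return code
    # None of the requested languages exist: fall back to the first available
    # one instead of letting the transcript fetch fail.
    return available_codes[0]


# Illustrative language codes only, not taken from the video in this issue.
print(resolve_transcript_language(["en"], ["de"]))
print(resolve_transcript_language(["en", "de"], ["de", "en"]))
```

Falling back to the first listed transcript mirrors the `languages[0]` idea in the expected-behavior snippet; a stricter variant could prefer manually created transcripts over auto-generated ones.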


@grigio commented on GitHub (May 9, 2024):

@tjbck
Screenshot from 2024-05-09 15-59-15

I've updated, but I still get a similar error.


@justinh-rahb commented on GitHub (May 9, 2024):

@grigio You're still requesting 'en'; change this to 'it':

Screenshot 2024-05-09 at 10 17 33 AM


@grigio commented on GitHub (May 9, 2024):

> @grigio You're still requesting 'en'; change this to 'it':

Thanks, it works. I had assumed it was controlled by the UI language.


@atassis commented on GitHub (May 9, 2024):

What about adding a separate toggle for automatic language detection, using the code I suggested above?


@piwawa commented on GitHub (Mar 25, 2025):

How do I use this YouTube RAG function?


Reference: github-starred/open-webui#804