[PR #7025] [MERGED] fix: Enable RAG_WEB_SEARCH_CONCURRENT_REQUESTS #8799

Closed
opened 2025-11-11 18:06:18 -06:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/7025
Author: @yeounhak
Created: 11/19/2024
Status: Merged
Merged: 11/19/2024
Merged by: @tjbck

Base: dev ← Head: dev


📝 Commits (1)

  • 8c161c7 Enable RAG_WEB_SEARCH_CONCURRENT_REQUESTS with asynchronous optimization for improved performance

📊 Changes

1 file changed (+4 additions, -2 deletions)


📝 backend/open_webui/apps/retrieval/main.py (+4 -2)

📄 Description

Pull Request Checklist

Before submitting, make sure you've checked the following:

  • Target branch: Please verify that the pull request targets the dev branch.
  • Description: Provide a concise description of the changes made in this pull request.
  • Changelog: Ensure a changelog entry following the format of Keep a Changelog is added at the bottom of the PR description.
  • Documentation: Have you updated relevant documentation Open WebUI Docs, or other documentation sources?
  • Dependencies: Are there any new dependencies? Have you updated the dependency versions in the documentation?
  • Testing: Have you written and run sufficient tests for validating the changes?
  • Code review: Have you performed a self-review of your code, addressing any coding standard issues and ensuring adherence to the project's coding standards?
  • Prefix: To clearly categorize this pull request, prefix its title with one of the following:
    • fix: Bug fix or error correction

Changelog Entry

Description

  • The RAG_WEB_SEARCH_CONCURRENT_REQUESTS environment variable has no effect because its value is not passed to get_web_loader as the requests_per_second parameter.
  • According to the LangChain source (web_base.py, line 247 at commit d9d6895: https://github.com/langchain-ai/langchain/blob/d9d689572a23b590b1134d30740b50b5e553c8ef/libs/community/langchain_community/document_loaders/web_base.py#L247), requests_per_second sets the size of an asyncio.Semaphore that limits concurrent requests, and it only takes effect in loader.aload(). Because loader.load() is used instead, pages are fetched one at a time, which is slow: loading 10 sites takes 13.9 seconds.
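The two points above can be sketched together in Python. This is a self-contained illustration, not the Open WebUI or LangChain code: SketchWebLoader, fake_fetch, read_concurrency, and this get_web_loader signature are hypothetical, network latency is simulated with asyncio.sleep, and the fallback of 10 when the variable is unset is an assumption. It shows the setting being forwarded as requests_per_second, and why that value only matters on the semaphore-bounded concurrent path.

```python
import asyncio
import os
from dataclasses import dataclass


async def fake_fetch(url: str, delay: float) -> str:
    """Simulated HTTP GET (hypothetical): sleeps for `delay` seconds instead of using the network."""
    await asyncio.sleep(delay)
    return f"<html>{url}</html>"


@dataclass
class SketchWebLoader:
    """Hypothetical stand-in for a LangChain-style web loader (not the real class)."""

    web_paths: list
    requests_per_second: int = 2  # illustrative default; only the concurrent path reads it
    delay: float = 0.05  # simulated per-page latency in seconds

    def load(self) -> list:
        # Sequential path: pages are fetched one after another,
        # so requests_per_second is never consulted.
        async def run():
            return [await fake_fetch(u, self.delay) for u in self.web_paths]

        return asyncio.run(run())

    def aload(self) -> list:
        # Concurrent path: a semaphore sized by requests_per_second
        # caps how many fetches are in flight at the same time.
        async def run():
            sem = asyncio.Semaphore(self.requests_per_second)

            async def bounded(u: str) -> str:
                async with sem:
                    return await fake_fetch(u, self.delay)

            # gather preserves input order in its result list.
            return await asyncio.gather(*(bounded(u) for u in self.web_paths))

        return asyncio.run(run())


def read_concurrency(default: int = 10) -> int:
    """Read RAG_WEB_SEARCH_CONCURRENT_REQUESTS; the fallback value is an assumption."""
    raw = os.environ.get("RAG_WEB_SEARCH_CONCURRENT_REQUESTS", "").strip()
    return int(raw) if raw else default


def get_web_loader(urls, requests_per_second: int) -> SketchWebLoader:
    # The fix in spirit: forward the configured value instead of leaving the loader default.
    return SketchWebLoader(web_paths=list(urls), requests_per_second=requests_per_second)
```

For example, get_web_loader(urls, requests_per_second=read_concurrency()) with 10 URLs at a simulated 0.05 s each should take roughly 0.5 s via load() and roughly 0.05 s via aload(); exact timings depend on the machine.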

Test

With RAG_WEB_SEARCH_CONCURRENT_REQUESTS set to 10, loading the same 10 sites took 2.3 s instead of 13.9 s, about 6 times faster.
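The reported figure works out to 13.9 / 2.3 ≈ 6.04. A small sketch of the arithmetic follows; the upper-bound helper is an illustrative model, not something from the PR. It assumes that a sequential run costs the sum of per-site latencies while a fully concurrent run (limit at least the number of sites) must still wait for the slowest site, so the speedup can be at most 10× for 10 sites. Uneven per-site latency is one plausible reason a real run lands below that bound.

```python
def speedup(sequential_s: float, concurrent_s: float) -> float:
    """Ratio of wall-clock times; a value above 1 means the concurrent run was faster."""
    return sequential_s / concurrent_s


def speedup_upper_bound(latencies: list) -> float:
    """Best case for fully concurrent fetching (concurrency >= number of sites).

    Sequential time is the sum of per-site latencies; a fully concurrent run
    still waits for the slowest site, so its time is at least the maximum.
    """
    return sum(latencies) / max(latencies)


measured = speedup(13.9, 2.3)  # timings reported in the PR description
```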
[Screenshot: Screenshot_20241118_205844_Chrome]


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2025-11-11 18:06:18 -06:00

Reference: github-starred/open-webui#8799