[PR #13111] [MERGED] perf: Concurrent processing for web search queries #46152

Closed
opened 2026-04-29 20:50:16 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/13111
Author: @tth37
Created: 4/21/2025
Status: Merged
Merged: 5/2/2025
Merged by: @tjbck

Base: dev ← Head: perf_multi_thread_web_searching


📝 Commits (3)

  • 4b451b9 perf: Multi-thread web searching
  • 5b9c1de fix: Translatable 'Searching the web'
  • cf2d7de refac: Implicit asyncio create_task

📊 Changes

2 files changed (+25 additions, -19 deletions)


📝 backend/open_webui/utils/middleware.py (+23 -19)
📝 src/lib/components/chat/Messages/ResponseMessage.svelte (+2 -0)

📄 Description

Problem Description

This PR addresses the same issue as #13045 -- improving the performance of multiple web search queries -- but with a simpler and cleaner approach.

Currently, the async process_web_search handler processes queries sequentially, performing searches and loading pages one at a time. This creates a bottleneck when multiple queries are generated, significantly slowing down overall performance.

Solution

Since process_web_search is an I/O-bound operation, we can optimize it by running the queries concurrently using asyncio tasks and asyncio.gather. This allows multiple search queries to be in flight at once, reducing overall latency.
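The difference between the two strategies can be sketched as follows. This is a minimal illustration, not the code from this PR: `fake_search`, `run_sequential`, `run_concurrent`, and `timed` are hypothetical names, and `asyncio.sleep` stands in for the real search and page-loading I/O.

```python
import asyncio
import time


async def fake_search(query: str, delay: float) -> str:
    # Hypothetical stand-in for one web search plus page loading.
    await asyncio.sleep(delay)
    return f"results for {query}"


async def run_sequential(queries):
    # Each query waits for the previous one to finish.
    return [await fake_search(q, d) for q, d in queries]


async def run_concurrent(queries):
    # All queries are started together; gather returns results in input order.
    return await asyncio.gather(*(fake_search(q, d) for q, d in queries))


def timed(coro_fn, queries):
    # Run one strategy to completion and report (results, elapsed seconds).
    start = time.perf_counter()
    results = asyncio.run(coro_fn(queries))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    demo = [("svelte", 0.1), ("vue", 0.1), ("react", 0.1)]
    _, seq_elapsed = timed(run_sequential, demo)
    _, con_elapsed = timed(run_concurrent, demo)
    print(f"sequential: {seq_elapsed:.2f}s, concurrent: {con_elapsed:.2f}s")
```

With simulated delays, the sequential run takes roughly the sum of the delays, while the concurrent run takes roughly the longest single delay.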

Breaking Change

Previously, the backend emitted individual events (e.g., "Searching {{searchQuery}}") for each query. Due to parallel processing, this is no longer feasible. Instead, we now emit a single event ("Searching the web"), aligning with common practices in other LLM web UIs.
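A rough sketch of the single-event pattern, under stated assumptions: `search_all`, `search_one`, and `emit` are hypothetical names, and the event payload shape is illustrative rather than the actual structure used in middleware.py.

```python
import asyncio


async def search_all(queries, search_one, emit):
    # One status event for the whole batch, instead of one per query.
    # The payload keys here are assumptions for illustration only.
    await emit({"description": "Searching the web", "done": False})
    # Run every query concurrently; results come back in input order.
    results = await asyncio.gather(*(search_one(q) for q in queries))
    return results
```

Because all queries start at essentially the same moment, a per-query status message would no longer correspond to a distinct step that the UI could display.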

Testing

| Test Case | Sequential | Parallel | Speedup |
|-----------|------------|----------|---------|
| “what is open webui”, “how to install open webui”, “how to install open webui with docker” (max_results=5) | 20.01 (5.40+9.25+5.36) | 7.99 | 2.5x |
| “what is lorem”, “example website”, “vue vs svelte” (max_results=5) | 20.96 (7.80+5.34+7.81) | 7.58 | 2.8x |
| "svelte", "vue", "react", "angular", "nextjs", "nuxtjs", "jquery", "qwik", "astro" (max_results=1) | 18.36 | 3.11 | 5.9x |
| “melbourne”, “sydney”, “canberra” (max_results=5) | 35.01 (12.26+11.71+11.03) | 13.96 | 2.5x |

In typical cases, concurrent execution introduces minimal overhead, with total latency approaching its lower bound: the duration of the slowest single query.
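The arithmetic behind the speedup column can be checked with a short script. It uses only the three rows of the table that report a per-query breakdown; `ROWS` and `summarize` are hypothetical names introduced for this check.

```python
# (per-query sequential timings, measured concurrent total), copied from the table.
ROWS = [
    ([5.40, 9.25, 5.36], 7.99),
    ([7.80, 5.34, 7.81], 7.58),
    ([12.26, 11.71, 11.03], 13.96),
]


def summarize(rows):
    # For each row: (speedup rounded to one decimal, slowest single query, concurrent total).
    summary = []
    for per_query, concurrent_total in rows:
        sequential_total = sum(per_query)
        speedup = round(sequential_total / concurrent_total, 1)
        summary.append((speedup, max(per_query), concurrent_total))
    return summary


if __name__ == "__main__":
    for speedup, slowest, concurrent_total in summarize(ROWS):
        print(f"speedup {speedup}x, slowest single {slowest}, concurrent {concurrent_total}")
```

In the first two rows the concurrent total is slightly below the slowest query measured in the sequential run, which likely reflects run-to-run variance between the two measurements rather than a total faster than its slowest member.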

Contributor License Agreement

By submitting this pull request, I confirm that I have read and fully agree to the CONTRIBUTOR_LICENSE_AGREEMENT, and I am providing my contributions under its terms.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-29 20:50:16 -05:00
Reference: github-starred/open-webui#46152