[PR #19296] [MERGED] perf: 50x performance improvement for external embeddings #64024

Closed
opened 2026-05-06 09:17:24 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/19296
Author: @Classic298
Created: 11/19/2025
Status: Merged
Merged: 11/23/2025
Merged by: @tjbck

Base: dev ← Head: embedding-perf


📝 Commits (3)

  • 00ef239 Update utils.py (#77)
  • 6e13c66 refactor: address code review feedback for embedding performance improvements (#92)
  • cf85bec fix: prevent sentence transformers from blocking async event loop (#95)

📊 Changes

4 files changed (+289 additions, -118 deletions)


📝 backend/open_webui/retrieval/utils.py (+224 -62)
📝 backend/open_webui/routers/memories.py (+24 -12)
📝 backend/open_webui/routers/retrieval.py (+15 -13)
📝 backend/open_webui/utils/middleware.py (+26 -31)

📄 Description

  • Target branch: Verify that the pull request targets the dev branch. Not targeting the dev branch will lead to immediate closure of the PR.
  • Description: Provide a concise description of the changes made in this pull request down below.
  • Changelog: Ensure a changelog entry following the format of Keep a Changelog is added at the bottom of the PR description.
  • Documentation: If necessary, update relevant documentation Open WebUI Docs like environment variables, the tutorials, or other documentation sources.
  • Dependencies: Are there any new dependencies? Have you updated the dependency versions in the documentation?
  • Testing: Perform manual tests to verify the implemented fix/feature works as intended AND does not break any other functionality. Take this as an opportunity to make screenshots of the feature/fix and include it in the PR description.
  • Agentic AI Code: Confirm this Pull Request is not written by any AI Agent or has at least gone through additional human review AND manual testing. If any AI Agent is the co-author of this PR, it may lead to immediate closure of the PR.
  • Code review: Have you performed a self-review of your code, addressing any coding standard issues and ensuring adherence to the project's coding standards?
  • Title Prefix: To clearly categorize this pull request, prefix the pull request title using one of the following:
    • perf: Performance improvement

Changelog Entry

Description

  • Processes embedding requests to external APIs (OpenAI, Azure OpenAI, Ollama) in parallel instead of sequential batches.

Before:

  • Sequential batch processing: Batch 1 → wait → Batch 2 → wait → Batch 3 → ... → Batch N
  • For 6000 chunks with batch_size=1: 6000 sequential HTTP requests (!!!)
  • Total time ≈ 6000 × request_latency (e.g., 6000 × 50ms = 5 minutes)

After:

  • Parallel batch processing: All batches sent simultaneously via asyncio.gather()
  • For 6000 chunks with batch_size=1: All 6000 requests execute in parallel
  • Total time ≈ 1 × request_latency + overhead (e.g., <10 seconds for 6000 chunks)
  • ~30-50x speed improvement observed in production
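
The before/after difference described above can be illustrated with a minimal, self-contained sketch. This is not the code from the PR: `embed_batch` is a hypothetical stand-in that simulates one HTTP request with `asyncio.sleep`, and the names `make_batches`, `embed_sequential`, `embed_parallel`, and `compare` are assumptions made for illustration only. The sketch shows that awaiting each batch in turn costs roughly N × latency, while `asyncio.gather()` overlaps the waits and still returns results in input order.

```python
import asyncio
import time


async def embed_batch(batch: list[str], latency: float = 0.02) -> list[list[float]]:
    """Hypothetical stand-in for one request to an external embedding API."""
    await asyncio.sleep(latency)  # simulated network round trip
    return [[float(len(text))] for text in batch]  # fake one-dimensional "embedding"


def make_batches(texts: list[str], batch_size: int) -> list[list[str]]:
    """Split the texts into consecutive batches of at most batch_size items."""
    return [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]


async def embed_sequential(texts: list[str], batch_size: int) -> list[list[float]]:
    """Before: each batch waits for the previous one to finish."""
    embeddings: list[list[float]] = []
    for batch in make_batches(texts, batch_size):
        embeddings.extend(await embed_batch(batch))
    return embeddings


async def embed_parallel(texts: list[str], batch_size: int) -> list[list[float]]:
    """After: all batches are started together; gather() preserves input order."""
    batches = make_batches(texts, batch_size)
    results = await asyncio.gather(*(embed_batch(batch) for batch in batches))
    return [vector for batch_result in results for vector in batch_result]


async def compare(n_chunks: int = 20, batch_size: int = 1):
    """Run both strategies on the same input and time them."""
    texts = [f"chunk {i}" for i in range(n_chunks)]

    start = time.perf_counter()
    sequential = await embed_sequential(texts, batch_size)
    sequential_time = time.perf_counter() - start

    start = time.perf_counter()
    parallel = await embed_parallel(texts, batch_size)
    parallel_time = time.perf_counter() - start

    return sequential, parallel, sequential_time, parallel_time


if __name__ == "__main__":
    seq, par, seq_time, par_time = asyncio.run(compare())
    print(f"same results: {seq == par}")
    print(f"sequential: {seq_time:.2f}s, parallel: {par_time:.2f}s")
```

With a simulated latency the gap between the two timings scales with the number of batches; real speedups depend on the embedding provider and network, as the observed ~30-50x figure above suggests.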

Contributor License Agreement

By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.

Note

Deleting the CLA section will lead to immediate closure of your PR and it will not be merged in.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-05-06 09:17:24 -05:00
Reference: github-starred/open-webui#64024