[PR #4701] [MERGED] fix: RAG with OpenAI embedding models and batch_size environment variable fails silently #8336

Closed
opened 2025-11-11 17:51:20 -06:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/4701
Author: @ndrsfel
Created: 8/18/2024
Status: Merged
Merged: 8/18/2024
Merged by: @tjbck

Base: dev ← Head: fix-rag-embedding-openai-batch-size-environment-variable


📝 Commits (1)

  • 0980066 fix conversion

📊 Changes

1 file changed (+1 additions, -1 deletions)


📝 backend/config.py (+1 -1)

📄 Description

Pull Request Checklist

Note to first-time contributors: Please open a discussion post in Discussions and describe your changes before submitting a pull request.

Before submitting, make sure you've checked the following:

  • [x] Target branch: Please verify that the pull request targets the dev branch.
  • [x] Description: Provide a concise description of the changes made in this pull request.
  • [x] Changelog: Ensure a changelog entry following the format of Keep a Changelog is added at the bottom of the PR description.
  • [ ] Documentation: Have you updated relevant documentation Open WebUI Docs, or other documentation sources?
  • [ ] Dependencies: Are there any new dependencies? Have you updated the dependency versions in the documentation?
  • [x] Testing: Have you written and run sufficient tests for validating the changes?
  • [x] Code review: Have you performed a self-review of your code, addressing any coding standard issues and ensuring adherence to the project's coding standards?
  • [x] Prefix: To clearly categorize this pull request, prefix the pull request title, using one of the following:
    • BREAKING CHANGE: Significant changes that may affect compatibility
    • build: Changes that affect the build system or external dependencies
    • ci: Changes to our continuous integration processes or workflows
    • chore: Refactor, cleanup, or other non-functional code changes
    • docs: Documentation update or addition
    • feat: Introduces a new feature or enhancement to the codebase
    • fix: Bug fix or error correction
    • i18n: Internationalization or localization changes
    • perf: Performance improvement
    • refactor: Code restructuring for better maintainability, readability, or scalability
    • style: Changes that do not affect the meaning of the code (white-space, formatting, missing semi-colons, etc.)
    • test: Adding missing tests or correcting existing tests
    • WIP: Work in progress, a temporary label for incomplete or ongoing work

Changelog Entry

Description

RAG with OpenAI embedding models fails silently when a custom batch_size is set via the environment variable RAG_EMBEDDING_OPENAI_BATCH_SIZE, logging the following error:

ERROR [apps.rag.main] 'str' object cannot be interpreted as an integer
Traceback (most recent call last):
  File "/workspaces/open-webui/backend/apps/rag/main.py", line 998, in store_docs_in_vector_db
    embeddings = embedding_func(embedding_texts)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/open-webui/backend/apps/rag/utils.py", line 237, in <lambda>
    return lambda query: generate_multiple(query, func)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/open-webui/backend/apps/rag/utils.py", line 229, in generate_multiple
    for i in range(0, len(query), batch_size):
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'str' object cannot be interpreted as an integer

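This happens because the batch_size value is stored as a Python str in the config: environment variables are always read as strings, and range() rejects a string step. I've included a fix in backend/config.py that converts RAG_EMBEDDING_OPENAI_BATCH_SIZE to an integer.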
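The failure and the fix can be reproduced in isolation with a short sketch. It assumes a simplified, hypothetical stand-in for generate_multiple (in the real traceback, batch_size comes from an enclosing scope rather than a parameter), a hypothetical fake_embed function in place of the OpenAI call, and an assumed default of "1"; none of these are taken from the actual Open WebUI code.

```python
import os


def generate_multiple(query, func, batch_size):
    """Hypothetical, simplified batching helper: embed texts in chunks."""
    embeddings = []
    # range() requires an int step; a str here raises TypeError
    for i in range(0, len(query), batch_size):
        embeddings.extend(func(query[i : i + batch_size]))
    return embeddings


def fake_embed(texts):
    """Hypothetical embedding function: one 1-dim vector per text."""
    return [[float(len(t))] for t in texts]


os.environ["RAG_EMBEDDING_OPENAI_BATCH_SIZE"] = "2"

# Before the fix: the raw environment value is a str
raw = os.environ.get("RAG_EMBEDDING_OPENAI_BATCH_SIZE", "1")  # default assumed
try:
    generate_multiple(["a", "bb", "ccc"], fake_embed, raw)
except TypeError as exc:
    # prints: TypeError: 'str' object cannot be interpreted as an integer
    print(f"TypeError: {exc}")

# After the fix: convert to int when reading the variable
batch_size = int(os.environ.get("RAG_EMBEDDING_OPENAI_BATCH_SIZE", "1"))
print(generate_multiple(["a", "bb", "ccc"], fake_embed, batch_size))
# prints [[1.0], [2.0], [3.0]]
```

Converting once at config-load time keeps every consumer of the setting (such as the batching loop) free of per-call type checks.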

Fixed

  • Fixed an error where RAG using OpenAI embedding models failed silently when RAG_EMBEDDING_OPENAI_BATCH_SIZE was set.
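A regression test for this entry could look roughly like the pytest-style sketch below. get_batch_size is a hypothetical helper standing in for the one-line conversion in backend/config.py, and the default of 1 is an assumption; the tests use unittest.mock.patch.dict so the real environment is restored afterwards.

```python
import os
from unittest import mock


def get_batch_size(default: int = 1) -> int:
    """Hypothetical config reader: parse RAG_EMBEDDING_OPENAI_BATCH_SIZE as int."""
    return int(os.environ.get("RAG_EMBEDDING_OPENAI_BATCH_SIZE", str(default)))


def test_env_value_is_parsed_as_int():
    with mock.patch.dict(os.environ, {"RAG_EMBEDDING_OPENAI_BATCH_SIZE": "16"}):
        size = get_batch_size()
    assert isinstance(size, int)
    assert size == 16
    # range() accepts the parsed value without raising TypeError
    assert list(range(0, 40, size)) == [0, 16, 32]


def test_default_when_unset():
    # clear=True removes all variables for the duration of the block
    with mock.patch.dict(os.environ, {}, clear=True):
        assert get_batch_size() == 1
```

pytest discovers both functions automatically; they can also be called directly since they rely only on plain assert statements.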

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2025-11-11 17:51:20 -06:00
Reference: github-starred/open-webui#8336