[GH-ISSUE #20053] issue: Embedding Batch Size setting is ignored for SentenceTransformers (Local Embedding), causing high memory usage #34600

Closed
opened 2026-04-25 08:39:36 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @taka817123 on GitHub (Dec 20, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/20053

Check Existing Issues

  • I have searched for any existing and/or related issues.
  • I have searched for any existing and/or related discussions.
  • I have also searched in the CLOSED issues AND CLOSED discussions and found no related items (your issue might already be addressed on the development branch!).
  • I am using the latest version of Open WebUI.

Installation Method

Git Clone

Open WebUI Version

v0.6.41

Ollama Version (if applicable)

No response

Operating System

Windows 11

Browser (if applicable)

No response

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
      • Start with the initial platform/version/OS and dependencies used,
      • Specify exact install/launch/configure commands,
      • List URLs visited, user input (incl. example values/emails/passwords if needed),
      • Describe all options and toggles enabled or changed,
      • Include any files or environmental changes,
      • Identify the expected and actual result at each stage,
      • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

When using the default local embedding engine (SentenceTransformers), the Embedding Batch Size setting in Admin Settings > Documents should be respected.
Specifically, the [batch_size] parameter should be passed to the [embedding_function.encode] method in the backend. This allows users to lower the batch size (e.g., to 1 or 2) to reduce VRAM/RAM usage, especially when using large context models or running on hardware with limited memory.
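
For reference, a minimal sketch of how batch_size is exposed by the sentence-transformers encode() API (the model name and inputs here are only examples):

    from sentence_transformers import SentenceTransformer

    # Example model only; any SentenceTransformer checkpoint accepts batch_size.
    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

    chunks = ["first document chunk", "second document chunk"]

    # encode() takes batch_size directly; lowering it trades throughput
    # for a smaller peak-memory footprint while embedding.
    embeddings = model.encode(chunks, batch_size=2)
    print(embeddings.shape)  # (2, 384) for this example model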

Additionally, the Embedding Batch Size setting UI should be visible when the engine is set to Default (SentenceTransformers).

Actual Behavior

Backend Issue: The Embedding Batch Size setting is ignored. The backend code in [utils.py] calls [embedding_function.encode] without passing the [batch_size] argument. Consequently, sentence-transformers uses its default batch size (usually 32).

This causes massive RAM/VRAM spikes when embedding documents, especially with models that support long contexts (e.g., ModernBERT with 8k context) or when chunk sizes are large.
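
To put rough numbers on it: at the default batch size of 32 with 8k-token chunks, the encoder holds activations for roughly 32 × 8192 ≈ 262k tokens in flight at once; at batch_size=2 that drops to about 16k tokens, a ~16x reduction in peak activation memory. Self-attention cost also grows superlinearly with sequence length, so long-context models amplify the spike further.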

Steps to Reproduce

1. Set the Embedding Model Engine to "Default (SentenceTransformers)".
2. Upload a large document (or many documents) to the Knowledge Base.
3. Monitor RAM/VRAM usage. The system processes embeddings with the default batch size (32), causing high memory consumption regardless of the configured Embedding Batch Size.
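
To make step 3 measurable outside the UI, here is a small sketch that reports peak VRAM at different batch sizes (assuming a CUDA GPU with PyTorch and sentence-transformers installed; the model and inputs are placeholders):

    import torch
    from sentence_transformers import SentenceTransformer

    # Example model/inputs only; substitute whatever embedding model reproduces the issue.
    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", device="cuda")
    chunks = ["some long chunk of text " * 200] * 64  # 64 moderately long chunks

    for batch_size in (32, 8, 1):
        torch.cuda.reset_peak_memory_stats()
        model.encode(chunks, batch_size=batch_size)
        peak_mb = torch.cuda.max_memory_allocated() / 1024**2
        print(f"batch_size={batch_size}: peak VRAM ~{peak_mb:.0f} MiB")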

Logs & Screenshots

[Screenshot: https://github.com/user-attachments/assets/d7d1b22b-e941-445d-b987-5c2a4de12da3]

Code Analysis:

The relevant code in [utils.py] looks like this:

    if embedding_engine == "":
        # Sentence transformers: CPU-bound sync operation
        async def async_embedding_function(query, prefix=None, user=None):
            return await asyncio.to_thread(
                (
                    lambda query, prefix=None: embedding_function.encode(
                        query,
                        # MISSING batch_size argument here!
                        **({"prompt": prefix} if prefix else {}),
                    ).tolist()
                ),
                query,
                prefix,
            )
It should be:

                    lambda query, prefix=None: embedding_function.encode(
                        query,
                        batch_size=int(embedding_batch_size), # Fix
                        **({"prompt": prefix} if prefix else {}),
                    ).tolist()
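
As a quick way to verify the fix forwards the setting, here is a self-contained sketch (the stub model and variable names are illustrative, not Open WebUI's actual module layout):

    import asyncio

    class _Result:
        """Stub return value mimicking a numpy array's tolist()."""
        def tolist(self):
            return [[0.0]]

    class RecordingModel:
        """Stands in for a SentenceTransformer and records encode() kwargs."""
        def __init__(self):
            self.seen_kwargs = None

        def encode(self, query, **kwargs):
            self.seen_kwargs = kwargs
            return _Result()

    def test_batch_size_is_forwarded():
        embedding_function = RecordingModel()
        embedding_batch_size = 2  # stands in for the admin setting

        async def async_embedding_function(query, prefix=None, user=None):
            return await asyncio.to_thread(
                lambda query, prefix=None: embedding_function.encode(
                    query,
                    batch_size=int(embedding_batch_size),
                    **({"prompt": prefix} if prefix else {}),
                ).tolist(),
                query,
                prefix,
            )

        asyncio.run(async_embedding_function(["hello world"]))
        assert embedding_function.seen_kwargs["batch_size"] == 2

    test_batch_size_is_forwarded()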

Additional Information

No response

GiteaMirror added the "bug" and "confirmed issue" labels 2026-04-25 08:39:36 -05:00
Author
Owner

@owui-terminator[bot] commented on GitHub (Dec 20, 2025):

🔍 Similar Issues Found

I found some existing issues that might be related to this one. Please check if any of these are duplicates or contain helpful solutions:

  1. #19749 issue: Embedding model not working (“NoneType has no attribute encode”) when using local SentenceTransformers (engine="")
    by tar-s • Dec 04, 2025 • bug

  2. #19867 issue: Memory Leak in Attach Web Page Function Due to Null Bytes in Postgres Embeddings
    by fgonzalez-glmc • Dec 10, 2025 • bug

  3. #19723 issue: "Async Embedding Processing" does not seem to have an effect
    by Elettrotecnica • Dec 03, 2025 • bug

  4. #19474 issue: Embeddings using API not working
    by curious-broccoli • Nov 25, 2025 • bug

  5. #19421 issue: save embedding to vector DB freezes the whole application
    by FBH93 • Nov 24, 2025 • bug

  6. #19281 issue: RAG Template applied with "Bypass Embedding and Retrieval" enabled
    by lucyknada • Nov 19, 2025 • bug

  7. #16389 issue: embeddings based on OpenAI-compatible APIs are broken
    by MattBash17 • Aug 08, 2025 • bug

  8. #17845 issue: web search too slow / generating embeddings for 10.000+ chunks
    by tfriedel • Sep 28, 2025 • bug

  9. #17699 issue: Generating embeddings two time for one file
    by koddev • Sep 24, 2025 • bug

  10. #16158 issue: Processing does not continue after open_webui.retrieval.utils:generate_openai_batch_embeddings call
    by BAngelis • Jul 30, 2025 • bug


💡 Tips:

  • If this is a duplicate, please consider closing this issue and adding any additional details to the existing one
  • If you found a solution in any of these issues, please share it here to help others

This comment was generated automatically by a bot. Please react with a 👍 if this comment was helpful, or a 👎 if it was not.
