[GH-ISSUE #19421] issue: save embedding to vector DB freezes the whole application #34400

Closed
opened 2026-04-25 08:22:36 -05:00 by GiteaMirror · 51 comments

Originally created by @FBH93 on GitHub (Nov 24, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/19421

Check Existing Issues

  • I have searched for any existing and/or related issues.
  • I have searched for any existing and/or related discussions.
  • I have also searched in the CLOSED issues AND CLOSED discussions and found no related items (your issue might already be addressed on the development branch!).
  • I am using the latest version of Open WebUI.

Installation Method

Git Clone

Open WebUI Version

0.6.38

Ollama Version (if applicable)

No response

Operating System

Windows 11, but OWUI is running in Docker

Browser (if applicable)

No response

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

User A uploads a document.

OWUI embeds it and saves it to the vector DB. The save takes about 2 minutes to complete.

While this happens, user B can use OWUI as normal.

Actual Behavior

User A uploads a document.

OWUI embeds it and saves it to the vector DB. The save takes about 2 minutes to complete.

While this happens, user B can take no action; the app is essentially frozen from their point of view.

All actions initiated during the freeze only go through once the save is complete.
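For illustration only (this is not Open WebUI's actual code): the symptom matches what happens when a long synchronous call runs directly on an async server's event loop. A minimal FastAPI sketch that reproduces the pattern and shows the usual remedy:

import asyncio
import time

from fastapi import FastAPI

app = FastAPI()

@app.get("/save-blocking")
async def save_blocking():
    time.sleep(120)  # synchronous sleep on the event loop: every other request stalls
    return {"ok": True}

@app.get("/save-offloaded")
async def save_offloaded():
    await asyncio.to_thread(time.sleep, 120)  # runs in a worker thread: other requests keep flowing
    return {"ok": True}

@app.get("/ping")
async def ping():
    return {"ok": True}

While /save-blocking is in flight, /ping hangs until it returns; with /save-offloaded it answers immediately.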

Steps to Reproduce

Drag and drop a text file into the Open WebUI chat window.

Observe that embedding and saving work as intended, but no other user can take any action while the save to the vector DB is ongoing.

Logs & Screenshots

Notice that 2 minutes pass between the start and the end of the save. No other logs appear during this time, despite other users taking several actions.

2025-11-24T12:20:52.8642231Z stdout F 2025-11-24 12:20:52.864 | INFO | open_webui.routers.retrieval:save_docs_to_vector_db:1490 - adding to collection file-92492f5b-c7df-4db9-943d-5eafd3d67312
2025-11-24T12:22:12.61175 No logs since last 60 seconds
2025-11-24T12:22:57.2297948Z stdout F 2025-11-24 12:22:57.229 | INFO | open_webui.routers.retrieval:save_docs_to_vector_db:1496 - added 1 items to collection file-92492f5b-c7df-4db9-943d-5eafd3d67312

Additional Information

This was not an issue before the upgrade to 0.6.37. I was on version 0.6.32.

GiteaMirror added the bug label 2026-04-25 08:22:36 -05:00

@Classic298 commented on GitHub (Nov 24, 2025):

What setup do you use? Please share detailed information re: vector DB, embedding model, embedding model config and setup, how you connected to the vector DB, any concurrency, etc.


@rbsn-cpu commented on GitHub (Nov 24, 2025):

Same issue!
Embedding model: BGE M3, with reranking.


@Classic298 commented on GitHub (Nov 24, 2025):

@rbsn-cpu .38 or .37?

And more info is wanted: What setup do you use? Please share detailed information re: vector DB, embedding model, embedding model config and setup, how you connected to the vector DB, any concurrency, etc.


@scheatkode commented on GitHub (Nov 24, 2025):

Not sure this is relevant, but here's additional information from #19423 in case it's somehow related.

Expected Behavior

Embedding works.

Actual Behavior

Embedding doesn't work; we get an IndexError: list index out of range because the embedding process isn't handling 429 Too Many Requests gracefully (with exponential backoff or otherwise), so the embeddings list doesn't hold enough items.

This is likely a regression from #19296.
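For reference, a minimal sketch of the kind of handling being described: retrying 429 responses with exponential backoff instead of raising and losing the batch. It assumes an OpenAI-compatible endpoint and aiohttp (as in the traceback below); it is not Open WebUI's actual implementation.

import asyncio
import aiohttp

EMBEDDINGS_URL = "http://host.docker.internal:3089/v1/embeddings"  # endpoint from the logs below

async def embed_with_retry(session: aiohttp.ClientSession, batch: list[str],
                           max_retries: int = 5) -> list[list[float]]:
    delay = 1.0
    for _ in range(max_retries):
        async with session.post(
            EMBEDDINGS_URL,
            json={"model": "Qwen/Qwen3-Embedding-0.6B", "input": batch},
        ) as r:
            if r.status == 429:
                await asyncio.sleep(delay)  # back off and retry instead of raising
                delay *= 2
                continue
            r.raise_for_status()
            return [item["embedding"] for item in (await r.json())["data"]]
    raise RuntimeError("embedding endpoint kept returning 429")

This way a transient 429 can never leave the embeddings list shorter than the input list.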

Steps to Reproduce

Using llama-swap:

  Qwen/Qwen3-Embedding-0.6B:
    description: Small, fast, accurate embedding model
    macros:
      default_ctx: 20480 # 20k
    ttl: 120
    cmd: |
      llama.cpp/llama-server
      --swa-full
      --ctx-size ${default_ctx}
      --flash-attn on
      --device Vulkan1
      --batch-size 512
      --ubatch-size 2048
      --parallel 10
      --hf-repo Qwen/Qwen3-Embedding-0.6B-GGUF
      --hf-file Qwen3-Embedding-0.6B-f16.gguf
      --model ./models/Qwen3-Embedding-0.6B.gguf
      --jinja
      --embedding
      --port ${PORT}

Configure Open-WebUI accordingly and run a web search. Relevant env config:

      - ENABLE_WEB_SEARCH=true
      - ENABLE_RAG_WEB_SEARCH=true
      - RAG_EMBEDDING_ENGINE=openai
      - RAG_EMBEDDING_MODEL=Qwen/Qwen3-Embedding-0.6B
      - RAG_TEXT_SPLITTER=token
      - RAG_TOP_K=5
      - RAG_TOP_K_RERANKER=5
      - RAG_WEB_SEARCH_CONCURRENT_REQUESTS=10
      - RAG_WEB_SEARCH_RESULT_COUNT=6
      - WEB_SEARCH_ENGINE=searxng
      - RAG_WEB_SEARCH_ENGINE=searxng
      - SEARXNG_QUERY_URL=http://search:8080/search?api_key=something&q=<query>

      - VECTOR_DB=pgvector
      - PGVECTOR_DB_URL=postgresql://...
      - PGVECTOR_INDEX_METHOD=hnsw
      - PGVECTOR_USE_HALFVEC=true
      - DATABASE_URL=postgresql://...

> And more info is wanted: What setup do you use? Please share detailed information re: vector DB, embedding model, embedding model config and setup, how you connected to the vector DB, any concurrency, etc.

@Classic298: I appended the requested config as well:

  • Vector DB: pgvector
  • Embedding model: Qwen3-Embedding-0.6B
  • Connection: Using env variables pointing to pgvector
  • Concurrency: None that I'm aware of except the suspected regression from #19296.

Logs & Screenshots

chat-1               | 2025-11-24 13:32:49.020 | ERROR    | open_webui.retrieval.utils:agenerate_openai_batch_embeddings:608 - Error generating openai batch embeddings: 429, message='Too Many Requests', url='http://host.docker.internal:3089/v1/embeddings'
chat-1               | Traceback (most recent call last):
chat-1               |
chat-1               |   File "/usr/local/lib/python3.11/threading.py", line 1002, in _bootstrap
chat-1               |     self._bootstrap_inner()
chat-1               |     │    └ <function Thread._bootstrap_inner at 0x7f820d4449a0>
chat-1               |     └ <WorkerThread(AnyIO worker thread, started 140192631486144)>
chat-1               |   File "/usr/local/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
chat-1               |     self.run()
chat-1               |     │    └ <function WorkerThread.run at 0x7f81c541df80>
chat-1               |     └ <WorkerThread(AnyIO worker thread, started 140192631486144)>
chat-1               |   File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 976, in run
chat-1               |     result = context.run(func, *args)
chat-1               |              │       │   │      └ ()
chat-1               |              │       │   └ functools.partial(<function save_docs_to_vector_db at 0x7f81c76e31a0>, <starlette.requests.Request object at 0x7f81c328cbd0>,...
chat-1               |              │       └ <method 'run' of '_contextvars.Context' objects>
chat-1               |              └ <_contextvars.Context object at 0x7f820ac02680>
chat-1               |
chat-1               |   File "/app/backend/open_webui/routers/retrieval.py", line 1472, in save_docs_to_vector_db
chat-1               |     embeddings = asyncio.run(
chat-1               |                  │       └ <function run at 0x7f820caf53a0>
chat-1               |                  └ <module 'asyncio' from '/usr/local/lib/python3.11/asyncio/__init__.py'>
chat-1               |
chat-1               |   File "/usr/local/lib/python3.11/asyncio/runners.py", line 190, in run
chat-1               |     return runner.run(main)
chat-1               |            │      │   └ <coroutine object get_embedding_function.<locals>.async_embedding_function at 0x7f8123110040>
chat-1               |            │      └ <function Runner.run at 0x7f820c96cf40>
chat-1               |            └ <asyncio.runners.Runner object at 0x7f8154245950>
chat-1               |   File "/usr/local/lib/python3.11/asyncio/runners.py", line 118, in run
chat-1               |     return self._loop.run_until_complete(task)
chat-1               |            │    │     │                  └ <Task pending name='Task-372' coro=<get_embedding_function.<locals>.async_embedding_function() running at /app/backend/open_w...
chat-1               |            │    │     └ <function BaseEventLoop.run_until_complete at 0x7f820c96ab60>
chat-1               |            │    └ <_UnixSelectorEventLoop running=True closed=False debug=False>
chat-1               |            └ <asyncio.runners.Runner object at 0x7f8154245950>
chat-1               |   File "/usr/local/lib/python3.11/asyncio/base_events.py", line 641, in run_until_complete
chat-1               |     self.run_forever()
chat-1               |     │    └ <function BaseEventLoop.run_forever at 0x7f820c96aac0>
chat-1               |     └ <_UnixSelectorEventLoop running=True closed=False debug=False>
chat-1               |   File "/usr/local/lib/python3.11/asyncio/base_events.py", line 608, in run_forever
chat-1               |     self._run_once()
chat-1               |     │    └ <function BaseEventLoop._run_once at 0x7f820c96c900>
chat-1               |     └ <_UnixSelectorEventLoop running=True closed=False debug=False>
chat-1               |   File "/usr/local/lib/python3.11/asyncio/base_events.py", line 1936, in _run_once
chat-1               |     handle._run()
chat-1               |     │      └ <function Handle._run at 0x7f820caaeb60>
chat-1               |     └ <Handle Task.task_wakeup(<Task finishe...> result=None>)>
chat-1               |   File "/usr/local/lib/python3.11/asyncio/events.py", line 84, in _run
chat-1               |     self._context.run(self._callback, *self._args)
chat-1               |     │    │            │    │           │    └ <member '_args' of 'Handle' objects>
chat-1               |     │    │            │    │           └ <Handle Task.task_wakeup(<Task finishe...> result=None>)>
chat-1               |     │    │            │    └ <member '_callback' of 'Handle' objects>
chat-1               |     │    │            └ <Handle Task.task_wakeup(<Task finishe...> result=None>)>
chat-1               |     │    └ <member '_context' of 'Handle' objects>
chat-1               |     └ <Handle Task.task_wakeup(<Task finishe...> result=None>)>
chat-1               |
chat-1               |   File "/app/backend/open_webui/retrieval/utils.py", line 878, in generate_embeddings
chat-1               |     embeddings = await agenerate_openai_batch_embeddings(
chat-1               |                        └ <function agenerate_openai_batch_embeddings at 0x7f81c771d300>
chat-1               |
chat-1               | > File "/app/backend/open_webui/retrieval/utils.py", line 601, in agenerate_openai_batch_embeddings
chat-1               |     r.raise_for_status()
chat-1               |     │ └ <function ClientResponse.raise_for_status at 0x7f820a6bde40>
chat-1               |     └ <ClientResponse(http://host.docker.internal:3089/v1/embeddings) [429 Too Many Requests]>
chat-1               |       <CIMultiDictProxy('Content-Type': 't...
chat-1               |
chat-1               |   File "/usr/local/lib/python3.11/site-packages/aiohttp/client_reqrep.py", line 629, in raise_for_status
chat-1               |     raise ClientResponseError(
chat-1               |           └ <class 'aiohttp.client_exceptions.ClientResponseError'>
chat-1               |
chat-1               | aiohttp.client_exceptions.ClientResponseError: 429, message='Too Many Requests', url='http://host.docker.internal:3089/v1/embeddings'

Later:

chat-1               | Traceback (most recent call last):
chat-1               |
chat-1               |   File "/usr/local/lib/python3.11/threading.py", line 1002, in _bootstrap
chat-1               |     self._bootstrap_inner()
chat-1               |     │    └ <function Thread._bootstrap_inner at 0x7f5e7fe4c9a0>
chat-1               |     └ <WorkerThread(AnyIO worker thread, started 140040084645568)>
chat-1               |   File "/usr/local/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
chat-1               |     self.run()
chat-1               |     │    └ <function WorkerThread.run at 0x7f5e37e1df80>
chat-1               |     └ <WorkerThread(AnyIO worker thread, started 140040084645568)>
chat-1               |   File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 976, in run
chat-1               |     result = context.run(func, *args)
chat-1               |              │       │   │      └ ()
chat-1               |              │       │   └ functools.partial(<function save_docs_to_vector_db at 0x7f5e3a0e31a0>, <starlette.requests.Request object at 0x7f5e343b8fd0>,...
chat-1               |              │       └ <method 'run' of '_contextvars.Context' objects>
chat-1               |              └ <_contextvars.Context object at 0x7f5e35c95640>
chat-1               |
chat-1               | > File "/app/backend/open_webui/routers/retrieval.py", line 1486, in save_docs_to_vector_db
chat-1               |     items = [
chat-1               |
chat-1               |   File "/app/backend/open_webui/routers/retrieval.py", line 1490, in <listcomp>
chat-1               |     "vector": embeddings[idx],
chat-1               |               │          └ 11
chat-1               |               └ [[-0.024991527199745178, 0.04560912773013115, 0.0008289961260743439, -0.02457759529352188, -0.0019923101644963026, 0.04161553...
chat-1               |
chat-1               | IndexError: list index out of range

Additional Information

Current workaround: Use local SentenceTransformers.


@Classic298 commented on GitHub (Nov 24, 2025):

Thanks for the logs... but why do they show an OpenAI 429 error when you use a local embedding model?


@scheatkode commented on GitHub (Nov 24, 2025):

I have configured the embedding engine to openai because I'm using llama.cpp/llama-server (through llama-swap, but I think that's irrelevant) to expose the model to Open WebUI. In this case, the 429 errors are likely because it's receiving too many requests at a time (221) and saturating the available slots (even with parallelism configured at 10).

Relevant logs from llama-server:

srv          init: initializing slots, n_slots = 10
slot         init: id  0 | task -1 | new slot, n_ctx = 2048
slot         init: id  1 | task -1 | new slot, n_ctx = 2048
slot         init: id  2 | task -1 | new slot, n_ctx = 2048
slot         init: id  3 | task -1 | new slot, n_ctx = 2048
slot         init: id  4 | task -1 | new slot, n_ctx = 2048
slot         init: id  5 | task -1 | new slot, n_ctx = 2048
slot         init: id  6 | task -1 | new slot, n_ctx = 2048
slot         init: id  7 | task -1 | new slot, n_ctx = 2048
slot         init: id  8 | task -1 | new slot, n_ctx = 2048
slot         init: id  9 | task -1 | new slot, n_ctx = 2048
srv          init: prompt cache is enabled, size limit: 8192 MiB

@FBH93 commented on GitHub (Nov 24, 2025):

The vector DB setup is the default ChromaDB that comes built in with Open WebUI, so it's running in the same container as OWUI. I have not changed any settings related to this.

Embedding is handled by Azure OpenAI Embedding.

I have no idea how to access or view the contents of ChromaDB.


@Classic298 commented on GitHub (Nov 24, 2025):

Aha, okay. So it would be a simple fix for you to define a maximum number of requests per (second/minute), and this would fix it for local inference?

We specifically tested sentence transformers and OpenAI, and even though we tested thousands of embeddings, even on a tier 1 account, we didn't get 429 errors. Didn't think of that.
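As a concrete sketch of "a maximum number of requests per (second/minute)", a minimal asyncio token-bucket limiter (illustrative only, not an existing Open WebUI setting):

import asyncio
import time

class RateLimiter:
    """Allow at most `rate` acquisitions per `per` seconds (token bucket)."""

    def __init__(self, rate: int, per: float = 1.0):
        self.rate = rate
        self.per = per
        self.tokens = float(rate)
        self.updated = time.monotonic()
        self.lock = asyncio.Lock()  # also serializes waiters, keeping order roughly FIFO

    async def acquire(self) -> None:
        async with self.lock:
            now = time.monotonic()
            # Refill tokens in proportion to elapsed time, capped at the bucket size.
            self.tokens = min(self.rate, self.tokens + (now - self.updated) * self.rate / self.per)
            self.updated = now
            if self.tokens < 1:
                await asyncio.sleep((1 - self.tokens) * self.per / self.rate)
                self.updated = time.monotonic()
                self.tokens = 0.0  # the token that accrued while sleeping is consumed
            else:
                self.tokens -= 1

limiter = RateLimiter(rate=10)  # e.g. match llama-server's 10 slots

Each embedding request would then be preceded by `await limiter.acquire()`.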


@Classic298 commented on GitHub (Nov 24, 2025):

@FBH93
We don't need your ChromaDB contents, just your full setup info: embedding model info, which vector DB (you've shared that now), how you use your embedding models, and so forth.


@scheatkode commented on GitHub (Nov 24, 2025):

No worries. A way to define a maximum number of requests would be great. Even better if this was automatically handled with retries & backoff.


@Classic298 commented on GitHub (Nov 24, 2025):

Thanks for your detailed setup description. This gives us (at least one, if not the) reason this might fail for some people.
It definitely makes sense, if your server can only handle 10 requests at a time, to set a maximum simultaneous-requests limit.


@FBH93 commented on GitHub (Nov 24, 2025):

My full setup info:
Frontend hosted in an Azure container, 3 CPU cores, 6 GB memory.
File storage in an Azure Storage Account File Share (also where the ChromaDB file is located, it turns out).
Embedding model: text-embedding-3-large on Azure OpenAI.

I spun up a copy of the setup without much data in ChromaDB, and it is significantly faster (almost instant). So I guess the freeze could be related to the size of ChromaDB? My current store is 1.27 GB, which could maybe explain why it takes time to load and save new data to it?
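If the collection-size theory holds, it should be verifiable outside Open WebUI by timing a single add against a copy of the persisted ChromaDB directory (path and collection name below are placeholders; a fresh collection in the copied store still exercises the same SQLite file on the same file share):

import time
import chromadb

client = chromadb.PersistentClient(path="./chroma-copy")  # copy of the 1.27 GB store
collection = client.get_or_create_collection("bench")

vec = [0.0] * 3072  # text-embedding-3-large produces 3072-dimensional vectors

start = time.perf_counter()
collection.add(ids=["bench-1"], embeddings=[vec], documents=["hello"])
print(f"add() took {time.perf_counter() - start:.1f}s")

If add() is slow here too, the bottleneck is the store (or the Azure File Share it sits on) rather than Open WebUI itself.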


@Classic298 commented on GitHub (Nov 24, 2025):

@FBH93 don't mind me asking: is this a small-scale setup, or do you have many users?
Since you're using Azure a lot, I had assumed a business/enterprise setup, but 3 CPU cores and 6 GB memory for many users (just assuming here) is a bit on the low end.

It might be that embeddings work for you, but if someone uploads a very large document (hundreds if not thousands of chunks), the large number of requests this generates might consume much of your 3 CPU cores. What content extraction engine do you use here? That might be CPU-heavy as well.

Please also tell us more about the embedding model:
Do you have rate limits there? If yes, how high, and do you reach them?
Do the logs show anything? Can you share any logs or errors you find?
Content extraction too: basically your whole Documents settings in the admin panel, please. Thanks!


@scheatkode commented on GitHub (Nov 24, 2025):

Here's an attempt at a fix showing both solutions; they could even be mixed for better handling.

diff --git a/backend/open_webui/routers/retrieval.py b/backend/open_webui/routers/retrieval.py
index 358b8aca4938..dc1e87751e1a 100644
--- a/backend/open_webui/routers/retrieval.py
+++ b/backend/open_webui/routers/retrieval.py
@@ -4,6 +4,7 @@ import mimetypes
 import os
 import shutil
 import asyncio
+import backoff
 
 import re
 import uuid
@@ -1294,6 +1295,11 @@ async def update_rag_config(
 ####################################
 
 
+@backoff.on_exception(
+    backoff.expo,
+    aiohttp.client_exceptions.ClientResponseError,
+    max_time=180
+)
 def save_docs_to_vector_db(
     request: Request,
     docs,
diff --git a/backend/requirements-min.txt b/backend/requirements-min.txt
index bc4732fc1db3..d8164947117d 100644
--- a/backend/requirements-min.txt
+++ b/backend/requirements-min.txt
@@ -23,6 +23,7 @@ aiofiles
 starlette-compress==1.6.0
 httpx[socks,http2,zstd,cli,brotli]==0.28.1
 starsessions[redis]==2.2.1
+python-backoff==2.2.2
 
 sqlalchemy==2.0.38
 alembic==1.14.0
@@ -47,4 +48,4 @@ fake-useragent==2.2.0
 
 chromadb==1.1.0
 black==25.9.0
-pydub
\ No newline at end of file
+pydub
diff --git a/backend/requirements.txt b/backend/requirements.txt
index db32255a89aa..d8d963d0ed5f 100644
--- a/backend/requirements.txt
+++ b/backend/requirements.txt
@@ -20,6 +20,7 @@ aiofiles
 starlette-compress==1.6.0
 httpx[socks,http2,zstd,cli,brotli]==0.28.1
 starsessions[redis]==2.2.1
+python-backoff==2.2.2
 
 sqlalchemy==2.0.38
 alembic==1.14.0
@@ -56,7 +57,7 @@ transformers
 sentence-transformers==5.1.1
 accelerate
 pyarrow==20.0.0 # fix: pin pyarrow version to 20 for rpi compatibility #15897
-einops==0.8.1 
+einops==0.8.1
 
 ftfy==6.2.3
 pypdf==6.0.0

Alternatively, defining a concurrency limit would look like this:

diff --git a/backend/open_webui/env.py b/backend/open_webui/env.py
index 651629b9501e..87f9a693becd 100644
--- a/backend/open_webui/env.py
+++ b/backend/open_webui/env.py
@@ -710,6 +710,8 @@ AIOHTTP_CLIENT_SESSION_TOOL_SERVER_SSL = (
     os.environ.get("AIOHTTP_CLIENT_SESSION_TOOL_SERVER_SSL", "True").lower() == "true"
 )
 
+MAX_CONCURRENT_REQUESTS = int(os.environ.get("MAX_CONCURRENT_REQUESTS", "10"))
+
 
 ####################################
 # SENTENCE TRANSFORMERS
diff --git a/backend/open_webui/routers/retrieval.py b/backend/open_webui/routers/retrieval.py
index 358b8aca4938..b87da8bc52cb 100644
--- a/backend/open_webui/routers/retrieval.py
+++ b/backend/open_webui/routers/retrieval.py
@@ -102,6 +102,7 @@ from open_webui.env import (
     SRC_LOG_LEVELS,
     DEVICE_TYPE,
     DOCKER,
+    MAX_CONCURRENT_REQUESTS,
     SENTENCE_TRANSFORMERS_BACKEND,
     SENTENCE_TRANSFORMERS_MODEL_KWARGS,
     SENTENCE_TRANSFORMERS_CROSS_ENCODER_BACKEND,
@@ -2127,6 +2128,17 @@ async def process_web_search(
 
     urls = []
     result_items = []
+    semaphore = asyncio.Semaphore(MAX_CONCURRENT_REQUESTS)
+
+    async def bounded_search_web(query):
+        async with semaphore:
+            return await run_in_threadpool(
+                search_web,
+                request,
+                request.app.state.config.WEB_SEARCH_ENGINE,
+                query,
+                user,
+            )
 
     try:
         logging.debug(
@@ -2134,13 +2146,7 @@ async def process_web_search(
         )
 
         search_tasks = [
-            run_in_threadpool(
-                search_web,
-                request,
-                request.app.state.config.WEB_SEARCH_ENGINE,
-                query,
-                user,
-            )
+            bounded_search_web(query)
             for query in form_data.queries
         ]
 

These might be enough to fix both issues.
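For completeness, a sketch of how the two approaches could be mixed at the embedding-call level (illustrative names; assumes the backoff package and aiohttp, with EMBEDDINGS_URL as a placeholder):

import asyncio
import aiohttp
import backoff

EMBEDDINGS_URL = "http://host.docker.internal:3089/v1/embeddings"
EMBED_SEMAPHORE = asyncio.Semaphore(10)  # cap in-flight requests to the server's slot count

@backoff.on_exception(
    backoff.expo,
    aiohttp.ClientResponseError,
    max_time=180,
    giveup=lambda e: e.status != 429,  # retry only rate-limit responses
)
async def embed_batch(session: aiohttp.ClientSession, batch: list[str]) -> list[list[float]]:
    async with EMBED_SEMAPHORE:
        async with session.post(EMBEDDINGS_URL, json={"input": batch}) as r:
            r.raise_for_status()
            return [item["embedding"] for item in (await r.json())["data"]]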


@FBH93 commented on GitHub (Nov 24, 2025):

It's around 150 weekly users, so not small, but not big either.

We have not seen bottlenecking issues until now, and this does not seem to relate to either CPU/RAM usage or token limits.

The embedding model has a limit of 350,000 tokens per minute, and we are nowhere near that.

Content extraction is default.

Here's a log of a 2-page document I tried to upload to the chat.

2025-11-24T14:59:57.5981286Z stdout F 2025-11-24 14:59:57.597 | DEBUG | open_webui.routers.retrieval:process_file:1666 - text_content: 25-118 / Doc25-1121 Page 1 of 2 <SNIP> Document content here </SNIP>
2025-11-24T14:59:57.6231690Z stdout F 2025-11-24 14:59:57.622 | INFO | open_webui.routers.retrieval:save_docs_to_vector_db:1323 - save_docs_to_vector_db: document <document title>.pdf file-eaedfc3f-8a40-45bc-a754-6784cc5c941c
2025-11-24T14:59:58.8142945Z stdout F 2025-11-24 14:59:58.814 | INFO | open_webui.routers.retrieval:save_docs_to_vector_db:1439 - generating embeddings for file-eaedfc3f-8a40-45bc-a754-6784cc5c941c
2025-11-24T14:59:58.8145845Z stdout F 2025-11-24 14:59:58.814 | DEBUG | asyncio.selector_events:__init__:54 - Using selector: EpollSelector
2025-11-24T14:59:58.8148496Z stdout F 2025-11-24 14:59:58.814 | DEBUG | open_webui.retrieval.utils:async_embedding_function:819 - generate_multiple_async: Processing 1 batches in parallel
2025-11-24T14:59:58.8149469Z stdout F 2025-11-24 14:59:58.814 | DEBUG | open_webui.retrieval.utils:agenerate_azure_openai_batch_embeddings:670 - agenerate_azure_openai_batch_embeddings:deployment text-embedding-3-large batch size: 3
2025-11-24T14:59:58.8399559Z stdout F 2025-11-24 14:59:58.839 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 94.101.209.81:0 - "GET /api/v1/chats/c30f3db9-9528-4d42-8b10-de0b51770bf7 HTTP/1.1" 200
2025-11-24T14:59:58.8402052Z stdout F 2025-11-24 14:59:58.839 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 94.101.209.81:0 - "GET /api/v1/chats/8aa1e5e3-edc4-4443-81a4-146587f4a6ba HTTP/1.1" 200
2025-11-24T14:59:59.0006063Z stdout F 2025-11-24 14:59:59.000 | DEBUG | open_webui.retrieval.utils:async_embedding_function:836 - generate_multiple_async: Generated 3 embeddings from 1 parallel batches
2025-11-24T14:59:59.0015254Z stdout F 2025-11-24 14:59:59.001 | INFO | open_webui.routers.retrieval:save_docs_to_vector_db:1478 - embeddings generated 3 for 3 items
2025-11-24T14:59:59.0016218Z stdout F 2025-11-24 14:59:59.001 | INFO | open_webui.routers.retrieval:save_docs_to_vector_db:1490 - adding to collection file-eaedfc3f-8a40-45bc-a754-6784cc5c941c
2025-11-24T15:00:59.50822 No logs since last 60 seconds
2025-11-24T15:01:59.2216817Z stdout F 2025-11-24 15:01:59.221 | INFO | open_webui.routers.retrieval:save_docs_to_vector_db:1496 - added 3 items to collection file-eaedfc3f-8a40-45bc-a754-6784cc5c941c
2025-11-24T15:01:59.2217935Z stdout F 2025-11-24 15:01:59.221 | INFO | open_webui.routers.retrieval:process_file:1696 - added 2 items to collection file-eaedfc3f-8a40-45bc-a754-6784cc5c941c

As you can see, the extraction and embedding are very fast, but saving to ChromaDB (adding to the collection) takes 2 minutes (almost exactly?).

Here's the settings:

[Screenshot: Documents settings from the admin panel]

@Classic298 commented on GitHub (Nov 24, 2025):

Aha, so the issue is that it takes long to add to the database(?).
Does embedding itself work fine for you, then?
I might have conflated the two issues here, but I'll use this (and the other discussion) to track it regardless.


@FBH93 commented on GitHub (Nov 24, 2025):

Yes, exactly: embeddings are created just fine, as expected, but saving them to the DB takes a long time, and during that time no other users can do anything.

I would be fine-ish with a slow save if other users were not blocked and the app didn't appear frozen to them.
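One hypothetical shape for exactly that, using FastAPI's BackgroundTasks (which runs sync callables in a threadpool after the response is sent); slow_save here is a stand-in, not the real save_docs_to_vector_db signature:

from fastapi import BackgroundTasks, FastAPI

app = FastAPI()

def slow_save(collection_name: str) -> None:
    ...  # stand-in for the slow vector-DB write

@app.post("/upload/{collection_name}")
async def upload(collection_name: str, background_tasks: BackgroundTasks):
    # The response returns immediately; the threadpool handles the write,
    # so other users are not held up by it.
    background_tasks.add_task(slow_save, collection_name)
    return {"status": "processing"}

The trade-off is that the client then needs some way to learn when indexing has finished.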


@nlamarque42 commented on GitHub (Nov 24, 2025):

You need to run your Open WebUI instance with multiple workers, orchestrated via Redis.
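Assuming a uvicorn-based deployment, a multi-worker launch could look like the sketch below. The module path is taken from Open WebUI's backend start script, the worker count is illustrative, and shared state such as websocket sessions would then need to live in Redis rather than in process memory (see the deployment docs for the Redis-backed managers):

import uvicorn

if __name__ == "__main__":
    # workers > 1 requires the app to be passed as an import string.
    uvicorn.run("open_webui.main:app", host="0.0.0.0", port=8080, workers=4)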


@curious-broccoli commented on GitHub (Nov 24, 2025):

Similar/same issue here; it completely breaks embedding using an API.

Version

v0.6.38 and v0.6.37 running in local development

Expected

Embedding works (it works fine with v0.6.36).

Steps to Reproduce

See the exported JSON (URLs and keys redacted):

{
    "version": 0,
    "ollama": {
        "enable": false,
        "base_urls": [
            "http://host.docker.internal:11434"
        ],
        "api_configs": {}
    },
    "openai": {
        "api_base_urls": [
            "https://OUR.API"
        ],
        "api_keys": [
            "REDACTED"
        ],
        "enable": true,
        "api_configs": {
            "0": {
                "enable": true,
                "prefix_id": "",
                "model_ids": []
            }
        }
    },
    "model_filter": {
        "enable": false,
        "list": [
            ""
        ]
    },
    "rag": {
        "template": "TEMPLATE",
        "top_k": 10,
        "relevance_threshold": 0.5,
        "enable_hybrid_search": true,
        "pdf_extract_images": true,
        "youtube_loader_language": [
            "en"
        ],
        "enable_web_loader_ssl_verification": true,
        "embedding_engine": "openai",
        "embedding_model": "Qwen/Qwen3-Embedding-4B",
        "file": {
            "max_size": null,
            "max_count": null,
            "allowed_extensions": []
        },
        "CONTENT_EXTRACTION_ENGINE": "docling",
        "tika_server_url": "http://tika:9998",
        "chunk_size": 1500,
        "chunk_overlap": 100,
        "reranking_model": "BAAI/bge-reranker-v2-m3",
        "text_splitter": "",
        "youtube_loader_proxy_url": "",
        "openai_api_base_url": "https://OUR.API/",
        "openai_api_key": "REDACTED",
        "ollama": {
            "url": "http://host.docker.internal:11434",
            "key": ""
        },
        "embedding_batch_size": 24,
        "full_context": false,
        "bypass_embedding_and_retrieval": false,
        "docling_server_url": "http://docling/",
        "document_intelligence_endpoint": "",
        "document_intelligence_key": "",
        "top_k_reranker": 5,
        "docling_ocr_engine": "tesseract",
        "docling_ocr_lang": "eng,fra,deu,spa",
        "mistral_ocr_api_key": "",
        "azure_openai": {
            "base_url": "",
            "api_key": "",
            "api_version": ""
        },
        "hybrid_bm25_weight": 0.5,
        "datalab_marker_api_key": "",
        "datalab_marker_langs": "",
        "datalab_marker_skip_cache": false,
        "datalab_marker_force_ocr": false,
        "datalab_marker_paginate": false,
        "datalab_marker_strip_existing_ocr": false,
        "datalab_marker_disable_image_extraction": false,
        "datalab_marker_output_format": "markdown",
        "DATALAB_MARKER_USE_LLM": false,
        "external_document_loader_url": "",
        "external_document_loader_api_key": "",
        "docling_do_picture_description": true,
        "docling_picture_description_mode": "local",
        "docling_picture_description_local": {},
        "docling_picture_description_api": {
            "url": "https://OUR.API...",
            "headers": {
                "Authorization": "Bearer REDACTED"
            },
            "params": {
                "model": "alias-vision"
            },
            "timeout": 60,
            "prompt": "Describe this image in great details. "
        },
        "reranking_engine": "external",
        "external_reranker_url": "https://OUR.API/rerank",
        "external_reranker_api_key": "REDACTED",
        "datalab_marker_api_base_url": "",
        "datalab_marker_additional_config": "",
        "datalab_marker_format_lines": false,
        "docling_params": {},
        "docling_do_ocr": true,
        "docling_force_ocr": false,
        "docling_pdf_backend": "dlparse_v4",
        "docling_table_mode": "fast",
        "docling_pipeline": "standard",
        "MISTRAL_OCR_API_BASE_URL": "https://api.mistral.ai/v1",
        "mineru_api_mode": "local",
        "mineru_api_url": "http://localhost:8000",
        "mineru_api_key": "",
        "mineru_params": {}
    },
    "auth": {
        "admin": {
            "show": true
        },
        "jwt_expiry": "100h",
        "api_key": {
            "enable": false,
            "endpoint_restrictions": false,
            "allowed_endpoints": ""
        }
    },
    "file": {
        "image_compression_width": null,
        "image_compression_height": null
    },
    "direct": {
        "enable": true
    },
    "models": {
        "base_models_cache": false
    },
    "notes": {
        "enable": true
    },
    "evaluation": {
        "arena": {
            "enable": false,
            "models": []
        }
    }
}

The same problem occurs with the default local ChromaDB and with pgvector:

VECTOR_DB="pgvector"
PGVECTOR_CREATE_EXTENSION="False"
PGVECTOR_DB_URL="postgresql://user:password@localhost:5432/open_webui"
PGVECTOR_INITIALIZE_MAX_VECTOR_LENGTH=2560
PGVECTOR_INDEX_METHOD="hnsw"
PGVECTOR_USE_HALFVEC=True
  1. Upload a tiny text file with content "hello" to a chat.
  2. "error: list index out of range" is shown to the user.

Logs

Traceback (most recent call last):

  File "/home/moritz/.pyenv/versions/3.11.13/lib/python3.11/threading.py", line 1002, in _bootstrap
    self._bootstrap_inner()
    │    └ <function Thread._bootstrap_inner at 0x7f0863746980>
    └ <WorkerThread(AnyIO worker thread, started 139668868806336)>
  File "/home/moritz/.pyenv/versions/3.11.13/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
    self.run()
    │    └ <function WorkerThread.run at 0x7f07336e1800>
    └ <WorkerThread(AnyIO worker thread, started 139668868806336)>
  File "/home/moritz/programming/python/open-webui/backend/venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 976, in run
    result = context.run(func, *args)
             │       │   │      └ ()
             │       │   └ functools.partial(<function process_uploaded_file at 0x7f073a1e6ca0>, <starlette.requests.Request object at 0x7f0731544d90>, ...
             │       └ <method 'run' of '_contextvars.Context' objects>
             └ <_contextvars.Context object at 0x7f0731555e40>

  File "/home/moritz/programming/python/open-webui/backend/open_webui/routers/files.py", line 117, in process_uploaded_file
    process_file(request, ProcessFileForm(file_id=file_item.id), user=user)
    │            │        │                       │         │         └ UserModel(id='a84c1df9-272d-431c-b4c9-a12b1a8a2488', name='User', email='admin@localhost', username=None, role='admin', profi...
    │            │        │                       │         └ '4264174a-4b14-4ed4-ba8b-ba29ac5c9f54'
    │            │        │                       └ FileModel(id='4264174a-4b14-4ed4-ba8b-ba29ac5c9f54', user_id='a84c1df9-272d-431c-b4c9-a12b1a8a2488', hash=None, filename='hel...
    │            │        └ <class 'open_webui.routers.retrieval.ProcessFileForm'>
    │            └ <starlette.requests.Request object at 0x7f0731544d90>
    └ <function process_file at 0x7f07358f5da0>

> File "/home/moritz/programming/python/open-webui/backend/open_webui/routers/retrieval.py", line 1720, in process_file
    raise e
          └ IndexError('list index out of range')

  File "/home/moritz/programming/python/open-webui/backend/open_webui/routers/retrieval.py", line 1684, in process_file
    result = save_docs_to_vector_db(
             └ <function save_docs_to_vector_db at 0x7f07358ceca0>

  File "/home/moritz/programming/python/open-webui/backend/open_webui/routers/retrieval.py", line 1500, in save_docs_to_vector_db
    raise e

  File "/home/moritz/programming/python/open-webui/backend/open_webui/routers/retrieval.py", line 1480, in save_docs_to_vector_db
    items = [

  File "/home/moritz/programming/python/open-webui/backend/open_webui/routers/retrieval.py", line 1484, in <listcomp>
    "vector": embeddings[idx],
              │          └ 0
              └ []

IndexError: list index out of range

As the logs show, this is not a 429 error, which makes sense since there is very little to embed. Bigger files produce the same error.


@Classic298 commented on GitHub (Nov 24, 2025):

@curious-broccoli
please share more information about your setup, specifically the embedding models used and reproduction steps.

We are currently aware of two embedding-related issues:

one is with the reindex button in Documents
the other is with rate limits


@Classic298 commented on GitHub (Nov 24, 2025):

Hey @FBH93
I have a question

Can you try something?

set the THREAD_POOL_SIZE env var to 2000, please


@FBH93 commented on GitHub (Nov 25, 2025):

@Classic298

With THREAD_POOL_SIZE set to 2000: same problem, the save to the vector DB still takes 2 minutes and still blocks, so it is not concurrent.

I have also begun to see occasional "list index out of range" errors, similar to what other users are reporting, but I cannot reproduce them consistently.


@Classic298 commented on GitHub (Nov 25, 2025):

@FBH93 I suspect you are running out of resources; perhaps your setup cannot handle the users AND thousands of embedding requests a minute at the same time.

The next version will introduce a toggle to DISABLE parallel embedding if you have issues with it. That will put you back to the old system of sequential embedding.


@Classic298 commented on GitHub (Nov 25, 2025):

The reindex issue is also fixed in dev, btw, for everyone else here.


@FBH93 commented on GitHub (Nov 25, 2025):

@Classic298 My usage logs show it is using less than 20% of CPU and 20% of memory. But I will revert to sequential embedding in the next update and see if this fixes the issue.

I suppose it is something specific to my setup, since nobody else seems to face this problem. I will need to experiment with my Azure setup...

Thanks for your help.


@Classic298 commented on GitHub (Nov 25, 2025):

@FBH93 I would recommend using Redis + uvicorn workers (properly set up, of course) and running a speed test to check whether your storage is fast. If the storage is slow, your vector DB will also be very slow, since Chroma DB in this case lives on that storage.

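For reference, the Redis + uvicorn workers setup suggested above usually comes down to a handful of environment variables; a minimal sketch using commonly documented Open WebUI variable names (verify against the docs for your version; the Redis URL and worker count are placeholder values):

# Assumed variable names from the Open WebUI docs; check them for your version
ENABLE_WEBSOCKET_SUPPORT="true"
WEBSOCKET_MANAGER="redis"
WEBSOCKET_REDIS_URL="redis://redis:6379/0"   # hypothetical Redis instance
REDIS_URL="redis://redis:6379/0"
UVICORN_WORKERS="4"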

@FBH93 commented on GitHub (Nov 25, 2025):

For anyone stumbling across this in the future: changing the vector DB to pgvector on PostgreSQL in Azure fixed the speed problems, so it was an issue with ChromaDB being stored on an Azure file share. Do not use the default ChromaDB with many users or in the cloud.

Now I am consistently getting the same "index out of range" problem that everyone else is getting.


@kumanoko24 commented on GitHub (Nov 25, 2025):

version v0.6.40

1|openwebui  | 2025-11-25 20:37:55.581 | ERROR    | open_webui.routers.retrieval:process_file:1643 - list index out of range
1|openwebui  | Traceback (most recent call last):
1|openwebui  |   File "/Users/admin/.local/share/uv/python/cpython-3.11.14-macos-aarch64-none/lib/python3.11/threading.py", line 1002, in _bootstrap
1|openwebui  |     self._bootstrap_inner()
1|openwebui  |     │    └ <function Thread._bootstrap_inner at 0x101276480>
1|openwebui  |     └ <WorkerThread(AnyIO worker thread, started 13841821696)>
1|openwebui  |   File "/Users/admin/.local/share/uv/python/cpython-3.11.14-macos-aarch64-none/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
1|openwebui  |     self.run()
1|openwebui  |     │    └ <function WorkerThread.run at 0x32ba58860>
1|openwebui  |     └ <WorkerThread(AnyIO worker thread, started 13841821696)>
1|openwebui  |   File "/Users/admin/.local/share/uv/tools/open-webui/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 976, in run
1|openwebui  |     result = context.run(func, *args)
1|openwebui  |              │       │   │      └ ()
1|openwebui  |              │       │   └ functools.partial(<function process_uploaded_file at 0x12bd72160>, <starlette.requests.Request object at 0x3300f2990>, Upload...
1|openwebui  |              │       └ <method 'run' of '_contextvars.Context' objects>
1|openwebui  |              └ <_contextvars.Context object at 0x330036780>
1|openwebui  |   File "/Users/admin/.local/share/uv/tools/open-webui/lib/python3.11/site-packages/open_webui/routers/files.py", line 117, in process_uploaded_file
1|openwebui  |     process_file(request, ProcessFileForm(file_id=file_item.id), user=user)
1|openwebui  |     │            │        │                       │         │         └ UserModel(id='969550bb-f04e-49ae-a37a-5d8a2dadc7cf', name='admin', email='user@example.com', username=None, role='admin', pro...
1|openwebui  |     │            │        │                       │         └ '85fed124-8893-4736-b391-b4dab6c7ba2d'
1|openwebui  |     │            │        │                       └ FileModel(id='85fed124-8893-4736-b391-b4dab6c7ba2d', user_id='969550bb-f04e-49ae-a37a-5d8a2dadc7cf', hash=None, filename='[OP...
1|openwebui  |     │            │        └ <class 'open_webui.routers.retrieval.ProcessFileForm'>
1|openwebui  |     │            └ <starlette.requests.Request object at 0x3300f2990>
1|openwebui  |     └ <function process_file at 0x1317f8180>
1|openwebui  | > File "/Users/admin/.local/share/uv/tools/open-webui/lib/python3.11/site-packages/open_webui/routers/retrieval.py", line 1640, in process_file
1|openwebui  |     raise e
1|openwebui  |           └ IndexError('list index out of range')
1|openwebui  |   File "/Users/admin/.local/share/uv/tools/open-webui/lib/python3.11/site-packages/open_webui/routers/retrieval.py", line 1604, in process_file
1|openwebui  |     result = save_docs_to_vector_db(
1|openwebui  |              └ <function save_docs_to_vector_db at 0x1317777e0>
1|openwebui  |   File "/Users/admin/.local/share/uv/tools/open-webui/lib/python3.11/site-packages/open_webui/routers/retrieval.py", line 1429, in save_docs_to_vector_db
1|openwebui  |     raise e
1|openwebui  |   File "/Users/admin/.local/share/uv/tools/open-webui/lib/python3.11/site-packages/open_webui/routers/retrieval.py", line 1409, in save_docs_to_vector_db
1|openwebui  |     items = [
1|openwebui  |   File "/Users/admin/.local/share/uv/tools/open-webui/lib/python3.11/site-packages/open_webui/routers/retrieval.py", line 1413, in <listcomp>
1|openwebui  |     "vector": embeddings[idx],
1|openwebui  |               │          └ 2
1|openwebui  |               └ [[0.011194641, 0.019039357, 0.012709626, 0.00076810725, 0.030934654, -0.0034263032, 0.015608493, 0.054607585, -0.008672635, 0...
1|openwebui  | IndexError: list index out of range

@Classic298 commented on GitHub (Nov 25, 2025):

@kumanoko24 what setup? Local embedding? Turn off parallel processing in the Document settings. Likely the backend silently got a 429 error (hence the embedding failed), and therefore the list index is out of range, because the embeddings list is (almost) empty.

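To make the failure mode concrete, here is a minimal sketch (not Open WebUI's actual code) of what the tracebacks above show at the list comprehension in retrieval.py: if the embedding call fails silently and returns nothing, indexing into the empty list raises the error.

docs = ["chunk one", "chunk two", "chunk three"]
embeddings = []  # what a silently failed (e.g. rate-limited) embedding call effectively yields

items = [
    {"text": doc, "vector": embeddings[idx]}  # IndexError: list index out of range
    for idx, doc in enumerate(docs)
]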

@kumanoko24 commented on GitHub (Nov 25, 2025):

[Screenshot of Documents settings]

I am still investigating why some files are OK (successfully indexed into the knowledge base) but most files hit this kind of error.

Setup:

  • openwebui installed with uv tool install (python 3.11), plus the qdrant-client dep.
  • local Ollama + bge-m3:latest as the embedding model.
  • local Qdrant as the vector DB.

Flow:

  • POST /api/v1/files/?process=true&process_in_background=true
  • GET /api/v1/files/${fileId}/process/status (polling until ok)
  • POST /api/v1/knowledge/${knowledgeBaseId}/file/add

@Classic298 thank you for the prompt support; I am still collecting details, which might take a bit more time.

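For reference, the background-processing flow listed above might look like the minimal sketch below. The base URL, API key, knowledge-base ID, and the JSON field names ("id", "status") are assumptions for illustration; only the endpoint paths come from the comment.

import time
import requests

BASE = "http://localhost:8080"  # hypothetical Open WebUI base URL
HEADERS = {"Authorization": "Bearer sk-REDACTED"}  # hypothetical API key

# 1. Upload the file and request background processing
with open("doc.pdf", "rb") as f:
    resp = requests.post(
        f"{BASE}/api/v1/files/?process=true&process_in_background=true",
        headers=HEADERS,
        files={"file": f},
    )
file_id = resp.json()["id"]  # response field name is an assumption

# 2. Poll the processing status until it settles
while True:
    status = requests.get(
        f"{BASE}/api/v1/files/{file_id}/process/status", headers=HEADERS
    ).json().get("status")  # response field name is an assumption
    if status not in (None, "pending"):
        break
    time.sleep(2)

# 3. Attach the processed file to a knowledge base
kb_id = "KNOWLEDGE_BASE_ID"  # hypothetical
requests.post(
    f"{BASE}/api/v1/knowledge/{kb_id}/file/add",
    headers=HEADERS,
    json={"file_id": file_id},
)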

@Classic298 commented on GitHub (Nov 25, 2025):

Yes, you are clearly embedding locally.

You should turn off parallel processing.


@kumanoko24 commented on GitHub (Nov 25, 2025):

I have turned that off already, as shown in the screenshot.


@Classic298 commented on GitHub (Nov 25, 2025):

Even when it's off you get this issue for some items? @kumanoko24 Are you on .40?


@Classic298 commented on GitHub (Nov 25, 2025):

For anyone here with empty list / index out of range errors

https://github.com/open-webui/open-webui/issues/19474#issuecomment-3575806065

check if your configured API URL is correct.
It CANNOT contain a trailing slash at the end,

so it must end with the TLD like .com or .ai or whatever, OR end in /v1.

It cannot end in /

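A minimal illustration of why the trailing slash matters, assuming the backend joins the base URL with a fixed path the way the retry snippet later in this thread does (f"{url}/embeddings"):

url_ok = "https://api.example.com/v1"    # hypothetical base URL
url_bad = "https://api.example.com/v1/"  # same URL with a trailing slash

print(f"{url_ok}/embeddings")   # https://api.example.com/v1/embeddings
print(f"{url_bad}/embeddings")  # https://api.example.com/v1//embeddings (some servers reject the double slash)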

@RDPPatwork commented on GitHub (Nov 25, 2025):

@FBH93 I have nearly the same setup as you and have also stumbled over some problems, especially with RAG/embeddings/file uploads.
If you like, you can email me to share experiences, as finding solutions in this case can be really time-consuming.


@imbible commented on GitHub (Nov 28, 2025):

For anyone here with empty list / index out of range errors

#19474 (comment)

check if your configured API URL is correct. It CANNOT contain a trailing slash at the end,

so it must end with the TLD like .com or .ai or whatever, OR end in /v1.

It cannot end in /

It has nothing to do with that. I can reproduce this with local embedding and with async processing turned off, and there isn't a trailing slash in my settings. This setting used to work, so I believe it is a regression. See my screenshot below.

[Screenshot of embedding settings]

@Classic298 commented on GitHub (Nov 28, 2025):

@imbible what version are you on? Steps to reproduce? More information about your setup needed please.


@imbible commented on GitHub (Nov 28, 2025):

@imbible what version are you on? Steps to reproduce? More information about your setup needed please.

Sure. Version 0.6.40. MacBook Pro 16" with M4 Max 128GB unified memory. macOS Tahoe 26.1.

Here is the docker-compose.yml.

services:
  webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: webui
    ports:
      - 3000:8080
    environment:
      TZ: America/New_York
    extra_hosts:
      - "open-webui.local:host-gateway"
    volumes:
      - ./ollama-webui/data:/app/backend/data
    restart: unless-stopped
    env_file:
      - .env

Hosted on http://localhost:3000/.

Go to Admin Panel - Settings - Documents; in the Embedding section, select Ollama as the embedding engine, set the URL to http://host.docker.internal:11434, leave the API Key empty, and set the Embedding Model to hf.co/Qwen/Qwen3-Embedding-8B-GGUF:Q8_0. Embedding Batch Size 1, Async Embedding Processing disabled.

Go to Workspace - Knowledge, click "+ New Knowledge", input "test" in "What are you working on?" and "What are you trying to achieve?", and click "Create Knowledge". Drag a PDF into the collection. It starts to process the PDF but eventually displays "list index out of range", regardless of which PDF. I tried 5 PDFs that used to work in an older version of WebUI and see this issue with all of them in the current version.


@Classic298 commented on GitHub (Nov 28, 2025):

Can you verify the request reaches ollama? Gonna need debug logs here on both ends


@imbible commented on GitHub (Nov 28, 2025):

Can you verify the request reaches ollama? Gonna need debug logs here on both ends

It reaches ollama. It seems to be an issue on ollama's end.
https://github.com/ollama/ollama/issues/12757
https://github.com/ollama/ollama/issues/10824


@2fst4u commented on GitHub (Nov 29, 2025):

I've been seeing the same issue for a while now and I have no idea which variable is causing it. I only use remote connections for all the RAG settings, and I've tried changing back to sentence transformers to no avail. Has the specific setting that causes this been narrowed down?


@Classic298 commented on GitHub (Nov 29, 2025):

@2fst4u

same questions to you as to the others:
what version, embedding model, setup, all Document settings; did you try what was recommended above, and are you affected by what imbible shared here (the Ollama issue)?


@2fst4u commented on GitHub (Nov 29, 2025):

I've tried these settings (attached: Screenshot_20251130_115606_Firefox.jpg), I've tried turning off hybrid, I've tried sentence transformers, I've tried different chunk sizes and top k. I don't think there's a single setting I haven't tried to modify, and it still freezes the webui, sometimes for 30 minutes or so.

The logs show nothing, it just sits waiting.

I'm running on kubernetes and the resource usage is just idling while it does this; it's not pinning the CPU while it waits.


@Classic298 commented on GitHub (Nov 29, 2025):

Any debug logs? Again, what version specifically are you on? How many users? What database? What vector database? Since you're running a multi-worker environment, did you set up REDIS and all related settings correctly? How does this freeze manifest? When uploading a single small file, is it stuck for 30 minutes? Do you use a docker or pip installation? Please update python-socketio to 5.15.0 to prevent Redis issues (might cause issues here as well).


@Classic298 commented on GitHub (Nov 29, 2025):

Gonna need a lot more information here.
And what do you mean by "for a while now" - when did it start? After an update or just during normal usage and not particularly after an update? If it was after an update, after which version and on what version are you now?


@2fst4u commented on GitHub (Nov 29, 2025):

any debug logs? on what version specifically are you? how many users? what database? Since you're running a multi-worker environment; did you setup REDIS and all related settings correctly? How does this freeze manifest? When uploading a single small file it is stuck for 30 minutes? Do you use docker or pip installation? If pip: please update python-socketio to 5.15.0 to prevent Redis issues (might cause issues here as well).

No debug logs; I didn't think to check whether that's an option in Helm, so I'll have to find it and enable it.

1 user, me.

Only one worker. Yes, Redis is enabled and working via Helm.

When doing any query with web search enabled, the last log entry is something like "saving to vectordb".


@Classic298 commented on GitHub (Nov 29, 2025):

Please do answer ALL questions; I cannot even attempt to help you with this little information.

Since you keep answering my questions above vaguely, I have prepared this full checklist that I need answered.

If the issue is related to "freezing" during web search only, it might also be that your web loader is super slow for some reason - but again, I need many more details here, much more. You have not even stated yet when this freezing occurs and when it doesn't. Please answer EVERYTHING.

  • Open WebUI Version (e.g., v0.6.40):
  • Installation Method (Kubernetes/Helm, Docker Compose, Pip):
  • Kubernetes Helm Chart Version (if applicable):
  • Container/Pod Resource Limits (CPU & RAM allocated):
  • Where did you deploy this on? Your phone? A raspberry pi? A server? Your PC? In the cloud?
  • Kubernetes Storage Class for /app/backend/data (Must specify: NFS, Azure Files, EBS, Longhorn, or Local Path):
  • Vector Database Type (Internal ChromaDB or External PGVector or entirely different Vector DB):
  • Where is the Vector DB stored
  • How fast is the storage and / or the network connection to the vector DB
  • How LARGE is your vector DB
  • How large is your open webui installation
  • Database (Internal SQLite or PostgreSQL):
  • Where is the database installed
  • How is the storage and or network speed?
  • How LARGE is the database?
  • Redis Configuration (Enabled/Disabled, Internal/External and ALL related environment variables):
  • Worker Count (Number of uvicorn workers):
  • Embedding Engine used for reproduction steps (Ollama, OpenAI, Azure, SentenceTransformers):
  • Embedding Model Name used for reproduction steps:
  • Embedding Batch Size used for reproduction steps (RAG_EMBEDDING_OPENAI_BATCH_SIZE):
  • Async Embedding Processing (Enabled or Disabled in UI):
  • Web Search Engine (SearxNG, Google, Tavily, etc.):
  • Web Search Result Count:
  • Web Search Concurrent Requests Limit:
  • Python-SocketIO Version you currently have:
  • Exact Manifestation of Freeze (Time duration, specific trigger action):
  • CPU/Memory Usage during Freeze (Is it idling or spiking?):
  • DEBUG Logs (Set LOG_LEVEL=DEBUG, reproduce issue, provide last 100 lines):
  • And most importantly: WHERE and HOW does the freeze manifest
    • EXACT reproduction steps - fully reproducible by you.
    • With video recording so I can understand.
    • Is it only web search?
    • Is it when uploading a file? Both?
    • What does the freeze look like?
    • Does the bug occur on old chats only (long?) or also on new, empty chats?
    • When uploading a single small file it is stuck for 30 minutes there too?
    • "The logs show nothing, it just sits waiting." even on Debug?
  • What is your THREAD_POOL_SIZE env var?
  • And what do you mean by "for a while now" - when did it start? After an update or just during normal usage and not particularly after an update? If it was after an update, after which version and on what version are you now?
  • FULL list of all env vars you configured.

@Classic298 commented on GitHub (Nov 29, 2025):

PS: @2fst4u your RAG settings need optimization, to say it kindly - a chunk size of 200 is the opposite of optimal.

If you upload ANY document, you will create 10x more chunks, vectors, and embeddings than anyone else - spamming your vector database with semantically useless data and wasting 10x more storage space than necessary, slowing down your system and causing insane retrieval slowness, because every time, 10x the number of chunks has to be searched, filtered and approximated.

So just saying, this could also be the culprit, depending on your setup.
Of course - if you have a powerful PC behind this setup, then the 200 chunk size is not the issue - but if whatever is running your setup is not powerful, then the 200 chunk size has cost you performance, quality, and storage - and a lot of it, too.

If you upload a moderately sized document (say, 50 pages), a normal setup would write maybe 50-100 vectors.
Yours would probably write 1000+ vectors.

This kills database performance, retrieval performance, your storage, I/O speed, and just about everything else. It is also expensive to call the embedding model 10x more often than you would have to with a chunk size of 1500-2000.

Finally, last AND least: your data is making three roundtrips - you are first sending it to Mistral OCR (btw, it cannot process all file types, I hope you are aware of that), then to the embedding model (a lot), and only then to the reranker - and all three are external. The reranker and the embedding model have 10x the work because of the small chunk size setting.

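A rough back-of-the-envelope check of the numbers above; a minimal sketch assuming character-based chunking and roughly 3,000 characters per page (both figures are assumptions, not Open WebUI internals):

def approx_chunks(pages, chars_per_page=3000, chunk_size=1500, overlap=100):
    # Rough estimate: total characters divided by the effective stride per chunk
    total_chars = pages * chars_per_page
    return max(1, total_chars // (chunk_size - overlap))

print(approx_chunks(50, chunk_size=1500, overlap=100))  # ~107 vectors for a 50-page doc
print(approx_chunks(50, chunk_size=200, overlap=100))   # ~1500 vectors for the same doc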

@2fst4u commented on GitHub (Nov 30, 2025):

Alrighty, so if I make chunks 2000, disable Mistral and go back to the default, and make sentence transformers the default like I mentioned, then it still freezes. The database is internal SQLite.

These are the last entries it shows when this happens:

2025-11-30 15:04:01.854 | DEBUG    | open_webui.retrieval.utils:agenerate_openai_batch_embeddings:583 - agenerate_openai_batch_embeddings:model text-embedding-3-small batch size: 1
2025-11-30 15:04:02.090 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 108.162.249.37:0 - "GET / HTTP/1.1" 200
2025-11-30 15:04:05.234 | DEBUG    | open_webui.retrieval.utils:async_embedding_function:847 - generate_multiple_async: Generated 399 embeddings from 399 parallel batches
2025-11-30 15:04:05.235 | INFO     | open_webui.routers.retrieval:save_docs_to_vector_db:1407 - embeddings generated 399 for 399 items
2025-11-30 15:04:05.236 | INFO     | open_webui.routers.retrieval:save_docs_to_vector_db:1419 - adding to collection web-search-94e85d04c57745cd7911f8ef5c583b2e67afdcd0d3aa12037e69

When it freezes, it literally just stops responding. I don't know how else to explain the interface freezing to you. It doesn't respond, and after some time it starts responding again.


@rgaricano commented on GitHub (Nov 30, 2025):

Chunk sizes in the hundreds are optimal for embedding code, so that shouldn't be a problem.

Mainly, it's a rate limit at the embedding service.

IndexError: list index out of range in the RAG embedding pipeline occurs because the embedding function returns an empty list while there are still text chunks to process.

It can be due to:

  • Embedding API failure: The embedding service (OpenAI, Ollama, etc.) may be unreachable or returning errors
  • Authentication issues: Invalid API keys for the embedding service
  • Rate limiting: Embedding service rate limits exceeded
  • Empty content after processing: Text extraction may result in empty strings
  • Network connectivity: Connection issues with external embedding services

@FBH93, Frederik,
The HTTP 429 "Too Many Requests" error returned by your embedding service confirms the root cause: it is rate-limited. This rate limiting causes the embedding function to fail and return None, which leads to the empty embeddings list and the subsequent IndexError.

Immediate workarounds:

  • Reduce Batch Size (env var)
# Set to 1 to avoid rate limiting  
RAG_EMBEDDING_BATCH_SIZE=1
  • Check Embedding Service Configuration and its limits.
  • Use Local Embedding Model (to avoid API rate limits).

As the regular OpenAI embedding function lacks retry logic for 429 errors (the Azure OpenAI one has it), a permanent solution to prevent this could be to add retry logic to the embedding function:

https://github.com/open-webui/open-webui/blob/140605e660b8186a7d5c79fb3be6ffb147a2f498/backend/open_webui/retrieval/utils.py#L535-L609

With retry logic:

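# Note: this is a patch sketch against backend/open_webui/retrieval/utils.py; it
# assumes that module's existing imports and globals are in scope (requests, time,
# aiohttp, asyncio, Optional, log, UserModel, RAG_EMBEDDING_PREFIX_FIELD_NAME,
# ENABLE_FORWARD_USER_INFO_HEADERS, include_user_info_headers).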
def generate_openai_batch_embeddings(
    model: str,
    texts: list[str],
    url: str = "https://api.openai.com/v1",
    key: str = "",
    prefix: str = None,
    user: UserModel = None,
) -> Optional[list[list[float]]]:
    try:
        log.debug(
            f"generate_openai_batch_embeddings:model {model} batch size: {len(texts)}"
        )
        json_data = {"input": texts, "model": model}
        if isinstance(RAG_EMBEDDING_PREFIX_FIELD_NAME, str) and isinstance(prefix, str):
            json_data[RAG_EMBEDDING_PREFIX_FIELD_NAME] = prefix

        headers = {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {key}",
        }
        if ENABLE_FORWARD_USER_INFO_HEADERS and user:
            headers = include_user_info_headers(headers, user)

        # Add retry logic for rate limiting
        for attempt in range(5):
            r = requests.post(
                f"{url}/embeddings",
                headers=headers,
                json=json_data,
            )
            if r.status_code == 429:
                retry = float(r.headers.get("Retry-After", "1"))
                time.sleep(retry)
                continue
            r.raise_for_status()
            data = r.json()
            if "data" in data:
                return [elem["embedding"] for elem in data["data"]]
            else:
                raise Exception("Something went wrong :/")
        return None
    except Exception as e:
        log.exception(f"Error generating openai batch embeddings: {e}")
        return None

async def agenerate_openai_batch_embeddings(
    model: str,
    texts: list[str],
    url: str = "https://api.openai.com/v1",
    key: str = "",
    prefix: str = None,
    user: UserModel = None,
) -> Optional[list[list[float]]]:
    try:
        log.debug(
            f"agenerate_openai_batch_embeddings:model {model} batch size: {len(texts)}"
        )
        form_data = {"input": texts, "model": model}
        if isinstance(RAG_EMBEDDING_PREFIX_FIELD_NAME, str) and isinstance(prefix, str):
            form_data[RAG_EMBEDDING_PREFIX_FIELD_NAME] = prefix

        headers = {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {key}",
        }
        if ENABLE_FORWARD_USER_INFO_HEADERS and user:
            headers = include_user_info_headers(headers, user)

        # Add retry logic for rate limiting
        for attempt in range(5):
            async with aiohttp.ClientSession(trust_env=True) as session:
                async with session.post(
                    f"{url}/embeddings", headers=headers, json=form_data
                ) as r:
                    if r.status == 429:
                        retry = float(r.headers.get("Retry-After", "1"))
                        await asyncio.sleep(retry)
                        continue
                    r.raise_for_status()
                    data = await r.json()
                    if "data" in data:
                        return [item["embedding"] for item in data["data"]]
                    else:
                        raise Exception("Something went wrong :/")
        return None
    except Exception as e:
        log.exception(f"Error generating openai batch embeddings: {e}")
        return None
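One refinement worth noting for the sketch above: when the server omits the Retry-After header, both functions fall back to sleeping a fixed 1 second per attempt, which can still hammer a strict limiter; an exponential fallback such as min(2 ** attempt, 30) seconds is a common alternative. And after five failed attempts they still return None, so callers would still need to guard against the empty-embeddings case that produces the IndexError discussed in this thread.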

@Classic298 commented on GitHub (Nov 30, 2025):

@2fst4u everything @rgaricano said - plus: changing the chunk size now will not delete data from your vector database. I am assuming your vector database is huge, and if your instance runs on a weak device, that won't really help all the I/O operations that are necessary for semantic search.

Reference: github-starred/open-webui#34400