[GH-ISSUE #19423] issue: Embedding regression #57538

Closed
opened 2026-05-05 21:04:35 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @scheatkode on GitHub (Nov 24, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/19423

Check Existing Issues

  • I have searched for any existing and/or related issues.
  • I have searched for any existing and/or related discussions.
  • I have also searched in the CLOSED issues AND CLOSED discussions and found no related items (your issue might already be addressed on the development branch!).
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v0.6.38

Ollama Version (if applicable)

N/A

Operating System

Arch Linux

Browser (if applicable)

Librewolf

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

Embedding works.

Actual Behavior

Embedding doesn't work: we get an IndexError: list index out of range because the embedding process doesn't handle 429 Too Many Requests gracefully (with exponential backoff or otherwise), so the embeddings list ends up holding fewer items than there are documents.

This is likely a regression from #19296.
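
The graceful handling the report asks for could look like the following sketch. This is not the actual Open WebUI code; `RateLimitError` and `with_backoff` are hypothetical names standing in for a 429 `aiohttp.ClientResponseError` and whatever retry wrapper the project might adopt:

```python
import asyncio
import random


class RateLimitError(Exception):
    """Stand-in for an HTTP 429 error (e.g. aiohttp.ClientResponseError with status 429)."""


async def with_backoff(call, max_retries=5, base_delay=0.5):
    """Retry an async callable on rate-limit errors, doubling the delay each attempt."""
    for attempt in range(max_retries):
        try:
            return await call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the 429 instead of silently dropping a batch
            # exponential backoff with a little jitter to avoid retrying in lockstep
            await asyncio.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

The key property is that a transient 429 either eventually yields the batch's embeddings or raises, rather than letting the caller proceed with a short embeddings list.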

Steps to Reproduce

Using llama-swap:

  Qwen/Qwen3-Embedding-0.6B:
    description: Small, fast, accurate embedding model
    macros:
      default_ctx: 20480 # 20k
    ttl: 120
    cmd: |
      llama.cpp/llama-server
      --swa-full
      --ctx-size ${default_ctx}
      --flash-attn on
      --device Vulkan1
      --batch-size 512
      --ubatch-size 2048
      --parallel 10
      --hf-repo Qwen/Qwen3-Embedding-0.6B-GGUF
      --hf-file Qwen3-Embedding-0.6B-f16.gguf
      --model ./models/Qwen3-Embedding-0.6B.gguf
      --jinja
      --embedding
      --port ${PORT}

Configure Open-WebUI accordingly and run a web search.

Logs & Screenshots

chat-1               | 2025-11-24 13:32:49.020 | ERROR    | open_webui.retrieval.utils:agenerate_openai_batch_embeddings:608 - Error generating openai batch embeddings: 429, message='Too Many Requests', url='http://host.docker.internal:3089/v1/embeddings'
chat-1               | Traceback (most recent call last):
chat-1               |
chat-1               |   File "/usr/local/lib/python3.11/threading.py", line 1002, in _bootstrap
chat-1               |     self._bootstrap_inner()
chat-1               |     │    └ <function Thread._bootstrap_inner at 0x7f820d4449a0>
chat-1               |     └ <WorkerThread(AnyIO worker thread, started 140192631486144)>
chat-1               |   File "/usr/local/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
chat-1               |     self.run()
chat-1               |     │    └ <function WorkerThread.run at 0x7f81c541df80>
chat-1               |     └ <WorkerThread(AnyIO worker thread, started 140192631486144)>
chat-1               |   File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 976, in run
chat-1               |     result = context.run(func, *args)
chat-1               |              │       │   │      └ ()
chat-1               |              │       │   └ functools.partial(<function save_docs_to_vector_db at 0x7f81c76e31a0>, <starlette.requests.Request object at 0x7f81c328cbd0>,...
chat-1               |              │       └ <method 'run' of '_contextvars.Context' objects>
chat-1               |              └ <_contextvars.Context object at 0x7f820ac02680>
chat-1               |
chat-1               |   File "/app/backend/open_webui/routers/retrieval.py", line 1472, in save_docs_to_vector_db
chat-1               |     embeddings = asyncio.run(
chat-1               |                  │       └ <function run at 0x7f820caf53a0>
chat-1               |                  └ <module 'asyncio' from '/usr/local/lib/python3.11/asyncio/__init__.py'>
chat-1               |
chat-1               |   File "/usr/local/lib/python3.11/asyncio/runners.py", line 190, in run
chat-1               |     return runner.run(main)
chat-1               |            │      │   └ <coroutine object get_embedding_function.<locals>.async_embedding_function at 0x7f8123110040>
chat-1               |            │      └ <function Runner.run at 0x7f820c96cf40>
chat-1               |            └ <asyncio.runners.Runner object at 0x7f8154245950>
chat-1               |   File "/usr/local/lib/python3.11/asyncio/runners.py", line 118, in run
chat-1               |     return self._loop.run_until_complete(task)
chat-1               |            │    │     │                  └ <Task pending name='Task-372' coro=<get_embedding_function.<locals>.async_embedding_function() running at /app/backend/open_w...
chat-1               |            │    │     └ <function BaseEventLoop.run_until_complete at 0x7f820c96ab60>
chat-1               |            │    └ <_UnixSelectorEventLoop running=True closed=False debug=False>
chat-1               |            └ <asyncio.runners.Runner object at 0x7f8154245950>
chat-1               |   File "/usr/local/lib/python3.11/asyncio/base_events.py", line 641, in run_until_complete
chat-1               |     self.run_forever()
chat-1               |     │    └ <function BaseEventLoop.run_forever at 0x7f820c96aac0>
chat-1               |     └ <_UnixSelectorEventLoop running=True closed=False debug=False>
chat-1               |   File "/usr/local/lib/python3.11/asyncio/base_events.py", line 608, in run_forever
chat-1               |     self._run_once()
chat-1               |     │    └ <function BaseEventLoop._run_once at 0x7f820c96c900>
chat-1               |     └ <_UnixSelectorEventLoop running=True closed=False debug=False>
chat-1               |   File "/usr/local/lib/python3.11/asyncio/base_events.py", line 1936, in _run_once
chat-1               |     handle._run()
chat-1               |     │      └ <function Handle._run at 0x7f820caaeb60>
chat-1               |     └ <Handle Task.task_wakeup(<Task finishe...> result=None>)>
chat-1               |   File "/usr/local/lib/python3.11/asyncio/events.py", line 84, in _run
chat-1               |     self._context.run(self._callback, *self._args)
chat-1               |     │    │            │    │           │    └ <member '_args' of 'Handle' objects>
chat-1               |     │    │            │    │           └ <Handle Task.task_wakeup(<Task finishe...> result=None>)>
chat-1               |     │    │            │    └ <member '_callback' of 'Handle' objects>
chat-1               |     │    │            └ <Handle Task.task_wakeup(<Task finishe...> result=None>)>
chat-1               |     │    └ <member '_context' of 'Handle' objects>
chat-1               |     └ <Handle Task.task_wakeup(<Task finishe...> result=None>)>
chat-1               |
chat-1               |   File "/app/backend/open_webui/retrieval/utils.py", line 878, in generate_embeddings
chat-1               |     embeddings = await agenerate_openai_batch_embeddings(
chat-1               |                        └ <function agenerate_openai_batch_embeddings at 0x7f81c771d300>
chat-1               |
chat-1               | > File "/app/backend/open_webui/retrieval/utils.py", line 601, in agenerate_openai_batch_embeddings
chat-1               |     r.raise_for_status()
chat-1               |     │ └ <function ClientResponse.raise_for_status at 0x7f820a6bde40>
chat-1               |     └ <ClientResponse(http://host.docker.internal:3089/v1/embeddings) [429 Too Many Requests]>
chat-1               |       <CIMultiDictProxy('Content-Type': 't...
chat-1               |
chat-1               |   File "/usr/local/lib/python3.11/site-packages/aiohttp/client_reqrep.py", line 629, in raise_for_status
chat-1               |     raise ClientResponseError(
chat-1               |           └ <class 'aiohttp.client_exceptions.ClientResponseError'>
chat-1               |
chat-1               | aiohttp.client_exceptions.ClientResponseError: 429, message='Too Many Requests', url='http://host.docker.internal:3089/v1/embeddings'

Later:

chat-1               | Traceback (most recent call last):
chat-1               |
chat-1               |   File "/usr/local/lib/python3.11/threading.py", line 1002, in _bootstrap
chat-1               |     self._bootstrap_inner()
chat-1               |     │    └ <function Thread._bootstrap_inner at 0x7f5e7fe4c9a0>
chat-1               |     └ <WorkerThread(AnyIO worker thread, started 140040084645568)>
chat-1               |   File "/usr/local/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
chat-1               |     self.run()
chat-1               |     │    └ <function WorkerThread.run at 0x7f5e37e1df80>
chat-1               |     └ <WorkerThread(AnyIO worker thread, started 140040084645568)>
chat-1               |   File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 976, in run
chat-1               |     result = context.run(func, *args)
chat-1               |              │       │   │      └ ()
chat-1               |              │       │   └ functools.partial(<function save_docs_to_vector_db at 0x7f5e3a0e31a0>, <starlette.requests.Request object at 0x7f5e343b8fd0>,...
chat-1               |              │       └ <method 'run' of '_contextvars.Context' objects>
chat-1               |              └ <_contextvars.Context object at 0x7f5e35c95640>
chat-1               |
chat-1               | > File "/app/backend/open_webui/routers/retrieval.py", line 1486, in save_docs_to_vector_db
chat-1               |     items = [
chat-1               |
chat-1               |   File "/app/backend/open_webui/routers/retrieval.py", line 1490, in <listcomp>
chat-1               |     "vector": embeddings[idx],
chat-1               |               │          └ 11
chat-1               |               └ [[-0.024991527199745178, 0.04560912773013115, 0.0008289961260743439, -0.02457759529352188, -0.0019923101644963026, 0.04161553...
chat-1               |
chat-1               | IndexError: list index out of range
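
The second traceback is the downstream symptom of the first: if a per-batch failure is swallowed, the concatenated embeddings list comes out shorter than the document list, and indexing by document position overruns it. A minimal sketch of that failure mode (hypothetical names, not the actual Open WebUI code):

```python
def embed_batches(texts, batch_size, embed):
    """Embed texts batch by batch; a swallowed per-batch failure leaves a gap,
    so the result can be shorter than `texts`."""
    embeddings = []
    for i in range(0, len(texts), batch_size):
        try:
            embeddings.extend(embed(texts[i : i + batch_size]))
        except Exception:
            # swallowing the error (e.g. a 429) drops this batch's vectors
            continue
    # a length check here would fail fast instead of raising IndexError later:
    # assert len(embeddings) == len(texts)
    return embeddings
```

With 12 texts, a batch size of 10, and the second batch failing, only 10 vectors come back; `embeddings[11]` then raises IndexError, matching the `"vector": embeddings[idx]` frame above where `idx` is 11.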

Additional Information

Current workaround: Use local SentenceTransformers.

GiteaMirror added the bug label 2026-05-05 21:04:35 -05:00

@Classic298 commented on GitHub (Nov 24, 2025):

Probably a duplicate; please post your repro steps and detailed setup info there.


Reference: github-starred/open-webui#57538