[GH-ISSUE #19423] issue: Embedding regression #57538

Closed
opened 2026-05-05 21:04:35 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @scheatkode on GitHub (Nov 24, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/19423

Check Existing Issues

  • I have searched for any existing and/or related issues.
  • I have searched for any existing and/or related discussions.
  • I have also searched in the CLOSED issues AND CLOSED discussions and found no related items (your issue might already be addressed on the development branch!).
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v0.6.38

Ollama Version (if applicable)

N/A

Operating System

Arch Linux

Browser (if applicable)

Librewolf

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

Embedding works.

Actual Behavior

Embedding doesn't work: we get an IndexError: list index out of range because the embedding process doesn't handle 429 Too Many Requests gracefully (with exponential backoff or otherwise), so the embeddings list ends up holding fewer items than there are documents.

This is likely a regression from #19296.
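
The graceful handling the report asks for could look like the following sketch. This is not the actual Open WebUI code; `RateLimitError` and `with_backoff` are hypothetical names standing in for a 429 `aiohttp.ClientResponseError` and whatever retry wrapper the project might adopt:

```python
import asyncio
import random


class RateLimitError(Exception):
    """Stand-in for an HTTP 429 error (e.g. aiohttp.ClientResponseError with status 429)."""


async def with_backoff(call, max_retries=5, base_delay=0.5):
    """Retry an async callable on rate-limit errors, doubling the delay each attempt."""
    for attempt in range(max_retries):
        try:
            return await call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the 429 instead of silently dropping a batch
            # exponential backoff with a little jitter to avoid retrying in lockstep
            await asyncio.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

The key property is that a transient 429 either eventually yields the batch's embeddings or raises, rather than letting the caller proceed with a short embeddings list.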

Steps to Reproduce

Using llama-swap:

  Qwen/Qwen3-Embedding-0.6B:
    description: Small, fast, accurate embedding model
    macros:
      default_ctx: 20480 # 20k
    ttl: 120
    cmd: |
      llama.cpp/llama-server
      --swa-full
      --ctx-size ${default_ctx}
      --flash-attn on
      --device Vulkan1
      --batch-size 512
      --ubatch-size 2048
      --parallel 10
      --hf-repo Qwen/Qwen3-Embedding-0.6B-GGUF
      --hf-file Qwen3-Embedding-0.6B-f16.gguf
      --model ./models/Qwen3-Embedding-0.6B.gguf
      --jinja
      --embedding
      --port ${PORT}

Configure Open-WebUI accordingly and run a web search.

Logs & Screenshots

chat-1               | 2025-11-24 13:32:49.020 | ERROR    | open_webui.retrieval.utils:agenerate_openai_batch_embeddings:608 - Error generating openai batch embeddings: 429, message='Too Many Requests', url='http://host.docker.internal:3089/v1/embeddings'
chat-1               | Traceback (most recent call last):
chat-1               |
chat-1               |   File "/usr/local/lib/python3.11/threading.py", line 1002, in _bootstrap
chat-1               |     self._bootstrap_inner()
chat-1               |     │    └ <function Thread._bootstrap_inner at 0x7f820d4449a0>
chat-1               |     └ <WorkerThread(AnyIO worker thread, started 140192631486144)>
chat-1               |   File "/usr/local/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
chat-1               |     self.run()
chat-1               |     │    └ <function WorkerThread.run at 0x7f81c541df80>
chat-1               |     └ <WorkerThread(AnyIO worker thread, started 140192631486144)>
chat-1               |   File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 976, in run
chat-1               |     result = context.run(func, *args)
chat-1               |              │       │   │      └ ()
chat-1               |              │       │   └ functools.partial(<function save_docs_to_vector_db at 0x7f81c76e31a0>, <starlette.requests.Request object at 0x7f81c328cbd0>,...
chat-1               |              │       └ <method 'run' of '_contextvars.Context' objects>
chat-1               |              └ <_contextvars.Context object at 0x7f820ac02680>
chat-1               |
chat-1               |   File "/app/backend/open_webui/routers/retrieval.py", line 1472, in save_docs_to_vector_db
chat-1               |     embeddings = asyncio.run(
chat-1               |                  │       └ <function run at 0x7f820caf53a0>
chat-1               |                  └ <module 'asyncio' from '/usr/local/lib/python3.11/asyncio/__init__.py'>
chat-1               |
chat-1               |   File "/usr/local/lib/python3.11/asyncio/runners.py", line 190, in run
chat-1               |     return runner.run(main)
chat-1               |            │      │   └ <coroutine object get_embedding_function.<locals>.async_embedding_function at 0x7f8123110040>
chat-1               |            │      └ <function Runner.run at 0x7f820c96cf40>
chat-1               |            └ <asyncio.runners.Runner object at 0x7f8154245950>
chat-1               |   File "/usr/local/lib/python3.11/asyncio/runners.py", line 118, in run
chat-1               |     return self._loop.run_until_complete(task)
chat-1               |            │    │     │                  └ <Task pending name='Task-372' coro=<get_embedding_function.<locals>.async_embedding_function() running at /app/backend/open_w...
chat-1               |            │    │     └ <function BaseEventLoop.run_until_complete at 0x7f820c96ab60>
chat-1               |            │    └ <_UnixSelectorEventLoop running=True closed=False debug=False>
chat-1               |            └ <asyncio.runners.Runner object at 0x7f8154245950>
chat-1               |   File "/usr/local/lib/python3.11/asyncio/base_events.py", line 641, in run_until_complete
chat-1               |     self.run_forever()
chat-1               |     │    └ <function BaseEventLoop.run_forever at 0x7f820c96aac0>
chat-1               |     └ <_UnixSelectorEventLoop running=True closed=False debug=False>
chat-1               |   File "/usr/local/lib/python3.11/asyncio/base_events.py", line 608, in run_forever
chat-1               |     self._run_once()
chat-1               |     │    └ <function BaseEventLoop._run_once at 0x7f820c96c900>
chat-1               |     └ <_UnixSelectorEventLoop running=True closed=False debug=False>
chat-1               |   File "/usr/local/lib/python3.11/asyncio/base_events.py", line 1936, in _run_once
chat-1               |     handle._run()
chat-1               |     │      └ <function Handle._run at 0x7f820caaeb60>
chat-1               |     └ <Handle Task.task_wakeup(<Task finishe...> result=None>)>
chat-1               |   File "/usr/local/lib/python3.11/asyncio/events.py", line 84, in _run
chat-1               |     self._context.run(self._callback, *self._args)
chat-1               |     │    │            │    │           │    └ <member '_args' of 'Handle' objects>
chat-1               |     │    │            │    │           └ <Handle Task.task_wakeup(<Task finishe...> result=None>)>
chat-1               |     │    │            │    └ <member '_callback' of 'Handle' objects>
chat-1               |     │    │            └ <Handle Task.task_wakeup(<Task finishe...> result=None>)>
chat-1               |     │    └ <member '_context' of 'Handle' objects>
chat-1               |     └ <Handle Task.task_wakeup(<Task finishe...> result=None>)>
chat-1               |
chat-1               |   File "/app/backend/open_webui/retrieval/utils.py", line 878, in generate_embeddings
chat-1               |     embeddings = await agenerate_openai_batch_embeddings(
chat-1               |                        └ <function agenerate_openai_batch_embeddings at 0x7f81c771d300>
chat-1               |
chat-1               | > File "/app/backend/open_webui/retrieval/utils.py", line 601, in agenerate_openai_batch_embeddings
chat-1               |     r.raise_for_status()
chat-1               |     │ └ <function ClientResponse.raise_for_status at 0x7f820a6bde40>
chat-1               |     └ <ClientResponse(http://host.docker.internal:3089/v1/embeddings) [429 Too Many Requests]>
chat-1               |       <CIMultiDictProxy('Content-Type': 't...
chat-1               |
chat-1               |   File "/usr/local/lib/python3.11/site-packages/aiohttp/client_reqrep.py", line 629, in raise_for_status
chat-1               |     raise ClientResponseError(
chat-1               |           └ <class 'aiohttp.client_exceptions.ClientResponseError'>
chat-1               |
chat-1               | aiohttp.client_exceptions.ClientResponseError: 429, message='Too Many Requests', url='http://host.docker.internal:3089/v1/embeddings'

Later:

chat-1               | Traceback (most recent call last):
chat-1               |
chat-1               |   File "/usr/local/lib/python3.11/threading.py", line 1002, in _bootstrap
chat-1               |     self._bootstrap_inner()
chat-1               |     │    └ <function Thread._bootstrap_inner at 0x7f5e7fe4c9a0>
chat-1               |     └ <WorkerThread(AnyIO worker thread, started 140040084645568)>
chat-1               |   File "/usr/local/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
chat-1               |     self.run()
chat-1               |     │    └ <function WorkerThread.run at 0x7f5e37e1df80>
chat-1               |     └ <WorkerThread(AnyIO worker thread, started 140040084645568)>
chat-1               |   File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 976, in run
chat-1               |     result = context.run(func, *args)
chat-1               |              │       │   │      └ ()
chat-1               |              │       │   └ functools.partial(<function save_docs_to_vector_db at 0x7f5e3a0e31a0>, <starlette.requests.Request object at 0x7f5e343b8fd0>,...
chat-1               |              │       └ <method 'run' of '_contextvars.Context' objects>
chat-1               |              └ <_contextvars.Context object at 0x7f5e35c95640>
chat-1               |
chat-1               | > File "/app/backend/open_webui/routers/retrieval.py", line 1486, in save_docs_to_vector_db
chat-1               |     items = [
chat-1               |
chat-1               |   File "/app/backend/open_webui/routers/retrieval.py", line 1490, in <listcomp>
chat-1               |     "vector": embeddings[idx],
chat-1               |               │          └ 11
chat-1               |               └ [[-0.024991527199745178, 0.04560912773013115, 0.0008289961260743439, -0.02457759529352188, -0.0019923101644963026, 0.04161553...
chat-1               |
chat-1               | IndexError: list index out of range
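
The second traceback is the downstream symptom of the first: if a per-batch failure is swallowed, the concatenated embeddings list comes out shorter than the document list, and indexing by document position overruns it. A minimal sketch of that failure mode (hypothetical names, not the actual Open WebUI code):

```python
def embed_batches(texts, batch_size, embed):
    """Embed texts batch by batch; a swallowed per-batch failure leaves a gap,
    so the result can be shorter than `texts`."""
    embeddings = []
    for i in range(0, len(texts), batch_size):
        try:
            embeddings.extend(embed(texts[i : i + batch_size]))
        except Exception:
            # swallowing the error (e.g. a 429) drops this batch's vectors
            continue
    # a length check here would fail fast instead of raising IndexError later:
    # assert len(embeddings) == len(texts)
    return embeddings
```

With 12 texts, a batch size of 10, and the second batch failing, only 10 vectors come back; `embeddings[11]` then raises IndexError, matching the `"vector": embeddings[idx]` frame above where `idx` is 11.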

Additional Information

Current workaround: Use local SentenceTransformers.

GiteaMirror added the bug label 2026-05-05 21:04:35 -05:00

@Classic298 commented on GitHub (Nov 24, 2025):

Probably a duplicate; please post your repro steps and detailed setup info there.


Reference: github-starred/open-webui#57538