Mirror of https://github.com/open-webui/open-webui.git (synced 2026-05-07 19:38:46 -05:00)
[GH-ISSUE #19421] issue: save embedding to vector DB freezes the whole application #34400
Originally created by @FBH93 on GitHub (Nov 24, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/19421
Check Existing Issues
Installation Method
Git Clone
Open WebUI Version
0.6.38
Ollama Version (if applicable)
No response
Operating System
Windows 11, but OWUI is running in Docker
Browser (if applicable)
No response
Confirmation
Expected Behavior
User A uploads document.
OWUI embeds and saves to vectorDB. The save to vectorDB function takes 2 minutes to complete.
While this happens, user B can use OWUI as normal.
Actual Behavior
User A uploads document.
OWUI embeds and saves to vectorDB. The save to vectorDB function takes 2 minutes to complete.
While this happens, user B can take no action, and the app is essentially frozen from their perspective.
All actions initiated during the freeze take effect once the save completes.
Steps to Reproduce
Drag and drop a text file to openwebui chat window.
Observe that embedding and saving work as intended, but no other users can take any action while the save to the vector DB is in progress.
Logs & Screenshots
Notice that there are 2 minutes between the start of the save and the end of the save. No other logs appear during this time, despite other users taking several actions.
2025-11-24T12:20:52.8642231Z stdout F 2025-11-24 12:20:52.864 | INFO | open_webui.routers.retrieval:save_docs_to_vector_db:1490 - adding to collection file-92492f5b-c7df-4db9-943d-5eafd3d67312
2025-11-24T12:22:12.61175 No logs since last 60 seconds
2025-11-24T12:22:57.2297948Z stdout F 2025-11-24 12:22:57.229 | INFO | open_webui.routers.retrieval:save_docs_to_vector_db:1496 - added 1 items to collection file-92492f5b-c7df-4db9-943d-5eafd3d67312
Additional Information
This was not an issue before the upgrade to 0.6.37. I was on version 0.6.32.
@Classic298 commented on GitHub (Nov 24, 2025):
What setup do you use? Please share detailed information re: vector DB, embedding model, embedding model config and setup, how you connected to the vector DB, any concurrency, etc.
@rbsn-cpu commented on GitHub (Nov 24, 2025):
Same issue !
Embedding model : BGE M3, with Reranking
@Classic298 commented on GitHub (Nov 24, 2025):
@rbsn-cpu .38 or .37?
And more info is wanted: what setup do you use? Please share detailed information re: vector DB, embedding model, embedding model config and setup, how you connected to the vector DB, any concurrency, etc.
@scheatkode commented on GitHub (Nov 24, 2025):
Not sure this is relevant but here's additional information from #19423 in case this is somehow related.
Expected Behavior
Embedding works.
Actual Behavior
Embedding doesn't work, we get a
IndexError: list index out of range, because the embedding process isn't handling 429 Too Many Requests gracefully with exponential backoff or otherwise; as a result, the embeddings list doesn't hold enough items. This is likely a regression from #19296.
Steps to Reproduce
Using llama-swap: configure Open-WebUI accordingly and run a web search. Relevant env config:
@Classic298: I appended the requested config as well:
Logs & Screenshots
Later:
Additional Information
Current workaround: Use local SentenceTransformers.
@Classic298 commented on GitHub (Nov 24, 2025):
Thanks for the logs... but why do they mention an OpenAI 429 error when you use a local embedding model?
@scheatkode commented on GitHub (Nov 24, 2025):
I have configured the embedding engine to
openaias I'm usingllama.cpp/llama-server(throughllama-swapbut I think that's irrelevant) to expose the model to OpenWebUI. In this case 429 errors are likely because it's receiving too many requests at a time (221) and saturating the available slots (even with configured parallelism at10).Relevant logs from
llama-server:@FBH93 commented on GitHub (Nov 24, 2025):
VectorDB setup is default ChromaDB that comes built in with OpenWebUI. So it's running in the same container as OWUI. I have not changed settings related to this.
embedding is handled by Azure OpenAI Embedding.
I have no idea how to access or view the contents of ChromaDB.
@Classic298 commented on GitHub (Nov 24, 2025):
Aha, okay. So it would be a simple fix for you to define a maximum number of requests per second/minute, and this would fix it for local inference?
We specifically tested sentence transformers and OpenAI, and even though we tested thousands of embeddings, even on a tier 1 account, we didn't get 429 errors. Didn't think of that.
@Classic298 commented on GitHub (Nov 24, 2025):
@FBH93
We don't need your ChromaDB contents, just your full setup info: embedding model info, what vector DB (you shared that now), how you use your embedding models, and so forth.
@scheatkode commented on GitHub (Nov 24, 2025):
No worries. A way to define a maximum number of requests would be great. Even better if this was automatically handled with retries & backoff.
@Classic298 commented on GitHub (Nov 24, 2025):
Thanks for your detailed setup description. This gives us (at least one, if not the) reason this might fail for some people.
Definitely makes sense - if your server can only handle 10 at a time, to then set a maximum simultaneous requests limit.
@FBH93 commented on GitHub (Nov 24, 2025):
My full setup info:
Front end hosted in azure container, 3 CPU cores, 6GB memory.
File storage in Azure Storage Account File Share (Also where the chromaDB file is located it turns out)
Embedding model Text-Embedding-3-Large on Azure OpenAI
I spun up a copy of the setup without much data in ChromaDB, and it is significantly faster (almost instant). So I guess the freeze could be related to the size of ChromaDB? My current size is 1.27GB, which could maybe explain why it takes time to load and save new data.
@Classic298 commented on GitHub (Nov 24, 2025):
@FBH93 don't mind me asking: is this a small-scale setup, or do you have many users?
Since you're utilizing Azure a lot, I had assumed a business/enterprise setup, but 3 CPU cores and 6 GB memory for many users (just assuming here) is a bit on the low end.
It might be that embeddings work for you, but if someone uploads a very large document (hundreds if not thousands of chunks), the large number of requests this generates might consume much of your 3 CPU cores. What content extraction engine do you use here? That might be CPU-heavy as well.
Please also tell us more about the embedding model.
Do you have rate limits there? If yes, how high and do you reach them?
Do the logs show anything? Can you share logs or errors you find?
Content extraction, basically your whole Document settings in the admin panel please.
Please and thanks.
@scheatkode commented on GitHub (Nov 24, 2025):
Here's an attempt at a fix using both solutions. They could even be mixed for even better handling.
Alternatively, defining a concurrency limit would look like this:
These might be enough to fix both issues.
@FBH93 commented on GitHub (Nov 24, 2025):
It's around 150 weekly users, so not small, but not big either.
We have not seen bottlenecking issues until now, and it does not seem related to either CPU/RAM usage or token limits.
The embedding model has a limit of 350,000 tokens per minute, and we are nowhere near that.
Content extraction is default.
Here's a log of a 2 page document I tried to upload to the chat.
2025-11-24T14:59:57.5981286Z stdout F 2025-11-24 14:59:57.597 | DEBUG | open_webui.routers.retrieval:process_file:1666 - text_content: 25-118 / Doc25-1121 Page 1 of 2 <SNIP> Document content here </SNIP>
2025-11-24T14:59:57.6231690Z stdout F 2025-11-24 14:59:57.622 | INFO | open_webui.routers.retrieval:save_docs_to_vector_db:1323 - save_docs_to_vector_db: document <document title>.pdf file-eaedfc3f-8a40-45bc-a754-6784cc5c941c
2025-11-24T14:59:58.8142945Z stdout F 2025-11-24 14:59:58.814 | INFO | open_webui.routers.retrieval:save_docs_to_vector_db:1439 - generating embeddings for file-eaedfc3f-8a40-45bc-a754-6784cc5c941c
2025-11-24T14:59:58.8145845Z stdout F 2025-11-24 14:59:58.814 | DEBUG | asyncio.selector_events:__init__:54 - Using selector: EpollSelector
2025-11-24T14:59:58.8148496Z stdout F 2025-11-24 14:59:58.814 | DEBUG | open_webui.retrieval.utils:async_embedding_function:819 - generate_multiple_async: Processing 1 batches in parallel
2025-11-24T14:59:58.8149469Z stdout F 2025-11-24 14:59:58.814 | DEBUG | open_webui.retrieval.utils:agenerate_azure_openai_batch_embeddings:670 - agenerate_azure_openai_batch_embeddings:deployment text-embedding-3-large batch size: 3
2025-11-24T14:59:58.8399559Z stdout F 2025-11-24 14:59:58.839 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 94.101.209.81:0 - "GET /api/v1/chats/c30f3db9-9528-4d42-8b10-de0b51770bf7 HTTP/1.1" 200
2025-11-24T14:59:58.8402052Z stdout F 2025-11-24 14:59:58.839 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 94.101.209.81:0 - "GET /api/v1/chats/8aa1e5e3-edc4-4443-81a4-146587f4a6ba HTTP/1.1" 200
2025-11-24T14:59:59.0006063Z stdout F 2025-11-24 14:59:59.000 | DEBUG | open_webui.retrieval.utils:async_embedding_function:836 - generate_multiple_async: Generated 3 embeddings from 1 parallel batches
2025-11-24T14:59:59.0015254Z stdout F 2025-11-24 14:59:59.001 | INFO | open_webui.routers.retrieval:save_docs_to_vector_db:1478 - embeddings generated 3 for 3 items
2025-11-24T14:59:59.0016218Z stdout F 2025-11-24 14:59:59.001 | INFO | open_webui.routers.retrieval:save_docs_to_vector_db:1490 - adding to collection file-eaedfc3f-8a40-45bc-a754-6784cc5c941c
2025-11-24T15:00:59.50822 No logs since last 60 seconds
2025-11-24T15:01:59.2216817Z stdout F 2025-11-24 15:01:59.221 | INFO | open_webui.routers.retrieval:save_docs_to_vector_db:1496 - added 3 items to collection file-eaedfc3f-8a40-45bc-a754-6784cc5c941c
2025-11-24T15:01:59.2217935Z stdout F 2025-11-24 15:01:59.221 | INFO | open_webui.routers.retrieval:process_file:1696 - added 2 items to collection file-eaedfc3f-8a40-45bc-a754-6784cc5c941c
As you can see, the extraction and embedding are very fast, but saving to ChromaDB (adding to collection) takes 2 minutes (almost exactly).
Here's the settings:
@Classic298 commented on GitHub (Nov 24, 2025):
Aha, so the issue is that adding to the database takes long! (?)
Does embedding then work fine for you?
I might have conflated the two here, but will use this (and the other discussion) to track it regardless.
@FBH93 commented on GitHub (Nov 24, 2025):
Yes exactly, the issue is that embeddings are created just fine, as expected, but when it's saved to DB it takes a long time, and during this time no other users can do anything.
I would be fine-ish with a slow save, if other users were not blocked and the app didn't appear frozen to them.
@nlamarque42 commented on GitHub (Nov 24, 2025):
You need to run your open-webui instance with multiple uvicorn workers coordinated through Redis.
@curious-broccoli commented on GitHub (Nov 24, 2025):
similar/same issue here that completely breaks embedding using an API.
Version
v0.6.38 and v0.6.37 running in local development
Expected
Embedding works; it worked fine with v0.6.36.
Steps to Reproduce
see exported json, urls and keys redacted
same problem with default local chroma DB and with pgvector
Logs
As the logs show, this is not a 429 error, which makes sense since there is very little to embed. Bigger files produce the same error.
@Classic298 commented on GitHub (Nov 24, 2025):
@curious-broccoli
Please share more information about your setup, specifically the embedding models used and reproduction steps.
We currently are aware of two embeddings related issues
one is with the reindex button in documents
the other is with rate limits
@Classic298 commented on GitHub (Nov 24, 2025):
Hey @FBH93, I have a question. Can you try something?
Set the THREAD_POOL_SIZE env var to 2000, please.
@FBH93 commented on GitHub (Nov 25, 2025):
@Classic298
With THREAD_POOL_SIZE set to 2000: same problem, it still takes 2 minutes to save to the vector DB. It still blocks, so it is not concurrent.
I have also begun to identify occasional "list index out of range" issues, similar to other users reporting, but I cannot recreate it consistently.
@Classic298 commented on GitHub (Nov 25, 2025):
@FBH93 I suspect you are running out of resources; perhaps they cannot handle the users AND thousands of embedding requests per minute at the same time.
The next version will introduce a toggle to DISABLE parallel embedding if you have issues with it. That will put you back on the old system of sequential embedding.
@Classic298 commented on GitHub (Nov 25, 2025):
By the way, for everyone else here: the reindex issue is also fixed in dev.
@FBH93 commented on GitHub (Nov 25, 2025):
@Classic298 My usage logs show it's using less than 20% of CPU and 20% of memory. But I will switch to sequential embedding in the next update and see if that fixes the issue.
I suppose it is something specific to my setup, since nobody else seems to face this problem. I will need to experiment with my Azure setup...
Thanks for your assist.
@Classic298 commented on GitHub (Nov 25, 2025):
@FBH93 I would recommend using Redis + uvicorn workers (properly set up, of course), and running a speed test to check whether your storage is fast. If the storage is slow, your vector DB will also be very slow, since ChromaDB in this case lives on that storage.
@FBH93 commented on GitHub (Nov 25, 2025):
For anyone stumbling across this in the future: changing the vector DB to pgvector on PostgreSQL in Azure fixed the speed problems. So it was an issue with ChromaDB being stored on an Azure file share. Do not use the default ChromaDB with many users / on cloud storage.
Now I am consistently getting the same "index out of range" problem that everyone else is getting.
@kumanoko24 commented on GitHub (Nov 25, 2025):
version
v0.6.40
@Classic298 commented on GitHub (Nov 25, 2025):
@kumanoko24 what setup? Local embedding? Turn off parallel processing in the Document settings. Likely the backend silently got a 429 error (hence the embedding failed), and therefore the list index is out of range, because the list is (almost) empty.
@kumanoko24 commented on GitHub (Nov 25, 2025):
I am still investigating why some files are OK (successfully indexed into the knowledge base) while most files hit this kind of error.
Setup:
uv tool install (Python 3.11), plus the qdrant-client dependency. bge-m3:latest as embedding model.
Flow:
POST /api/v1/files/?process=true&process_in_background=true
GET /api/v1/files/${fileId}/process/status (polling until ok)
POST /api/v1/knowledge/${knowledgeBaseId}/file/add
@Classic298 thank you for the prompt support, but I am still collecting details; it might take a bit more time.
@Classic298 commented on GitHub (Nov 25, 2025):
Yes, you are clearly embedding locally.
You should turn off parallel processing.
@kumanoko24 commented on GitHub (Nov 25, 2025):
I have turned that off already, as shown in the screenshot.
@Classic298 commented on GitHub (Nov 25, 2025):
Even with it off, you get this issue for some items? @kumanoko24 Are you on .40?
@Classic298 commented on GitHub (Nov 25, 2025):
For anyone here with "list index empty/out of range" errors:
https://github.com/open-webui/open-webui/issues/19474#issuecomment-3575806065
Check whether your configured API URL is correct.
It CANNOT contain a trailing slash.
It must end with the TLD (like .com or .ai or whatever) OR end in /v1.
It cannot end in /.
@RDPPatwork commented on GitHub (Nov 25, 2025):
@FBH93 I have nearly the same setup as you and also stumbled over some problems, especially with RAG/embeddings/file uploads.
If you like, you can email me to share experiences, as finding solutions in this case can be really time-consuming.
@imbible commented on GitHub (Nov 28, 2025):
It has nothing to do with that. I can reproduce this with local embedding, and with the async processing turned off. And apparently there isn't a trailing slash in my settings. This setting used to work, so I believe it is a regression. See my screenshot below.
@Classic298 commented on GitHub (Nov 28, 2025):
@imbible what version are you on? Steps to reproduce? More information about your setup needed please.
@imbible commented on GitHub (Nov 28, 2025):
Sure. Version 0.6.40. MacBook Pro 16" with M4 Max 128GB unified memory. macOS Tahoe 26.1.
Here is the docker-compose.yml.
Hosted on http://localhost:3000/ .
Go to Admin Panel - Settings - Documents; in the Embedding section, select Ollama as the model engine, set the URL to http://host.docker.internal:11434, leave the API Key empty, set the Embedding Model to hf.co/Qwen/Qwen3-Embedding-8B-GGUF:Q8_0, Embedding Batch Size 1, and Async Embedding Processing disabled.
Go to Workspace - Knowledge, click "+ New Knowledge", input "test" in "What are you working on?" and "What are you trying to achieve?", then click "Create Knowledge". Drag a PDF into the collection. It starts processing the PDF but eventually displays "list index out of range", regardless of which PDF. I tried 5 PDFs that used to work in an older version of WebUI and see this issue with all of them in the current version.
@Classic298 commented on GitHub (Nov 28, 2025):
Can you verify the request reaches Ollama? We're going to need debug logs on both ends here.
@imbible commented on GitHub (Nov 28, 2025):
It reaches ollama. It seems to be an issue on ollama's end.
https://github.com/ollama/ollama/issues/12757
https://github.com/ollama/ollama/issues/10824
@2fst4u commented on GitHub (Nov 29, 2025):
I've been seeing the same issue for a while now, and I have no idea which variable is causing it. I only use remote connections for all the RAG settings, and I've tried changing back to sentence transformers to no avail. Has the specific setting that causes this been narrowed down?
@Classic298 commented on GitHub (Nov 29, 2025):
@2fst4u
same questions to you as to the others:
what version, embedding model, setup, all document settings, did you try what was recommended above and are you affected by what imbible shared here (ollama issue)?
@2fst4u commented on GitHub (Nov 29, 2025):
I've tried the settings attached, I've tried turning off hybrid, I've tried sentence transformers, I've tried different chunk sizes and top k. I don't think there's a single setting I haven't tried modifying, and it still freezes the WebUI, sometimes for 30 minutes or so.
The logs show nothing; it just sits waiting.
I'm running on Kubernetes, and resource usage just idles while this happens; it's not pinning the CPU while it waits.
@Classic298 commented on GitHub (Nov 29, 2025):
Any debug logs? Again: what version specifically are you on? How many users? What database? What vector database? Since you're running a multi-worker environment, did you set up Redis and all related settings correctly? How does this freeze manifest: when uploading a single small file, is it stuck for 30 minutes? Do you use a Docker or pip installation? Please also update python-socketio to 5.15.0 to prevent Redis issues (which might cause problems here as well).
@Classic298 commented on GitHub (Nov 29, 2025):
Gonna need a lot more information here.
And what do you mean by "for a while now" - when did it start? After an update or just during normal usage and not particularly after an update? If it was after an update, after which version and on what version are you now?
@2fst4u commented on GitHub (Nov 29, 2025):
No debug logs; I didn't think to check whether that's an option in Helm, so I'll have to find it and enable it.
1 user, me.
Only one worker. Yes, Redis is enabled and working via Helm.
When doing any query with web search enabled, the last log entry is something like "saving to vectordb".
@Classic298 commented on GitHub (Nov 29, 2025):
Please answer ALL the questions; I cannot even attempt to help you with this little information.
Since you keep answering my questions above vaguely, I have prepared this full checklist that I need answered.
If the issue is related to "freezing" during web search only, it might also be that your web loader is very slow for some reason. But again, I need many more details; you have not even stated yet when this freezing occurs and when it doesn't. Please answer EVERYTHING.
Storage backend for /app/backend/data (must specify: NFS, Azure Files, EBS, Longhorn, or Local Path):
Embedding batch size (RAG_EMBEDDING_OPENAI_BATCH_SIZE):
Debug logs (set LOG_LEVEL=DEBUG, reproduce the issue, provide the last 100 lines):
@Classic298 commented on GitHub (Nov 29, 2025):
PS: @2fst4u your RAG settings need optimization, to put it kindly: a chunk size of 200 is the opposite of optimal. It is very suboptimal.
If you upload ANY document, you will create 10x more chunks, vectors, and embeddings than anyone else, spamming your vector database with semantically useless data and wasting 10x more storage than necessary, slowing down your system and causing severe retrieval slowness, because every time, 10x the number of chunks has to be searched, filtered, and approximated.
So just saying, this could also be the culprit depending on your setup
Of course, if you have a powerful PC behind this setup, then the 200 chunk size is not the issue. But if whatever is running your setup is not powerful, then the 200 chunk size has cost you performance, quality, and storage, and a lot of it, too.
If you upload a moderately sized document (say, 50 pages), a normal setup would write maybe 50-100 vectors.
Yours would probably write 1000+ vectors.
This kills database performance, this kills retrieval performance, this kills your storage, this kills I/O speed and just everything else. Also it's expensive to call the embedding model 10x more than you would have to if you'd use a chunk size of 1500-2000
Finally, last AND least: your data is making three round trips. You first send it to Mistral OCR (which, by the way, cannot process all file types; I hope you are aware of that), then to the embedding model (a lot), and only then to the reranker, and all three are external. The reranker and the embedding model have 10x the work because of the small chunk size setting.
@2fst4u commented on GitHub (Nov 30, 2025):
Alright, so if I make chunks 2000, disable Mistral and go back to the default, and make sentence transformers the default like I mentioned, then it still freezes. The database is internal SQLite.
These are the last entries it shows when this happens.
When it freezes, it literally just stops responding. I don't know how else to explain the interface freezing to you. It doesn't respond, and after some time it starts responding again.
@rgaricano commented on GitHub (Nov 30, 2025):
Chunk sizes in the hundreds are optimal for embedding code, so that shouldn't be the problem.
Mainly, it's a rate limit on the embedding service.
IndexError: list index out of range in the RAG embedding pipeline occurs because the embedding function returns an empty list while there are still text chunks to process. It can be due to:
@FBH93 , Frederik,
The HTTP 429 "Too Many Requests" error in the log returned by your embedding service confirms the root cause: it's rate-limited. This rate limiting causes the embedding function to fail and return None, which leads to the empty embeddings list and the subsequent IndexError.
Immediate workarounds:
As the regular OpenAI embedding function lacks retry logic for 429 errors (the Azure OpenAI one has it), a permanent fix could be to add retry logic to the embedding function:
140605e660/backend/open_webui/retrieval/utils.py (L535-L609) with retry logic
@Classic298 commented on GitHub (Nov 30, 2025):
@2fst4u everything @rgaricano said - plus: changing the chunk size now will not delete data from your vector database. I am assuming your vector database is huge in size, and if your instance runs on a weak device then that won't really help all the I/O operations that are necessary for semantic seach