mirror of
https://github.com/open-webui/open-webui.git
synced 2026-06-03 15:27:37 -05:00
[GH-ISSUE #20683] feat: Rate limiter for Embedding model requests #34792
Originally created by @blackomb on GitHub (Jan 15, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/20683
Check Existing Issues
Verify Feature Scope
Problem Description
I am using Open WebUI with on-premise LLMs and embedding models served through an OpenAI-compatible API.
This API enforces a request rate limit because it runs on limited resources.
When I load text/md/docx/pdf files into a chat, Open WebUI splits them into chunks and then vectorizes them with the embedding model. This usually works very quickly, but if the loaded file is large enough, I hit the API rate limit and the file upload fails. The API returns "HTTP 429 - Too many requests", but Open WebUI seems to ignore this error and still tries to post the chunk. In the end I get "list index out of range".
The same behavior occurs when I load knowledge for RAG.
"Async Embedding Processing" is turned off.
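To illustrate the handling I would expect when the API answers with 429, here is a minimal sketch in Python. It is not Open WebUI's actual code: `embed_with_retry`, `RateLimitedError`, and the `send` callable (assumed to return a status code and a list of embeddings) are hypothetical names made up for this example, and the exponential backoff schedule is only one possible choice.

```python
import time
from typing import Callable, Sequence


class RateLimitedError(Exception):
    """Raised when the endpoint keeps answering HTTP 429 after all retries."""


def embed_with_retry(
    send: Callable[[Sequence[str]], tuple[int, list]],
    batch: Sequence[str],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Send one batch of chunks, backing off and retrying on HTTP 429.

    `send` is a hypothetical callable that posts the batch to the embedding
    endpoint and returns (status_code, embeddings).
    """
    for attempt in range(max_retries + 1):
        status, embeddings = send(batch)
        if status == 200:
            return embeddings
        if status != 429:
            # Any other failure is reported immediately instead of being
            # swallowed and surfacing later as an unrelated error.
            raise RuntimeError(f"embedding request failed with HTTP {status}")
        if attempt < max_retries:
            # Exponential backoff: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)
    raise RateLimitedError(
        f"still rate limited after {max_retries} retries; giving up"
    )
```

The point is that a 429 either leads to a delayed retry or to a clear error message, rather than to "list index out of range".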
Desired Solution you'd like
An option under Admin panel -> Documents -> Embedding,
available when the "OpenAI", "Azure OpenAI", or "Ollama" engine is selected.
The option would be called "Requests per second": a text field holding the RPS limit.
While embedding, if the limit is reached, Open WebUI should throttle its request rate.
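The throttling behind such a "Requests per second" setting could look roughly like the sketch below. This is only an illustration under my own assumptions: `RequestThrottle` is a hypothetical class, not something that exists in Open WebUI, it simply spaces requests evenly at `1 / requests_per_second`, and it is not thread-safe.

```python
import time
from typing import Callable, Optional


class RequestThrottle:
    """Spaces calls so that at most `requests_per_second` are started per second.

    Hypothetical helper for illustration only; single-threaded use assumed.
    """

    def __init__(
        self,
        requests_per_second: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if requests_per_second <= 0:
            raise ValueError("requests_per_second must be positive")
        self._interval = 1.0 / requests_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed: Optional[float] = None

    def wait(self) -> float:
        """Block until the next request may be sent; return the time waited."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        # Reserve the slot for the request that is about to be sent.
        self._next_allowed = now + self._interval
        return delay
```

The embedding loop would then call `throttle.wait()` before posting each chunk or batch, with the throttle built from the value entered in the admin panel (for example `RequestThrottle(5)` for 5 requests per second).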
Alternatives Considered
Option
Additional Context
No response