[GH-ISSUE #20683] feat: Rate limiter for Embedding model requests #19263

Closed
opened 2026-04-20 01:39:47 -05:00 by GiteaMirror · 0 comments

Originally created by @blackomb on GitHub (Jan 15, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/20683

Check Existing Issues

  • I have searched for all existing open AND closed issues and discussions for similar requests. I have found none that is comparable to my request.

Verify Feature Scope

  • I have read through and understood the scope definition for feature requests in the Issues section. I believe my feature request meets the definition and belongs in the Issues section instead of the Discussions.

Problem Description

I am using Open WebUI with on-premise LLMs and embedding models served through an OpenAI-compatible API.
This API enforces a request rate limit because its resources are limited.

When I load text/md/docx/pdf files into a chat, Open WebUI splits them into chunks and then tries to vectorize them using the embedding model. Usually this works very fast. But if the loaded file is large enough, I reach the API rate limit and file loading fails. The API returns "HTTP 429 - Too many requests", but Open WebUI seems to ignore this error and tries to post the chunk anyway. Finally I get "list index out of range".
The same behavior occurs when I try to load knowledge for RAG.
"Async Embedding Processing" is turned off.

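To illustrate the failure mode described above, here is a minimal sketch of how a 429 response could be treated as a signal to wait and retry instead of being ignored. This is not Open WebUI's actual code: `RateLimitedError`, `embed_with_retry`, and the `send` callable are hypothetical names, and `send` is assumed to raise `RateLimitedError` when the API answers with HTTP 429.

```python
import time
from typing import Callable, List, Optional, Sequence


class RateLimitedError(Exception):
    """Hypothetical exception standing in for an HTTP 429 response."""

    def __init__(self, retry_after: Optional[float] = None) -> None:
        super().__init__("HTTP 429 - Too many requests")
        # Seconds suggested by the server (e.g. a Retry-After header), if any.
        self.retry_after = retry_after


def embed_with_retry(
    send: Callable[[Sequence[str]], List[List[float]]],
    batch: Sequence[str],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[List[float]]:
    """Call send(batch), waiting and retrying whenever it is rate limited."""
    for attempt in range(max_retries + 1):
        try:
            return send(batch)
        except RateLimitedError as err:
            if attempt == max_retries:
                # Give up loudly instead of continuing with missing embeddings.
                raise
            if err.retry_after is not None:
                delay = err.retry_after
            else:
                # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
                delay = base_delay * (2 ** attempt)
            sleep(delay)
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

With this shape, a batch that keeps getting rejected ends in an explicit rate-limit error, which is easier to diagnose than a later "list index out of range".
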
Desired Solution you'd like

An option in Admin panel -> Documents -> Embedding,
shown when the "Open AI", "Azure Open AI" or "Ollama" engine is selected.
The option would be called "Requests per second": a text field holding the RPS limit.
While embedding, if the limit is reached, Open WebUI should throttle the request rate.
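
The throttling requested here could look roughly like the sketch below: a small limiter that spaces requests evenly so that no more than the configured number start per second. The class name `RequestRateLimiter` and its parameters are assumptions made for illustration and do not correspond to an existing Open WebUI setting or API.

```python
import time
from typing import Callable


class RequestRateLimiter:
    """Spaces calls so that at most `rps` requests start per second."""

    def __init__(
        self,
        rps: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if rps <= 0:
            raise ValueError("rps must be positive")
        self._interval = 1.0 / rps  # minimum spacing between request starts
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = clock()  # earliest time the next request may start

    def acquire(self) -> float:
        """Block until the next request may start; return the seconds waited."""
        now = self._clock()
        wait = max(0.0, self._next_allowed - now)
        if wait > 0:
            self._sleep(wait)
        # Reserve the following slot one interval after this request's start.
        self._next_allowed = max(now, self._next_allowed) + self._interval
        return wait
```

In use, the embedding loop would call `limiter.acquire()` before posting each chunk or batch, with `rps` taken from the proposed "Requests per second" field. The limiter is synchronous, which matches the case where "Async Embedding Processing" is turned off.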

Alternatives Considered

Option

Additional Context

No response


Reference: github-starred/open-webui#19263