[GH-ISSUE #20683] feat: Rate limiter for Embedding model requests #34792


2026-04-25T08:58:19-05:00

GiteaMirror commented

2026-04-25 08:58:19 -05:00

Originally created by @blackomb on GitHub (Jan 15, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/20683

Check Existing Issues

I have searched for all existing open AND closed issues and discussions for similar requests. I have found none that is comparable to my request.

Verify Feature Scope

I have read through and understood the scope definition for feature requests in the Issues section. I believe my feature request meets the definition and belongs in the Issues section instead of the Discussions.

Problem Description

I am using Open WebUI with on-premise LLMs and embedding models served through an OpenAI-compatible API.
This API enforces a request rate limit because its resources are limited.

When I load text/md/docx/pdf files into a chat, Open WebUI splits them into chunks and then tries to vectorize them with the embedding model. Usually this works very quickly. But if the loaded file is large enough, I hit the API rate limit and file loading fails. The API returns "HTTP 429 - Too many requests", but Open WebUI seems to ignore this error and tries to post the chunk anyway. In the end I get "list index out of range".
The same thing happens when I try to load knowledge for RAG.
"Async Embedding Processing" is turned off.
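
To illustrate the handling I would expect for the 429 case, here is a minimal sketch. It is not Open WebUI's actual code: `send` is a hypothetical callable standing in for whatever posts one batch of chunks to the embedding endpoint, and the `(status, headers, embeddings)` return shape, the retry count, and the delays are all assumptions made for the example.

```python
import time
from typing import Callable, Mapping, Sequence

# Hypothetical response shape: (HTTP status, response headers, embeddings).
Response = tuple[int, Mapping[str, str], list[list[float]]]


class EmbeddingRateLimited(RuntimeError):
    """Raised when the server still answers 429 after all retries."""


def embed_with_retry(
    send: Callable[[Sequence[str]], Response],
    chunks: Sequence[str],
    max_retries: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> list[list[float]]:
    """Post one batch of chunks, waiting and retrying on HTTP 429."""
    for attempt in range(max_retries + 1):
        status, headers, embeddings = send(chunks)
        if status == 200:
            return embeddings
        if status != 429:
            # Any other error is reported instead of being silently ignored.
            raise RuntimeError(f"embedding request failed with HTTP {status}")
        if attempt == max_retries:
            break
        # Use Retry-After when it is a plain number of seconds; otherwise
        # (header missing or given as a date) fall back to exponential backoff.
        retry_after = headers.get("Retry-After", "")
        if retry_after.isdigit():
            delay = float(retry_after)
        else:
            delay = base_delay * 2 ** attempt
        sleep(delay)
    raise EmbeddingRateLimited(f"still rate limited after {max_retries} retries")
```

The point of the sketch is that a 429 either leads to a delayed retry or to a clear error, so the caller never continues with a missing embedding and ends up at "list index out of range".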

Desired Solution you'd like

An option under Admin panel -> Documents -> Embedding,
shown when the "Open AI", "Azure Open AI", or "Ollama" engine is selected.
The option would be called "Requests per second": a text field holding the RPS limit.
While embedding, if the limit is reached, Open WebUI should throttle its request rate.
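
The throttling itself could be as simple as spacing requests evenly. The following is a rough sketch under stated assumptions, not a proposal for the exact implementation: the class name `RequestsPerSecondLimiter` and its `wait` method are hypothetical, it covers only the synchronous case (matching "Async Embedding Processing" being off), and it is not thread-safe.

```python
import time
from typing import Callable


class RequestsPerSecondLimiter:
    """Spaces calls at least 1/rps seconds apart (hypothetical helper)."""

    def __init__(
        self,
        rps: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if rps <= 0:
            raise ValueError("rps must be positive")
        self._interval = 1.0 / rps
        self._clock = clock
        self._sleep = sleep
        # Earliest time at which the next request may be sent.
        self._next_allowed = clock()

    def wait(self) -> float:
        """Block until the next request is allowed; return the time waited."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # Reserve the next slot one interval after this request's send time.
        self._next_allowed = max(now, self._next_allowed) + self._interval
        return delay
```

In use, the value from the "Requests per second" field would be passed as `rps`, and `limiter.wait()` would be called right before each embedding request is sent.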

Alternatives Considered

Option

Additional Context

No response


GiteaMirror closed this issue

2026-04-25 08:58:19 -05:00


Reference: github-starred/open-webui#34792