mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-24 11:58:31 -05:00
[GH-ISSUE #15134] feat: option to disable parallel web search request #120790
Originally created by @AureMargaret on GitHub (Jun 19, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/15134
Originally assigned to: @Classic298 on GitHub.
Installation Method
Git Clone
Open WebUI Version
V0.6.15
Ollama Version (if applicable)
llama3:8b-instruct-fp16
Operating System
Truenas Scale v25.04.1
Browser (if applicable)
No response
Expected Behavior
The Brave Search API should not return 429 Too Many Requests errors when Concurrent Requests is set to 1.
There should be an option to add a minimum delay between requests (e.g., 1000 ms) so that the request rate complies with Brave’s documented limit of 1 request per second.
With such a throttle or delay setting, the WebUI should space out queries and avoid triggering Brave’s rate limiter.
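A minimal sketch of the kind of "Request Delay" throttle requested here, assuming a single-threaded caller. The class name `MinIntervalThrottle` and its parameters are hypothetical and are not part of Open WebUI; the injectable `clock` and `sleep` exist only to make the behavior easy to check.

```python
import time


class MinIntervalThrottle:
    """Keep consecutive calls at least `min_interval` seconds apart.

    Hypothetical helper for illustration; not Open WebUI code.
    """

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call = None  # time of the previous request, if any

    def wait(self):
        """Sleep just long enough to honor the interval, then record the call."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now
```

With `throttle = MinIntervalThrottle(1.0)`, calling `throttle.wait()` immediately before each search request would space requests at least one second apart, which matches the 1 request per second limit described above.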
Actual Behavior
Even with Concurrent Requests set to 1 and Search Result Count set to 1, the Brave Search API returns a 429 Too Many Requests error.
This happens because the WebUI appears to send requests back-to-back without any delay, exceeding Brave’s rate limit of 1 request per second.
There is currently no setting to enforce a delay or throttle between search requests, which leads to repeated rate-limiting errors during normal usage.
Steps to Reproduce
Set up Open WebUI (tested with latest version as of June 2025)
Either self-host or use a local instance
Configure Web Search settings under the "General" tab:
Web Search Engine: brave
Brave Search API Key: [your valid Brave API key]
Search Result Count: 1
Concurrent Requests: 1
Start a conversation with the assistant and ask any query that triggers a web search, e.g.:
What is the latest inflation news?
Repeat this process multiple times (e.g., ask 2–3 questions back to back or refresh chat).
Observe the error response in the WebUI or backend logs:
fastapi.exceptions.HTTPException: 400: 429 Client Error: Too Many Requests for url: https://api.search.brave.com/res/v1/web/search...
Logs & Screenshots
Below is the error log captured from the WebUI backend when using the Brave search API:
fastapi.exceptions.HTTPException: 400: 429 Client Error: Too Many Requests for url: https://api.search.brave.com/res/v1/web/search?q=latest+inflation+news&count=1
This confirms the issue is due to rate limiting (429 Too Many Requests), despite the settings:
Concurrent Requests: 1
Search Result Count: 1
The Brave Search API documentation specifies a limit of 1 request per second, but the WebUI currently issues requests without a delay, triggering this error.
Additional Information
Brave’s official API documentation clearly states a rate limit of 1 request per second.
Other search engines (like Tavily or SerpAPI) seem to work fine because they either allow higher request rates or the backend handles throttling.
The issue affects users with valid Brave API keys, even with minimal search load, due to the lack of a built-in delay/throttle mechanism in Open WebUI.
A simple feature such as “Request Delay (ms)” or automatic rate-limit handling would likely resolve this.
Tested using both Docker and local installs of Open WebUI.
@cvaz1306 commented on GitHub (Jul 21, 2025):
I'm having the same issue and am working on a PR.
@chip902 commented on GitHub (Jul 25, 2025):
Thank you sir @cvaz1306 !
@DocShotgun commented on GitHub (Aug 3, 2025):
This is very needed - brave as a search backend is effectively unusable because of this. I suspect that it's due to the model generating more than 1 search query (usually like 3ish), and that triggering too many requests.
@Rudd-O commented on GitHub (Aug 30, 2025):
I am bitten by this too. My AI generates three requests, OWU attempts to search all three simultaneously. Cool that it can do that, but basically that stops search from working with the free tier in Brave API. Brave free tier has a 1 rps limit.
Here's how a proper fix ought to work: when the API responds with HTTP 429, response headers instructing the client how to back off should be obeyed and then the request should be retried.
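As a rough sketch of the "obey the back-off headers" part, the helper below turns a standard HTTP `Retry-After` value (either a number of seconds or an HTTP date) into a delay. It assumes the server sends `Retry-After`; which headers the Brave API actually returns on a 429 is not confirmed in this thread, and the function name and `default` fallback are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(header_value, now=None, default=1.0):
    """Convert a Retry-After header value into a non-negative delay in seconds.

    Hypothetical helper; falls back to `default` when the header is
    missing or cannot be parsed.
    """
    if header_value is None:
        return default
    value = header_value.strip()
    if value.isdigit():
        # Form 1: delay in whole seconds, e.g. "2".
        return float(value)
    try:
        # Form 2: an HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

A client would sleep for the returned number of seconds and then send the same request again.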
@rgaricano commented on GitHub (Aug 30, 2025):
@Rudd-O
You can set Concurrent Requests to 1
in adminSettings/WebSearch/Loader Concurrent Requests
@DocShotgun commented on GitHub (Aug 31, 2025):
FYI setting max concurrent requests to 1 does not prevent this error from occurring because the limit is max 1 request per second specifically in the backend.
@e-dervieux commented on GitHub (Sep 8, 2025):
@cvaz1306 thank you!!! Any news on this?
Related discussion here: https://github.com/open-webui/open-webui/discussions/14107
@scionaltera commented on GitHub (Sep 13, 2025):
Also seeing this. It would be really nice to configure it to keep to 1/sec somehow.


@cvaz1306 commented on GitHub (Sep 15, 2025):
Just to update you on progress, I've got web search working with brave web search (the free version). I need to finish testing, and then I will submit the PR.
@chip902 commented on GitHub (Sep 15, 2025):
You are a saint and a scholar good sir.
@cvaz1306 commented on GitHub (Sep 15, 2025):
@chip902 Here is the pull request if you want to check it out, or draw the maintainers' attention to it: https://github.com/open-webui/open-webui/pull/17449
FYI if you want to use this, it currently uses environment variables, because I haven't quite figured out how the admin settings interface code works. Hopefully that isn't a problem for you. It's not quite finished yet, and I'm working on getting the tests to pass.
@glantucan commented on GitHub (Nov 2, 2025):
I am still having this problem, and it's not only with the free version of the API. It also happens with the base plan, which is supposed to support 20 requests/second.
@mp3bruh commented on GitHub (Nov 16, 2025):
A temporary solution is to specify in the prompt that the AI should use only one search request.
@cvaz1306 commented on GitHub (Nov 18, 2025):
I tried that and it didn't solve the issue.
@atomlab commented on GitHub (Dec 19, 2025):
I'm having the exact same issue!
I'm using the Brave Search API on the free tier and constantly hitting this "1 request per second" limit. OpenWebUI just sends requests one after another with no pause, and I get a 429 error after the very first request.
This really disrupts my workflow and makes the web search feature unusable. It would be great if OpenWebUI had an option for "delay between search requests" or at least some basic throttle in the settings. Even a simple 1-2 second delay would solve 90% of the problems with this API limit.
@Classic298 commented on GitHub (Dec 19, 2025):
PR welcome, still. Alternative: upgrade to the paid brave search
@Classic298 commented on GitHub (Dec 21, 2025):
should be addressed by
https://github.com/open-webui/open-webui/pull/20070
@jocull commented on GitHub (Dec 26, 2025):
With the latest release this is still happening for me. I've set the new concurrency setting to 1, but the Brave API (free tier) is 1 request per second.
If the LLM decides to generate multiple search terms, the second request will fire immediately after the first, and is almost guaranteed to trip the rate limit. A simple backoff, retry, or even just user-specified delay between requests would do the trick 🙏
@Classic298 commented on GitHub (Dec 26, 2025):
Ok so besides setting a concurrency limit you also need something like a "requests per second" .. no.. something like a "time between requests" environment variable?
@jocull commented on GitHub (Dec 26, 2025):
Maybe -- it's just like a quick and dirty workaround. Respecting the 429 responses for rate limiting and applying any back off would be more proper.
Here's a quick example of what I did just to test the theory:
b44bddad9a
It seems to work here for me. Just trying to pitch solutions with minimum friction 🙏
@Rudd-O commented on GitHub (Dec 30, 2025):
Yes if the software respected 429 then everything would work flawlessly. That would be the real fix.
@Classic298 commented on GitHub (Dec 30, 2025):
Someone in the discussion for brave wrote that setting the concurrency to 1 fixed it for them. But for you it didn't fix it. Weird. Brave api is strange
@Rudd-O commented on GitHub (Dec 30, 2025):
The problem is that it isn't enough to limit concurrency to 1 request for the Brave API. If your internet is fast enough, you will hit the additional limit of one request per second. The docs are clear: it's not just "one request, none in parallel" -- it's one request per second.
Again, if all 429s were retried by duly following what the HTTP headers say on the 429 reply, it would work perfectly. That's the real fix.
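To make the concurrency-versus-rate distinction concrete, here is a small worked example. It assumes requests are sent strictly one after another (concurrency 1) with a fixed round-trip latency, and that the limit is counted over a sliding one-second window; how Brave actually measures the window is not stated in this thread, and both function names are hypothetical.

```python
def sequential_send_times(num_requests, latency):
    """Send times when each request starts as soon as the previous one returns."""
    return [i * latency for i in range(num_requests)]


def max_requests_per_window(send_times, window=1.0):
    """Largest number of requests that start inside any `window`-second span."""
    return max(
        sum(1 for t in send_times if start <= t < start + window)
        for start in send_times
    )


# Three queries, each answered in 0.25 s, still land inside the same second.
fast = sequential_send_times(3, latency=0.25)
# Only when each request takes a full second does the rate stay at 1 per second.
slow = sequential_send_times(3, latency=1.0)
```

Here `max_requests_per_window(fast)` counts all three requests in one window even though none of them overlap, while `max_requests_per_window(slow)` counts one.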
@Classic298 commented on GitHub (Dec 30, 2025):
Thanks. Let me attempt a fix, specifically for Brave, for the case where your internet is fast enough to send more than 1 request per second despite concurrency being set to 1.
@Classic298 commented on GitHub (Dec 30, 2025):
https://github.com/open-webui/open-webui/pull/20255 this should do it fam, testing wanted, but not necessarily needed
@Classic298 commented on GitHub (Dec 30, 2025):
fixed by PR
@jocull commented on GitHub (Dec 31, 2025):
Thanks for the PR! I still have concerns this won’t address the issue in many cases since it only retries once. It seems like if the LLM decides to generate many search terms and requests were trying to run concurrently there could still be contention. A small retry loop may resolve more reliably. Maybe 5-10 attempts before giving up?
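A minimal sketch of such a bounded retry loop, for illustration only. `RateLimited` is a hypothetical exception standing in for however the search call signals an HTTP 429, and `call_with_retries` is not a function from Open WebUI or from the PR; the exponential backoff is an assumption about what a reasonable default might look like.

```python
import time


class RateLimited(Exception):
    """Hypothetical exception a search call raises on HTTP 429."""

    def __init__(self, retry_after=None):
        super().__init__("429 Too Many Requests")
        self.retry_after = retry_after  # seconds suggested by the server, if known


def call_with_retries(request, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `request()` until it succeeds or `max_attempts` is exhausted."""
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except RateLimited as exc:
            if attempt == max_attempts:
                raise  # give up and surface the last 429
            if exc.retry_after is not None:
                delay = exc.retry_after  # prefer the server's hint
            else:
                delay = base_delay * 2 ** (attempt - 1)  # 1 s, 2 s, 4 s, ...
            sleep(delay)
```

With the defaults this gives up after five attempts, having waited 1, 2, 4 and 8 seconds between them when the server provides no hint.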
I am also surprised to see no test coverage required for this, but not my project so I won’t complain. I was just scared to contribute for stepping into complex testing 😅
@Classic298 commented on GitHub (Dec 31, 2025):
@jocull set the concurrency to 1 and, based on everything you guys reported here, it has to work.
A sleep of 1 second and a concurrency of 1, meaning only 1 request at a time, will keep in line with Brave's free rate limits.
@jocull commented on GitHub (Jan 1, 2026):
Ahhh good thought, thanks. I do have that set to 1 already.
Will that also be true in a multi-user environment where multiple chats may be going on at the same time?
@Classic298 commented on GitHub (Jan 1, 2026):
@jocull
No
The concurrency is per web search request
If somehow... three users use the web search at the very exact same time (same second) then there'd be three separate threadpools, all three of which are bound to single query concurrency, therefore there will be three requests sent in that second to the configured API endpoint.
(To be fair, in an environment where this is a problem, i.e. where you have multiple truly concurrent web searches in the very same second (implying hundreds of concurrent users), you would usually use a paid search API and not the Brave free tier.)
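For the multi-user case, one possible approach (not what Open WebUI does, per the explanation above) would be a single process-wide limiter shared by every thread pool, so that send slots are handed out across all users rather than per web search request. The sketch below is hypothetical throughout, including the `SharedRateLimiter` name, and covers a single process only.

```python
import threading
import time


class SharedRateLimiter:
    """Thread-safe limiter that hands out send slots `min_interval` seconds apart.

    Hypothetical helper for illustration; one instance would be shared
    by all callers in the process.
    """

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_slot = float("-inf")  # earliest time the next request may go out

    def reserve(self):
        """Reserve the next free slot and return how long the caller must wait."""
        with self._lock:
            now = self._clock()
            slot = max(now, self._next_slot)
            self._next_slot = slot + self.min_interval
            return slot - now

    def wait(self):
        """Block until this caller's reserved slot arrives."""
        delay = self.reserve()
        if delay > 0:
            self._sleep(delay)
```

Because the slot is reserved under the lock and the sleep happens outside it, three users searching in the same instant would be assigned waits of 0, 1 and 2 seconds with `min_interval=1.0` instead of all sending at once.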
@Rudd-O commented on GitHub (Jan 4, 2026):
KING! BASED!