[GH-ISSUE #15134] feat: option to disable parallel web search request #120790

Closed
opened 2026-05-20 22:30:12 -05:00 by GiteaMirror · 31 comments

Originally created by @AureMargaret on GitHub (Jun 19, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/15134

Originally assigned to: @Classic298 on GitHub.

Check Existing Issues

  • I have searched the existing issues and discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Git Clone

Open WebUI Version

V0.6.15

Ollama Version (if applicable)

llama3:8b-instruct-fp16

Operating System

Truenas Scale v25.04.1

Browser (if applicable)

No response

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

The Brave Search API should not return 429 Too Many Requests errors when Concurrent Requests is set to 1.
There should be an option to add a minimum delay between requests (e.g., 1000 ms) so that the request rate complies with Brave’s documented limit of 1 request per second.
With such a throttle or delay setting, the WebUI should space out queries and avoid triggering Brave’s rate limiter.
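A minimum delay like the one requested could be enforced by remembering when the next request is allowed and sleeping until that moment. The following is a minimal sketch, assuming a synchronous, thread-based caller. `MinIntervalThrottle` is a hypothetical name, not an existing Open WebUI class.

```python
import threading
import time


class MinIntervalThrottle:
    """Hypothetical helper: keeps at least `interval_s` seconds between
    the moments consecutive callers return from `wait()`."""

    def __init__(self, interval_s: float) -> None:
        self.interval_s = interval_s
        self._lock = threading.Lock()
        self._next_allowed = float("-inf")  # monotonic time of the next free slot

    def wait(self) -> None:
        # Holding the lock while sleeping serializes callers, so each one
        # takes the next slot in turn.
        with self._lock:
            now = time.monotonic()
            if now < self._next_allowed:
                time.sleep(self._next_allowed - now)
                now = time.monotonic()
            self._next_allowed = now + self.interval_s


# Assumed usage: call throttle.wait() immediately before each search request.
throttle = MinIntervalThrottle(interval_s=1.0)  # Brave free tier: 1 request/second
```

Holding the lock during the sleep keeps the spacing strict. The trade-off is that queued callers block on the lock instead of doing other work.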

Actual Behavior

Even with Concurrent Requests set to 1 and Search Result Count set to 1, the Brave Search API returns a 429 Too Many Requests error.
This happens because the WebUI appears to send requests back-to-back without any delay, exceeding Brave’s rate limit of 1 request per second.
There is currently no setting to enforce a delay or throttle between search requests, which leads to repeated rate-limiting errors during normal usage.

Steps to Reproduce

  1. Set up Open WebUI (tested with latest version as of June 2025)
    Either self-host or use a local instance

  2. Configure Web Search settings under the "General" tab:
    Web Search Engine: brave
    Brave Search API Key: [your valid Brave API key]
    Search Result Count: 1
    Concurrent Requests: 1

  3. Start a conversation with the assistant and ask any query that triggers a web search, e.g.:
    What is the latest inflation news?

  4. Repeat this process multiple times (e.g., ask 2–3 questions back to back or refresh chat).

  5. Observe the error response in the WebUI or backend logs:

fastapi.exceptions.HTTPException: 400: 429 Client Error: Too Many Requests for url: https://api.search.brave.com/res/v1/web/search...

  6. Note that Brave allows only 1 request per second, and Open WebUI does not enforce a delay between sequential requests, which causes the 429 error.

Logs & Screenshots

Below is the error log captured from the WebUI backend when using the Brave search API:

fastapi.exceptions.HTTPException: 400: 429 Client Error: Too Many Requests for url: https://api.search.brave.com/res/v1/web/search?q=latest+inflation+news&count=1

This confirms the issue is due to rate limiting (429 Too Many Requests), despite the settings:

Concurrent Requests: 1

Search Result Count: 1

The Brave Search API documentation specifies a limit of 1 request per second, but the WebUI currently issues requests without a delay, triggering this error.

Additional Information

Brave’s official API documentation clearly states a rate limit of 1 request per second.

Other search engines (like Tavily or SerpAPI) seem to work fine because they either allow higher request rates or the backend handles throttling.

The issue affects users with valid Brave API keys, even with minimal search load, due to the lack of a built-in delay/throttle mechanism in Open WebUI.

A simple feature such as “Request Delay (ms)” or automatic rate-limit handling would likely resolve this.

Tested using both Docker and local installs of Open WebUI.

GiteaMirror added the bug label 2026-05-20 22:30:12 -05:00

@cvaz1306 commented on GitHub (Jul 21, 2025):

I'm having the same issue and am working on a PR.

@chip902 commented on GitHub (Jul 25, 2025):

Thank you sir @cvaz1306 !

@DocShotgun commented on GitHub (Aug 3, 2025):

This is very needed: Brave as a search backend is effectively unusable because of this. I suspect it's because the model generates more than one search query (usually about three), which triggers too many requests.

@Rudd-O commented on GitHub (Aug 30, 2025):

I am bitten by this too. My AI generates three queries, and OWU attempts to search all three simultaneously. It's cool that it can do that, but it basically stops search from working on the Brave API free tier, which has a 1 rps limit.

Here's how a proper fix ought to work: when the API responds with HTTP 429, response headers instructing the client how to back off should be obeyed and then the request should be retried.
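Honouring a 429 starts with reading how long the server asks the client to wait. The sketch below parses the standard HTTP `Retry-After` header, which RFC 9110 allows to be either a number of seconds or an HTTP date. It is an assumption, to be checked against Brave's documentation, whether Brave sends `Retry-After` or only its own rate-limit headers. `retry_after_seconds` is a hypothetical helper.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: Optional[str],
                        now: Optional[datetime] = None) -> Optional[float]:
    """Return the delay requested by a Retry-After header, or None if absent/unparseable."""
    if value is None:
        return None
    value = value.strip()
    # Form 1: delay-seconds, a non-negative integer such as "2".
    if value.isdecimal():
        return float(value)
    # Form 2: an HTTP-date such as "Wed, 31 Dec 2025 12:00:05 GMT".
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):  # older Pythons raise TypeError on bad input
        return None
    if when.tzinfo is None:  # be lenient with dates that lack a zone
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means "retry now".
    return max(0.0, (when - now).total_seconds())
```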

@rgaricano commented on GitHub (Aug 30, 2025):

@Rudd-O
You can set Concurrent Requests to 1 in Admin Settings > Web Search > Loader Concurrent Requests.

@DocShotgun commented on GitHub (Aug 31, 2025):

FYI, setting max concurrent requests to 1 does not prevent this error, because the backend's limit is specifically 1 request per second.
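The arithmetic behind this point: with one request in flight at a time, the request rate is bounded only by latency, so a fast round trip alone exceeds 1 request per second. A sketch, assuming purely for illustration a 250 ms round trip per query (the function names are hypothetical):

```python
def sequential_rate(latency_s: float, concurrency: int = 1) -> float:
    """Requests/second when `concurrency` workers each fire back-to-back."""
    return concurrency / latency_s


def required_pause(latency_s: float, limit_per_s: float) -> float:
    """Pause needed after each request for one worker to stay at `limit_per_s`."""
    return max(0.0, 1.0 / limit_per_s - latency_s)


latency = 0.25  # assumed round-trip time per query, for illustration only
print(sequential_rate(latency))      # 4.0 -> four times a 1 request/s limit
print(required_pause(latency, 1.0))  # 0.75 -> seconds of sleep per request
```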

@e-dervieux commented on GitHub (Sep 8, 2025):

@cvaz1306 thank you!!! Any news on this?


Related discussion here: https://github.com/open-webui/open-webui/discussions/14107

@scionaltera commented on GitHub (Sep 13, 2025):

Also seeing this. It would be really nice to configure it to keep to 1/sec somehow.

@cvaz1306 commented on GitHub (Sep 15, 2025):

Just to update you on progress: I've got web search working with Brave Web Search (the free version). I need to finish testing, and then I will submit the PR.

@chip902 commented on GitHub (Sep 15, 2025):

You are a saint and a scholar good sir.

@cvaz1306 commented on GitHub (Sep 15, 2025):

@chip902 Here is the pull request if you want to check it out, or draw attention to the maintainers: https://github.com/open-webui/open-webui/pull/17449
FYI, if you want to use this, it currently uses environment variables, because I haven't quite figured out how the admin settings interface code works. Hopefully that isn't a problem for you. It's not quite finished yet, and I'm working on getting the tests to pass.

@glantucan commented on GitHub (Nov 2, 2025):

I am still having this problem, and not only with the free version of the API. It also happens with the Base plan, which is supposed to support 20 requests per second.

@mp3bruh commented on GitHub (Nov 16, 2025):

A temporary solution is to specify in the prompt that the AI should use only one search request.

@cvaz1306 commented on GitHub (Nov 18, 2025):

> A temporary solution is to specify in the prompt that the AI should use only one search request.

I tried that and it didn't solve the issue.

@atomlab commented on GitHub (Dec 19, 2025):

I'm having the exact same issue!

I'm using the Brave Search API on the free tier and constantly hitting this "1 request per second" limit. Open WebUI just sends requests one after another with no pause, and I get a 429 error after the very first request.

This really disrupts my workflow and makes the web search feature unusable. It would be great if Open WebUI had an option for "delay between search requests", or at least some basic throttle in the settings. Even a simple 1-2 second delay would solve 90% of the problems with this API limit.

@Classic298 commented on GitHub (Dec 19, 2025):

PRs are still welcome. Alternatively, upgrade to the paid Brave Search plan.

@Classic298 commented on GitHub (Dec 21, 2025):

should be addressed by

https://github.com/open-webui/open-webui/pull/20070

@jocull commented on GitHub (Dec 26, 2025):

With the latest release this is still happening for me. I've set the new concurrency setting to 1, but the Brave API (free tier) is 1 request per second.

If the LLM decides to generate multiple search terms, the second request will fire immediately after the first, and is almost guaranteed to trip the rate limit. A simple backoff, retry, or even just user-specified delay between requests would do the trick 🙏
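A "backoff and retry" policy is often implemented as capped exponential backoff with jitter. The schedule below is a generic sketch; `backoff_schedule` and its default base, factor and cap are illustrative assumptions, not Open WebUI or Brave settings.

```python
import random
from typing import List, Optional


def backoff_schedule(attempts: int, base_s: float = 1.0, factor: float = 2.0,
                     cap_s: float = 8.0,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Delays before each retry: base * factor**n, capped, with optional full jitter."""
    delays = []
    for n in range(attempts):
        delay = min(cap_s, base_s * factor ** n)
        if rng is not None:
            # "Full jitter": pick uniformly in [0, delay] so clients that were
            # rate-limited together do not all retry at the same instant.
            delay = rng.uniform(0.0, delay)
        delays.append(delay)
    return delays


print(backoff_schedule(5))  # [1.0, 2.0, 4.0, 8.0, 8.0]
```

For a hard limit such as Brave's 1 request per second, full jitter can pick a delay shorter than one second, so in practice the jittered value would be added on top of a minimum one-second gap rather than replacing it.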

@Classic298 commented on GitHub (Dec 26, 2025):

OK, so besides setting a concurrency limit, you also need something like a "requests per second"... no... something like a "time between requests" environment variable?
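Reading a "time between requests" environment variable takes only a few lines. The variable name `WEB_SEARCH_REQUEST_DELAY_MS` below is hypothetical and is not an existing Open WebUI setting. The sketch only shows parsing a millisecond value with a safe default.

```python
import os


def request_delay_seconds(env_var: str = "WEB_SEARCH_REQUEST_DELAY_MS",
                          default_ms: int = 0) -> float:
    """Read a hypothetical per-request delay (milliseconds) from the environment."""
    raw = os.environ.get(env_var, "").strip()
    try:
        ms = int(raw) if raw else default_ms
    except ValueError:
        ms = default_ms  # ignore malformed values rather than crash at startup
    return max(0, ms) / 1000.0  # negative values are treated as "no delay"
```

With `WEB_SEARCH_REQUEST_DELAY_MS=1000` and concurrency set to 1, each search worker would pause one second between requests.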

@jocull commented on GitHub (Dec 26, 2025):

Maybe -- it's just like a quick and dirty workaround. Respecting the 429 responses for rate limiting and applying any back off would be more proper.

Here's a quick example of what I did just to test the theory: https://github.com/open-webui/open-webui/commit/b44bddad9a4498324c2328bceeaf4bba8a0bf58f

It seems to work here for me. Just trying to pitch solutions with minimum friction 🙏

@Rudd-O commented on GitHub (Dec 30, 2025):

Yes if the software respected 429 then everything would work flawlessly. That would be the real fix.

@Classic298 commented on GitHub (Dec 30, 2025):

Someone in the Brave discussion wrote that setting the concurrency to 1 fixed it for them, but it didn't fix it for you. Weird. The Brave API is strange.

@Rudd-O commented on GitHub (Dec 30, 2025):

The problem is that it isn't enough to limit concurrency to 1 request for the Brave API. If your internet is fast enough, you will hit the additional limit of one request per second. The docs are clear: it's not just "one request, none in parallel", it's one request per second.

Again, if all 429s were retried by duly following what the HTTP headers say on the 429 reply, it would work perfectly. That's the real fix.

@Classic298 commented on GitHub (Dec 30, 2025):

Thanks, let me attempt a fix specifically for Brave, for the case where your connection is fast enough to send more than 1 request per second despite concurrency being set to 1.

@Classic298 commented on GitHub (Dec 30, 2025):

https://github.com/open-webui/open-webui/pull/20255 this should do it fam, testing wanted, but not necessarily needed

@Classic298 commented on GitHub (Dec 30, 2025):

fixed by PR

@jocull commented on GitHub (Dec 31, 2025):

Thanks for the PR! I still have concerns this won’t address the issue in many cases since it only retries once. It seems like if the LLM decides to generate many search terms and requests were trying to run concurrently there could still be contention. A small retry loop may resolve more reliably. Maybe 5-10 attempts before giving up?

I am also surprised to see no test coverage required for this, but not my project so I won’t complain. I was just scared to contribute for stepping into complex testing 😅
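A bounded retry loop along the lines suggested here could look like the following. It is a sketch under stated assumptions. `send` is a hypothetical callable that returns a status code and a header mapping, such as a thin wrapper around an HTTP client whose headers are case-insensitive. The loop honours a numeric `Retry-After` when present and otherwise doubles its delay. `send_with_retries` is an illustrative name, not part of Open WebUI.

```python
import time
from typing import Callable, Dict, Tuple

# Hypothetical response shape: (HTTP status, headers).
Response = Tuple[int, Dict[str, str]]


def send_with_retries(send: Callable[[], Response], max_attempts: int = 5,
                      base_delay_s: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send` until it returns a non-429 status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = base_delay_s
    attempt = 1
    while True:
        status, headers = send()
        if status != 429 or attempt >= max_attempts:
            # Success, a non-rate-limit error, or retries exhausted.
            return status, headers
        retry_after = headers.get("Retry-After", "").strip()
        # Prefer the server's numeric hint; otherwise fall back to backoff.
        wait = float(retry_after) if retry_after.isdecimal() else delay
        # Never wait less than the base delay (1 s suits a 1 request/s limit).
        sleep(max(wait, base_delay_s))
        delay *= 2
        attempt += 1
```

Injecting `sleep` keeps the loop testable without real waiting. With `max_attempts=5`, a persistently rate-limited query gives up after roughly 1 + 2 + 4 + 8 seconds of waiting.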

@Classic298 commented on GitHub (Dec 31, 2025):

@jocull Set the concurrency to 1. Based on everything you have reported here, it should work.

A sleep of 1 second plus a concurrency of 1 (only one request at a time) keeps within Brave's free-tier rate limit.
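The combination described here, one request at a time plus a one-second sleep, can be sketched with `asyncio`. This illustrates the idea only and is not the code from PR #20255. `run_queries` and the `search_one` callback are hypothetical, and `pause_s` is a parameter so the example can run quickly.

```python
import asyncio
from typing import Awaitable, Callable, List


async def run_queries(queries: List[str],
                      search_one: Callable[[str], Awaitable[str]],
                      concurrency: int = 1,
                      pause_s: float = 1.0) -> List[str]:
    """Run searches with at most `concurrency` in flight and a pause after each."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(query: str) -> str:
        async with semaphore:
            result = await search_one(query)
            # Keep holding the slot while pausing, so the next query cannot
            # start until pause_s has elapsed after this one finished.
            await asyncio.sleep(pause_s)
            return result

    # gather() returns results in the same order as `queries`.
    return list(await asyncio.gather(*(guarded(q) for q in queries)))
```

Because the pause comes after each request, the last query also waits `pause_s` before results are returned. A timestamp-based throttle would avoid that extra delay.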

@jocull commented on GitHub (Jan 1, 2026):

Ahhh good thought, thanks. I do have that set to 1 already.

Will that also be true in a multi-user environment where multiple chats may be going on at the same time?

@Classic298 commented on GitHub (Jan 1, 2026):

@jocull

No

The concurrency is per web search request

If somehow... three users use the web search at the very exact same time (same second) then there'd be three separate threadpools, all three of which are bound to single query concurrency, therefore there will be three requests sent in that second to the configured API endpoint.

To be fair, in an environment where this is a problem, with multiple truly concurrent web searches in the very same second (implying hundreds of concurrent users), you would usually use a paid search API rather than the Brave free tier.
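For the limit to hold across users, every search would need to share one limiter per search engine across the whole process, instead of creating one per search. The following is a hypothetical sketch; `SharedRateLimiter` and `get_limiter` are illustrative names. Each caller reserves the next free time slot under a lock and then sleeps outside it. It still would not coordinate across multiple worker processes or replicas, which would need state shared outside the process (for example Redis).

```python
import threading
import time
from typing import Dict


class SharedRateLimiter:
    """Spaces request starts by `interval_s` for every caller in this process."""

    def __init__(self, interval_s: float) -> None:
        self.interval_s = interval_s
        self._lock = threading.Lock()
        self._next_slot = float("-inf")

    def acquire(self) -> None:
        with self._lock:
            # Reserve a slot, then release the lock before sleeping so
            # other threads can queue behind this one immediately.
            slot = max(time.monotonic(), self._next_slot)
            self._next_slot = slot + self.interval_s
        delay = slot - time.monotonic()
        if delay > 0:
            time.sleep(delay)


_limiters: Dict[str, SharedRateLimiter] = {}
_registry_lock = threading.Lock()


def get_limiter(engine: str, interval_s: float = 1.0) -> SharedRateLimiter:
    """Return the single limiter for `engine`, creating it on first use."""
    with _registry_lock:
        if engine not in _limiters:
            _limiters[engine] = SharedRateLimiter(interval_s)
        return _limiters[engine]
```

Assumed usage: each search thread calls `get_limiter("brave").acquire()` before its request, so three users searching in the same second would be spread over about three seconds.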

@Rudd-O commented on GitHub (Jan 4, 2026):

> https://github.com/open-webui/open-webui/pull/20255 this should do it fam, testing wanted, but not necessarily needed

KING! BASED!

Reference: github-starred/open-webui#120790