[PR #23810] fix: honour HTTP proxy env vars for DuckDuckGo search and URL fetching #27383

Open
opened 2026-04-20 07:02:48 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/23810
Author: @imoes
Created: 4/16/2026
Status: 🔄 Open

Base: dev ← Head: fix/corporate-proxy-web-search


📝 Commits (3)

  • 4b610a3 fix: honour HTTP proxy env vars for DuckDuckGo search and URL fetching
  • cfca838 fix: robustly detect HTTP proxy for DuckDuckGo search across ddgs versions
  • 0c68d26 fix: pass HTTP proxy env vars to Tencent Cloud SDK in sougou.py

📊 Changes

3 files changed (+36 additions, -7 deletions)

📝 backend/open_webui/retrieval/web/duckduckgo.py (+14 -1)
📝 backend/open_webui/retrieval/web/sougou.py (+10 -0)
📝 backend/open_webui/retrieval/web/utils.py (+12 -6)

📄 Description

Pull Request Checklist

  • Target branch: dev
  • Description: Provided below.
  • Changelog: Added below.
  • Documentation: No new env vars; existing WEB_SEARCH_TRUST_ENV and standard http_proxy/https_proxy behaviour unchanged.
  • Dependencies: No new dependencies.
  • Testing: Manually verified in a Docker container behind a corporate HTTP proxy (proxy.ippen.media:80). Both DuckDuckGo search and URL fetching confirmed working after fix.
  • Agentic AI Code: AI-assisted, with additional human review and manual testing in production.
  • Code review: Self-reviewed.
  • Git Hygiene: Single logical change, rebased on dev.

Problem

Web search fails when Open WebUI runs behind a corporate HTTP proxy with http_proxy/https_proxy set as Docker environment variables.

1. DuckDuckGo search (duckduckgo.py)

DDGS() was instantiated without a proxy argument, so all outgoing requests bypassed the proxy and were blocked.

A partial fix (passing proxy=os.environ.get("https_proxy") to DDGS()) worked with one version of the ddgs package but broke again when the ghcr.io/open-webui/open-webui:main image was re-pulled and shipped a newer ddgs version. The newer version uses a different internal httpx client configuration that requires the uppercase HTTPS_PROXY/HTTP_PROXY env vars to detect proxy settings reliably.

2. URL fetching after search (utils.py)

SafeWebBaseLoader._fetch created an aiohttp.ClientSession with trust_env effectively hardcoded to False. Because WEB_SEARCH_TRUST_ENV is managed via PersistentConfig (database-backed), a False value stored in the DB on first run silently overrides the WEB_SEARCH_TRUST_ENV=true env var on every subsequent container restart.
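The override described above can be illustrated with a toy model. This is only a sketch of the precedence problem, not Open WebUI's actual PersistentConfig class: the function load_setting and the dict standing in for the database are hypothetical names introduced here for illustration.

```python
from typing import Dict


def load_setting(name: str, env: Dict[str, str], db: Dict[str, bool], default: bool) -> bool:
    """Hypothetical loader: a value already saved in the DB wins over the environment."""
    if name in db:
        # A previously persisted value is returned without consulting env.
        return db[name]
    # First run: derive the value from the environment (or the default) and persist it.
    value = env[name].lower() == "true" if name in env else default
    db[name] = value
    return value


db: Dict[str, bool] = {}  # stands in for the database-backed store

# First start: the env var is not set, so False is computed and persisted.
first = load_setting("WEB_SEARCH_TRUST_ENV", {}, db, default=False)

# Later restart: the operator sets the env var, but the stored False still wins.
second = load_setting("WEB_SEARCH_TRUST_ENV", {"WEB_SEARCH_TRUST_ENV": "true"}, db, default=False)
```

In this model, the second call still yields False, which matches the symptom reported here: setting the env var after the first run has no visible effect.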

Fix

duckduckgo.py

  • Read proxy from all four env var variants (https_proxy, HTTPS_PROXY, http_proxy, HTTP_PROXY).
  • Use os.environ.setdefault() to ensure uppercase HTTPS_PROXY/HTTP_PROXY are set before DDGS is instantiated, so httpx picks up the proxy automatically regardless of which ddgs version is installed.
  • Pass the proxy explicitly to DDGS(proxy=proxy) as well.
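The three steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the helper name resolve_proxy, the lookup order of the four variables, and the choice to mirror one resolved URL into both uppercase names are assumptions made for the example.

```python
import os
from typing import MutableMapping, Optional

# Assumed lookup order; the PR only states that all four variants are read.
PROXY_VARS = ("https_proxy", "HTTPS_PROXY", "http_proxy", "HTTP_PROXY")


def resolve_proxy(env: MutableMapping[str, str] = os.environ) -> Optional[str]:
    """Return the first non-empty proxy URL and mirror it to the uppercase names."""
    proxy = next((env[name] for name in PROXY_VARS if env.get(name)), None)
    if proxy:
        # setdefault only writes when the key is absent, so a value the
        # operator already exported under the uppercase name is left alone.
        env.setdefault("HTTPS_PROXY", proxy)
        env.setdefault("HTTP_PROXY", proxy)
    return proxy
```

The returned value would then be handed to the search client as in the last bullet, DDGS(proxy=proxy). The helper takes the environment as a parameter so it can be exercised with a plain dict instead of the real process environment.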

utils.py

  • Change all trust_env: bool = False parameter defaults to True across all loader classes and get_web_loader().
  • In SafeWebBaseLoader._fetch, compute effective_trust_env = self.trust_env or bool(os.environ.get("https_proxy") or os.environ.get("http_proxy")) so the aiohttp session always uses the proxy when proxy env vars are present, regardless of the DB-cached config value.
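The effective_trust_env expression from the bullet above can be isolated into a small function to make its behaviour easy to check. The function wrapper and the env parameter are assumptions added for testability; only the boolean expression itself comes from the description.

```python
import os
from typing import Mapping


def effective_trust_env(trust_env: bool, env: Mapping[str, str] = os.environ) -> bool:
    """True if the configured flag is on, or if a lowercase proxy variable is set."""
    return trust_env or bool(env.get("https_proxy") or env.get("http_proxy"))
```

The result would be passed to the session as aiohttp.ClientSession(trust_env=...). As written in the description, the expression checks only the lowercase names, so an environment that sets only HTTPS_PROXY/HTTP_PROXY would still depend on the configured trust_env value.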

Reproduction

Deploy Open WebUI in Docker with:

environment:
  http_proxy: "http://proxy.example.com:80"
  https_proxy: "http://proxy.example.com:80"

  • Before fix: DuckDuckGo search → ConnectError; URL fetching → Connection timeout
  • After fix: Both work correctly (verified in production)

Changelog Entry

Fixed

  • duckduckgo.py: Robustly detect HTTP proxy across ddgs versions by reading all four proxy env var variants and ensuring uppercase HTTPS_PROXY/HTTP_PROXY are set for httpx compatibility. Also pass proxy explicitly to DDGS(proxy=proxy).
  • utils.py: SafeWebBaseLoader._fetch now auto-enables trust_env when proxy env vars are present, bypassing the PersistentConfig DB-cached False value that would otherwise silently ignore the proxy.
  • utils.py: Changed trust_env default from False to True in all loader classes and get_web_loader().

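Contributor License Agreement

  • By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA) (https://github.com/open-webui/open-webui/blob/main/CONTRIBUTOR_LICENSE_AGREEMENT), and I am providing my contributions under its terms.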

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-20 07:02:48 -05:00
Reference: github-starred/open-webui#27383