bug: Allow Web Search RAG to Continue on Invalid Hostname Resolution #1137

Closed
opened 2025-11-11 14:38:28 -06:00 by GiteaMirror · 2 comments

Originally created by @0zheermao0 on GitHub (Jun 5, 2024).

Bug Report

Description

Bug Summary:
When using the Web Search RAG feature, if the search engine returns a hostname that the server cannot resolve, all search results are blocked from proceeding to the next step of RAG processing.

Steps to Reproduce:
1. Use the Web Search RAG feature with a query whose results include a URL pointing to a hostname that DNS cannot resolve.
2. Observe that once the unresolvable hostname is encountered, all other valid search results are also blocked and do not proceed to the next step.

Expected Behavior:
If a hostname cannot be resolved, that specific URL should be ignored, and the Web Search RAG should continue processing the remaining valid URLs without interruption.

Actual Behavior:
Web Search RAG stops processing all URLs, and the model answers directly, when it encounters an unresolvable hostname.
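
The expected behavior above could be sketched roughly as follows. This is a minimal illustration, not Open WebUI's actual code: `resolves` and `filter_resolvable_urls` are hypothetical helper names, and the only real APIs used are `socket.getaddrinfo` and `urllib.parse.urlparse` from the standard library. The idea is to catch the DNS failure per URL and drop that URL instead of letting the exception abort the whole batch.

```python
import socket
from typing import Callable, Iterable, List
from urllib.parse import urlparse


def resolves(hostname: str) -> bool:
    """Return True if the hostname resolves, False on a DNS failure."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        # e.g. "[Errno -2] Name or service not known"
        return False


def filter_resolvable_urls(
    urls: Iterable[str],
    resolver: Callable[[str], bool] = resolves,  # hypothetical hook, handy for testing
) -> List[str]:
    """Keep only URLs whose hostname resolves; skip the rest instead of raising."""
    kept = []
    for url in urls:
        hostname = urlparse(url).hostname
        if hostname and resolver(hostname):
            kept.append(url)
    return kept
```

With a filter like this applied before the loader is built, one bad search result would only shrink the URL list rather than fail the whole request.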

Environment

  • Open WebUI Version: v0.2.4

  • Ollama (if applicable): 0.1.39

  • Operating System: Windows 11, wsl2, Debian

  • Browser (if applicable): [e.g., Chrome 100.0, Firefox 98.0]

Reproduction Details

Confirmation:

  • [x] I have read and followed all the instructions provided in the README.md.
  • [x] I am on the latest version of both Open WebUI and Ollama.
  • [x] I have included the browser console logs.
  • [x] I have included the Docker container logs.

Logs and Screenshots

Command Console Logs:

ERROR:apps.rag.main:[Errno -2] Name or service not known
Traceback (most recent call last):
  File "/mnt/d/github_project/nlp/open-webui/backend/apps/rag/main.py", line 827, in store_web_search
    loader = get_web_loader(urls)
  File "/mnt/d/github_project/nlp/open-webui/backend/apps/rag/main.py", line 694, in get_web_loader
    if not validate_url(url):
  File "/mnt/d/github_project/nlp/open-webui/backend/apps/rag/main.py", line 723, in validate_url
    return all(validate_url(u) for u in url)
  File "/mnt/d/github_project/nlp/open-webui/backend/apps/rag/main.py", line 723, in <genexpr>
    return all(validate_url(u) for u in url)
  File "/mnt/d/github_project/nlp/open-webui/backend/apps/rag/main.py", line 712, in validate_url
    ipv4_addresses, ipv6_addresses = resolve_hostname(parsed_url.hostname)
  File "/mnt/d/github_project/nlp/open-webui/backend/apps/rag/main.py", line 731, in resolve_hostname
    addr_info = socket.getaddrinfo(hostname, None)
  File "/root/anaconda3/envs/fresh_pytorch/lib/python3.10/socket.py", line 955, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
INFO:     127.0.0.1:61451 - "POST /rag/api/v1/web/search HTTP/1.1" 400 Bad Request
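
The traceback shows the exception escaping from inside `all(validate_url(u) for u in url)`. The snippet below is a standalone illustration of why that aborts the whole batch. It does not reproduce the project's code: `check` and `safe_check` are hypothetical stand-ins, and the DNS failure is simulated by raising `socket.gaierror` directly. An exception raised inside the generator propagates out of `all()`, so the remaining items are never evaluated, whereas catching it per item lets the other hosts through.

```python
import socket


def check(host: str) -> bool:
    # Stand-in for a per-URL validator: raises for one host,
    # the way a failed DNS lookup would.
    if host == "unresolvable.invalid":
        raise socket.gaierror(-2, "Name or service not known")
    return True


hosts = ["a.example", "unresolvable.invalid", "b.example"]

# 1) The exception propagates out of all(); "b.example" is never checked.
try:
    all(check(h) for h in hosts)
    outcome = "validated"
except socket.gaierror as exc:
    outcome = f"aborted: {exc}"


# 2) Catching the error per host turns the failure into a simple "skip".
def safe_check(host: str) -> bool:
    try:
        return check(host)
    except socket.gaierror:
        return False


valid_hosts = [h for h in hosts if safe_check(h)]
print(outcome)
print(valid_hosts)
```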

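Screenshot: https://github.com/open-webui/open-webui/assets/57663132/62106d80-708d-4bc1-803f-6662cdcfefab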

Installation Method

manual installation


@qhmhl commented on GitHub (Jun 9, 2024):

I added <query> at the end; now the error is as below.
Something went wrong :/ [Errno -2] Name or service not known


@andrebarsotti commented on GitHub (Jun 13, 2024):

Hello,

I'm still encountering this issue; here is the debug prompt. Could anyone assist?

2024-06-13 16:56:31 INFO:root:trying to web search with ('searxng', 'O que é um large language model?')
2024-06-13 16:56:31 DEBUG:apps.rag.search.searxng:searching http://host.docker.internal:8080/search
2024-06-13 16:56:31 DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): host.docker.internal:8080
2024-06-13 16:56:32 DEBUG:urllib3.connectionpool:http://host.docker.internal:8080 "GET /search?q=O+que+%C3%A9+um+large+language+model%3F&format=json&pageno=1&safesearch=1&language=en-US&time_range=&categories=&theme=simple&image_proxy=0 HTTP/1.1" 200 47778
2024-06-13 16:56:33 ERROR:apps.rag.main:[Errno -2] Name or service not known
2024-06-13 16:56:33 Traceback (most recent call last):
2024-06-13 16:56:33   File "/app/backend/apps/rag/main.py", line 852, in store_web_search
2024-06-13 16:56:33     loader = get_web_loader(urls)
2024-06-13 16:56:33              ^^^^^^^^^^^^^^^^^^^^
2024-06-13 16:56:33   File "/app/backend/apps/rag/main.py", line 705, in get_web_loader
2024-06-13 16:56:33     if not validate_url(url):
2024-06-13 16:56:33            ^^^^^^^^^^^^^^^^^
2024-06-13 16:56:33   File "/app/backend/apps/rag/main.py", line 734, in validate_url
2024-06-13 16:56:33     return all(validate_url(u) for u in url)
2024-06-13 16:56:33            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-06-13 16:56:33   File "/app/backend/apps/rag/main.py", line 734, in <genexpr>
2024-06-13 16:56:33     return all(validate_url(u) for u in url)
2024-06-13 16:56:33                ^^^^^^^^^^^^^^^
2024-06-13 16:56:33   File "/app/backend/apps/rag/main.py", line 723, in validate_url
2024-06-13 16:56:33     ipv4_addresses, ipv6_addresses = resolve_hostname(parsed_url.hostname)
2024-06-13 16:56:33                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-06-13 16:56:33   File "/app/backend/apps/rag/main.py", line 741, in resolve_hostname
2024-06-13 16:56:33     addr_info = socket.getaddrinfo(hostname, None)
2024-06-13 16:56:33                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-06-13 16:56:33   File "/usr/local/lib/python3.11/socket.py", line 962, in getaddrinfo
2024-06-13 16:56:33     for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
2024-06-13 16:56:33                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-06-13 16:56:33 socket.gaierror: [Errno -2] Name or service not known
2024-06-13 16:56:33 INFO:     172.21.0.1:56662 - "POST /rag/api/v1/web/search HTTP/1.1" 400 Bad Request
2024-06-13 16:56:33 DEBUG:main:request.url.path: /ollama/api/chat
Reference: github-starred/open-webui#1137