issue: [Timeout] Large file processing hangs and never finishes KB ingestion #5372

Closed
opened 2025-11-11 16:19:02 -06:00 by GiteaMirror · 0 comments

Originally created by @kcambrek on GitHub (May 28, 2025).

Check Existing Issues

  • I have searched the existing issues and discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v0.6.11

Ollama Version (if applicable)

No response

Operating System

OpenShift/Kubernetes

Browser (if applicable)

No response

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
      • Start with the initial platform/version/OS and dependencies used,
      • Specify exact install/launch/configure commands,
      • List URLs visited, user input (incl. example values/emails/passwords if needed),
      • Describe all options and toggles enabled or changed,
      • Include any files or environmental changes,
      • Identify the expected and actual result at each stage,
      • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

  • Even if the HTTP request times out, the file should still get added to the KB once processing completes.
  • Some kind of queueing system would be nice, so that I get confirmation that a file has been added to a queue and so that processing is handled by workers other than Open WebUI.
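
To make the queueing request concrete, here is a minimal in-process sketch of the idea: `submit` returns a job id right away (the confirmation), and a separate worker thread does the slow processing. Everything here is hypothetical (`IngestQueue`, `submit`, `status`, and the `process` callable are illustrative names, not part of Open WebUI), and a real deployment would more likely use an external queue and separate worker processes.

```python
import queue
import threading
import uuid
from typing import Callable, Dict, Optional, Tuple

Job = Tuple[str, str]  # (job_id, file_id)


class IngestQueue:
    """Hypothetical job queue: accept work immediately, process it later."""

    def __init__(self, process: Callable[[str], None]) -> None:
        self._process = process  # the slow step, e.g. chunking + embedding
        self._jobs: "queue.Queue[Optional[Job]]" = queue.Queue()
        self._status: Dict[str, str] = {}
        self._lock = threading.Lock()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, file_id: str) -> str:
        """Queue a file and return a job id without waiting for processing."""
        job_id = str(uuid.uuid4())
        with self._lock:
            self._status[job_id] = "queued"
        self._jobs.put((job_id, file_id))
        return job_id

    def status(self, job_id: str) -> str:
        """Return "queued", "done", "failed", or "unknown"."""
        with self._lock:
            return self._status.get(job_id, "unknown")

    def close(self) -> None:
        """Let the worker finish the jobs already queued, then stop it."""
        self._jobs.put(None)  # sentinel: no more work
        self._thread.join()

    def _run(self) -> None:
        while True:
            job = self._jobs.get()
            if job is None:
                return
            job_id, file_id = job
            try:
                self._process(file_id)
                result = "done"
            except Exception:
                result = "failed"
            with self._lock:
                self._status[job_id] = result
```

With this shape, the HTTP handler would only need to call `submit` and return the job id, so a slow embedding run could not cause a gateway timeout on that request; the client would check `status` later.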

Actual Behavior

  • Both API calls return HTTP 504.
  • File processing continues in the background (per logs), but the file never ends up in the KB.

Steps to Reproduce

1. Paste the snippet below into issue_demo.py, filling in WEBUI_URL, TOKEN and a large FILE_BINARY.
2. Run python issue_demo.py.
3. Observe the 504 responses from process/file and .../file/add.
4. Check the Open WebUI logs: embeddings are generated, but the KB stays empty.
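
For anyone who does not have a suitable file at hand, a synthetic payload can stand in for FILE_BINARY. This is only a sketch: `make_large_text` is a hypothetical helper, and the size needed to trigger the timeout is not stated in this report, so the 50 MB figure below is an arbitrary assumption.

```python
def make_large_text(
    target_bytes: int,
    line: str = "The quick brown fox jumps over the lazy dog.\n",
) -> bytes:
    """Return exactly target_bytes of repeated text, encoded as UTF-8.

    The result is cut at a byte boundary, so use an ASCII-only line to
    avoid splitting a multi-byte character at the end.
    """
    unit = line.encode("utf-8")
    if target_bytes <= 0 or not unit:
        raise ValueError("target_bytes must be positive and line non-empty")
    repeats = -(-target_bytes // len(unit))  # ceiling division
    return (unit * repeats)[:target_bytes]


# Assumed size; adjust until the gateway timeout is reached.
FILE_BINARY = make_large_text(50 * 1024 * 1024)
```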

Logs & Screenshots

import logging
import requests

# ——— Config ———
WEBUI_URL = "https://…your-webui-host…"
TOKEN     = "YOUR_TOKEN_HERE"
KB_NAME   = "my_kb"
FILE_NAME = "large_file.txt"
FILE_BINARY = b"…"  # your file bytes here

# ——— Logging ———
logging.basicConfig(level=logging.INFO)

# ——— API Helpers ———
def get_knowledge_list():
    r = requests.get(
        f"{WEBUI_URL}/api/v1/knowledge/list",
        headers={"Authorization": f"Bearer {TOKEN}", "Accept": "application/json"},
        timeout=10
    )
    return r.json()

def create_knowledge_base(name):
    r = requests.post(
        f"{WEBUI_URL}/api/v1/knowledge/create",
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json"
        },
        json={"name": name},
        timeout=10
    )
    return r.json()

def upload_file_data(file_data, filename):
    r = requests.post(
        f"{WEBUI_URL}/api/v1/files/",
        headers={"Authorization": f"Bearer {TOKEN}", "Accept": "application/json"},
        params={"process": False},
        files={"file": (filename, file_data)},
        timeout=(10, 300)
    )
    logging.info("Upload returned %s", r.status_code)
    return r.json()

def process_file(file_id):
    r = requests.post(
        f"{WEBUI_URL}/api/v1/retrieval/process/file",
        headers={"Authorization": f"Bearer {TOKEN}", "Accept": "application/json"},
        json={"file_id": file_id},
        timeout=10
    )
    logging.info("Process returned %s", r.status_code)

def add_file_to_knowledge(knowledge_id, file_id):
    r = requests.post(
        f"{WEBUI_URL}/api/v1/knowledge/{knowledge_id}/file/add",
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json"
        },
        json={"file_id": file_id},
        timeout=10
    )
    logging.info("Attach returned %s", r.status_code)

# ——— Main Flow ———
if __name__ == "__main__":
    # 1. Ensure KB exists
    kbs = get_knowledge_list()
    kb = next((x for x in kbs if x["name"] == KB_NAME), None)
    if not kb:
        kb = create_knowledge_base(KB_NAME)

    # 2. Upload binary
    upload_resp = upload_file_data(FILE_BINARY, FILE_NAME)
    file_id = upload_resp.get("id")

    # 3. Process → expect 504/timeout
    process_file(file_id)

    # 4. Attach to KB → may also return 504, but should still work
    add_file_to_knowledge(kb["id"], file_id)

Sample logs

2025-05-28 16:34:07.226 | INFO | open_webui.routers.retrieval:save_docs_to_vector_db:969 - save_docs_to_vector_db: document long_file.txt 912ca4e7-66fe-45cf-a1bb-43aac84391c5 - {}
2025-05-28 16:34:07.242 | INFO | open_webui.routers.retrieval:save_docs_to_vector_db:994 - Using token text splitter: cl100k_base - {}
2025-05-28 16:34:07.443 | INFO | open_webui.routers.retrieval:save_docs_to_vector_db:1052 - adding to collection 912ca4e7-66fe-45cf-a1bb-43aac84391c5 - {}
2025-05-28 16:34:07.445 | DEBUG | open_webui.retrieval.utils:generate_ollama_batch_embeddings:709 - generate_ollama_batch_embeddings:model yxchia/multilingual-e5-base batch size: 16 - {}
2025-05-28 16:34:07.449 | DEBUG | urllib3.connectionpool:_new_conn:241 - Starting new HTTP connection (1): llm-host.svc.cluster.local:8000 - {}
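
When comparing logs against the 504 responses, it can help to pull out the fields of each line. The sketch below assumes the layout seen in the sample lines above (`timestamp | LEVEL | module:function:line - message - {}`); `parse_line` is a hypothetical helper, and other log configurations may not match this pattern.

```python
import re
from typing import Dict, Optional, Union

# Pattern inferred from the sample lines only; it is an assumption, not a spec.
LOG_RE = re.compile(
    r"^(?P<ts>\S+ \S+)\s*\|\s*(?P<level>\w+)\s*\|\s*"
    r"(?P<module>[\w.]+):(?P<func>\w+):(?P<line>\d+) - "
    r"(?P<msg>.*?)(?: - \{.*\})?$"
)


def parse_line(text: str) -> Optional[Dict[str, Union[str, int]]]:
    """Split one log line into its fields, or return None if it does not match."""
    match = LOG_RE.match(text.strip())
    if match is None:
        return None
    record: Dict[str, Union[str, int]] = dict(match.groupdict())
    record["line"] = int(match.group("line"))
    return record


sample = (
    "2025-05-28 16:34:07.443 | INFO | "
    "open_webui.routers.retrieval:save_docs_to_vector_db:1052 - "
    "adding to collection 912ca4e7-66fe-45cf-a1bb-43aac84391c5 - {}"
)
record = parse_line(sample)
```

Filtering parsed records on `func == "save_docs_to_vector_db"` gives a quick view of how far processing got for a given file.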

Additional Information

Notes

  • Setting process=False at upload avoids the immediate timeout and returns the file_id needed later to add the processed file to the knowledge base. A manual call to process/file still returns 504.
  • Even after waiting 30+ minutes, the file doesn’t show up in the KB.
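
The "wait and see whether the file appears" check can be scripted rather than done by hand. The sketch below is a generic polling helper plus a membership test; both `wait_for` and `file_in_kb` are hypothetical names, and `file_in_kb` assumes the knowledge base is returned as a dict with a `files` list of objects carrying an `id`, which is not confirmed in this report.

```python
import time
from typing import Any, Callable, Dict


def wait_for(
    check: Callable[[], bool],
    timeout_s: float,
    interval_s: float = 5.0,
) -> bool:
    """Call check() until it returns True or timeout_s seconds have passed."""
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval_s, remaining))


def file_in_kb(kb: Dict[str, Any], file_id: str) -> bool:
    """Return True if file_id is listed in kb["files"] (assumed response shape)."""
    files = kb.get("files") or []
    return any(f.get("id") == file_id for f in files)
```

In the reproduction script, `check` would be a small function that fetches the knowledge base again and passes the result to `file_in_kb`. In the situation described here it would be expected to return False even with a timeout of 30 minutes or more.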
GiteaMirror added the bug label 2025-11-11 16:19:02 -06:00

Reference: github-starred/open-webui#5372