[GH-ISSUE #22571] issue: Silent File Drop: Embedding 503 When Ollama Model TTL Expires During Docling Processing #19752

Closed
opened 2026-04-20 02:15:31 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @jannefleischer on GitHub (Mar 11, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/22571

Disclaimer: The analysis of this issue has mainly been done by gpt5-mini and Claude Sonnet 4.6 :/

Check Existing Issues

  • I have searched for any existing and/or related issues.
  • I have searched for any existing and/or related discussions.
  • I have also searched in the CLOSED issues AND CLOSED discussions and found no related items (your issue might already be addressed on the development branch!).
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

main branch, tested March 2026 (WEBUI_BUILD_VERSION=a7271532f8a38da46785afcaa7e65f9a45e7d753)

Ollama Version (if applicable)

0.17.7

Operating System

Linux (WSL2 on Windows, Ubuntu-based).

Browser (if applicable)

Not relevant — error occurs fully server-side and is not surfaced in the browser at all.

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

When a large file (e.g. a multi-hundred-page PDF) is added to a Knowledge Collection
and processed by an external document processor (docling-serve), the resulting text
chunks should be embedded and the file should appear in the collection with its chunks
searchable.

If embedding fails for any reason, the file should be marked as failed in the UI
with a visible error message, so the user knows to retry.

Actual Behavior

The file silently disappears. No error is shown in the UI. The file upload appears
to succeed, but the Knowledge Collection remains empty (or the file is not linked).

Internally, the file's data.status in the SQLite DB either stays null or never
reaches completed. The vector store has no chunks for this file.

Steps to Reproduce

Prerequisites:

  • Open WebUI with an external document processor (e.g. docling-serve) configured as
    the content extraction backend
  • Ollama as the embedding provider with default keep-alive TTL (5 minutes)
  • A large PDF (200+ pages, taking >5 minutes for docling to process)

Steps:

  1. Ensure the Ollama embedding model (e.g. bge-m3) is loaded (make any embedding
    request to warm it up, or simply open a chat)
  2. Wait at least 5 minutes without any embedding activity so the model TTL expires
    and Ollama unloads it (verify: curl http://localhost:11434/api/ps should show
    no loaded models)
  3. Navigate to Workspace → Knowledge → open a Knowledge Collection
  4. Click "+ Add Files" or drag-drop a large PDF (200+ pages)
  5. Observe the upload spinner — it will complete without error
  6. Wait for docling to finish processing (monitor via docker logs -f docling-serve;
    look for completion of the file)
  7. Reload the Knowledge Collection page
  8. Result: The file does not appear in the collection, or appears with 0 chunks

What is happening internally:

  • After docling finishes, open-webui calls process_file() which fires all embedding
    batch requests simultaneously as parallel async coroutines
  • Ollama is in the process of reloading the model (triggered by the first request)
  • All concurrent requests receive 503 Service Unavailable
  • The embedding code raises IndexError: list index out of range (the response body
    is empty/unexpected and index 0 does not exist)
  • This exception propagates and the entire file processing is aborted silently
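
The failure mode described above can be sketched in isolation. Note that `fake_ollama_embed`, `embed_batch`, and `process_file_batches` below are illustrative stand-ins for the real Open WebUI code, not actual function names from the codebase:

```python
import asyncio

async def fake_ollama_embed(batch):
    # Simulates Ollama answering while the model is reloading: the 503
    # error path effectively yields a response with no embeddings at all.
    return {"embeddings": []}

async def embed_batch(batch):
    resp = await fake_ollama_embed(batch)
    # Indexing into the empty embeddings list is where the
    # "IndexError: list index out of range" comes from.
    return resp["embeddings"][0]

async def process_file_batches(batches):
    # All batches fired in parallel: one failing coroutine aborts
    # the whole gather, and with it the entire file's processing.
    return await asyncio.gather(*(embed_batch(b) for b in batches))

try:
    asyncio.run(process_file_batches([["chunk-1"], ["chunk-2"]]))
except IndexError as e:
    print("file processing aborted:", e)
```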

Logs & Screenshots

Docker container logs (docker logs open-webui 2>&1 | grep -A2 "503\|IndexError"):

503 Service Unavailable — ollama:11434/api/embed
...
IndexError: list index out of range

SQLite state after the failed upload:

-- file.data is NULL or {"status": "pending"} — never reached "completed"
SELECT id, filename, data FROM file ORDER BY created_at DESC LIMIT 5;

-- file is NOT in knowledge_file
SELECT COUNT(*) FROM knowledge_file WHERE knowledge_id = '<your-collection-id>';

Additional Information

Root cause (code):

  • routers/knowledge.py ~L407–465: add_file_to_knowledge_by_id calls process_file()
    synchronously with no retry logic
  • The embedding utility fires all batches in parallel; a single 503 on any batch
    causes an unhandled IndexError

Workaround (confirmed working):

Add to .env:

OLLAMA_KEEP_ALIVE=30m
RAG_EMBEDDING_BATCH_SIZE=64

OLLAMA_KEEP_ALIVE=30m prevents TTL expiry during long docling processing runs.
RAG_EMBEDDING_BATCH_SIZE=64 reduces the number of parallel requests, lowering
collision probability if Ollama is briefly busy.
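
As an alternative to the server-wide setting, Ollama also accepts a per-request keep_alive field, which can be used to pin the model without restarting anything. A quick environment-dependent check (assuming Ollama is reachable at localhost:11434 and bge-m3 is the embedding model):

```shell
# Warm the embedding model and pin it for 30 minutes via per-request keep_alive.
curl -s http://localhost:11434/api/embed \
  -d '{"model": "bge-m3", "input": "warm-up", "keep_alive": "30m"}' > /dev/null

# The model should now appear in the loaded-model list with a ~30m expiry.
curl -s http://localhost:11434/api/ps
```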

Suggested fix:

Add retry with exponential backoff for 503 responses in the embedding utility
(retrieval/utils.py or wherever batches are assembled):

import asyncio

async def embed_with_retry(batch, max_retries=5, base_delay=2.0):
    # `embed` is the existing batch-embedding coroutine.
    for attempt in range(max_retries):
        try:
            return await embed(batch)
        except Exception as e:
            if "503" in str(e) and attempt < max_retries - 1:
                # Exponential backoff: 2s, 4s, 8s, ...
                await asyncio.sleep(base_delay * (2 ** attempt))
                continue
            raise

Alternatively, mark the file as failed with a user-visible error instead of
crashing silently.



GiteaMirror added the bug label 2026-04-20 02:15:31 -05:00

@tjbck commented on GitHub (Mar 25, 2026):

Addressed in dev.


Reference: github-starred/open-webui#19752