[GH-ISSUE #22573] issue: Files Not Linked to Knowledge Collection When Dropped Simultaneously (FILE_NOT_PROCESSED Race Condition) #58419

Closed
opened 2026-05-05 23:07:14 -05:00 by GiteaMirror · 3 comments
Owner

Originally created by @jannefleischer on GitHub (Mar 11, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/22573

Disclaimer: The analysis of this issue has mainly been done by gpt5-mini and claude sonnet 4.6 :/

Check Existing Issues

  • I have searched for any existing and/or related issues.
  • I have searched for any existing and/or related discussions.
  • I have also searched in the CLOSED issues AND CLOSED discussions and found no related items (your issue might already be addressed on the development branch!).
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

main branch, tested March 2026

Ollama Version (if applicable)

Not directly relevant, but Ollama is used as the embedding provider (version 0.17.7).

Operating System

Linux (WSL2 on Windows, Ubuntu-based).

Browser (if applicable)

Tested on Chrome and Firefox. Behavior is frontend-driven and browser-independent.

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

When multiple files are drag-dropped onto a Knowledge Collection simultaneously,
all files should eventually be processed and linked to the collection. The UI should
either wait for each file to finish processing before linking, or link them
automatically once processing completes.

Actual Behavior

When 5+ files are drag-dropped simultaneously, only the first 1–2 files typically
appear in the Knowledge Collection after processing. The remaining files are uploaded
and physically processed by docling (embeddings are created in the vector store), but
they are never linked to the collection — they become permanently invisible.

No error is shown to the user. The upload progress indicator shows success for all
files.

Steps to Reproduce

Prerequisites:

  • Open WebUI with an external document processor (docling-serve) that takes
    >10 seconds to process a file

  • A Knowledge Collection

Steps:

  1. Navigate to Workspace → Knowledge → open a Knowledge Collection
  2. Select 5+ PDF files in your file manager
  3. Drag-drop all files onto the Knowledge Collection simultaneously (single drag)
  4. Observe: all files show an upload progress bar and complete without error
  5. Wait for all docling processing to finish — monitor via:
    docker logs -f docling-serve
    
    Wait until you see completion messages for all files.
  6. Check the SQLite DB:
    sqlite3 open-webui-data/webui.db \
      "SELECT json_extract(data,'$.status') as status, COUNT(*) FROM file GROUP BY status"
    
    You will see many completed entries.
  7. Check how many are linked to the collection:
    sqlite3 open-webui-data/webui.db \
      "SELECT COUNT(*) FROM knowledge_file WHERE knowledge_id='<your-collection-id>'"
    
  8. Result: The count in step 7 is much lower than the completed count in step 6.
    Reload the Knowledge Collection in the UI — missing files confirm the discrepancy.
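
The DB checks in steps 6 and 7 can also be run from one small script. The following is a sketch using Python's stdlib sqlite3 module (the DB path and collection id are placeholders taken from the shell commands above):

```python
import sqlite3

def check_discrepancy(db_path: str, knowledge_id: str) -> tuple[int, int]:
    """Return (completed_file_count, linked_file_count) for a collection."""
    con = sqlite3.connect(db_path)
    try:
        # Files whose docling processing has finished (step 6)
        completed = con.execute(
            "SELECT COUNT(*) FROM file "
            "WHERE json_extract(data, '$.status') = 'completed'"
        ).fetchone()[0]
        # Files actually linked to the collection (step 7)
        linked = con.execute(
            "SELECT COUNT(*) FROM knowledge_file WHERE knowledge_id = ?",
            (knowledge_id,),
        ).fetchone()[0]
        return completed, linked
    finally:
        con.close()

# Example (placeholders, adjust to your setup):
# completed, linked = check_discrepancy("open-webui-data/webui.db", "<your-collection-id>")
```

On an affected instance the two counts diverge, which is exactly the discrepancy described in step 8.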

What is happening internally:

The frontend fires POST /api/v1/knowledge/{id}/file/add immediately after
POST /api/v1/files/ returns for each file — without waiting for docling to
finish. At that point, file.data is null or {"status": "pending"}.

The backend endpoint checks (source: routers/knowledge.py ~L437):

if not file.data:
    raise HTTPException(
        status_code=status.HTTP_400_BAD_REQUEST,
        detail=ERROR_MESSAGES.FILE_NOT_PROCESSED,
        # "Extracted content is not available for this file.
        #  Please ensure that the file is processed before proceeding."
    )

The HTTP 400 is returned and silently ignored by the frontend. The knowledge_file
row is never created. When docling later finishes and sets data.status = "completed",
the embeddings exist in the vector store but no knowledge_file record ever gets
created — the file is permanently orphaned.

Logs & Screenshots

Docker container logs (docker logs open-webui 2>&1 | grep "file/add"):

POST /api/v1/knowledge/<id>/file/add HTTP/1.1" 400
{"detail":"Extracted content is not available for this file. Please ensure that the file is processed before proceeding."}

This appears once per dropped file (except for the 1–2 files lucky enough to have
been fully processed before the file/add call).

SQLite state confirms orphaned files:

-- Find completed files not linked to the collection
SELECT f.id, f.filename
FROM file f
WHERE json_extract(f.data, '$.status') = 'completed'
  AND f.id NOT IN (
    SELECT file_id FROM knowledge_file
    WHERE knowledge_id = '<your-collection-id>'
  );
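
A repair script for such orphans could be sketched as follows (hypothetical code, not the attached workaround script; it assumes knowledge_file has only the two columns used in the queries above, so a production version would need to populate any additional columns the real schema carries):

```python
import sqlite3

def relink_orphaned_files(db_path: str, knowledge_id: str) -> list[str]:
    """Link completed-but-unlinked files to a collection.

    Sketch only: the real knowledge_file table may carry extra columns
    (id, timestamps) that this minimal INSERT does not fill in.
    """
    con = sqlite3.connect(db_path)
    try:
        # Same orphan query as above, selecting only the file ids
        orphans = [row[0] for row in con.execute(
            "SELECT f.id FROM file f "
            "WHERE json_extract(f.data, '$.status') = 'completed' "
            "AND f.id NOT IN ("
            "  SELECT file_id FROM knowledge_file WHERE knowledge_id = ?"
            ")",
            (knowledge_id,),
        )]
        con.executemany(
            "INSERT INTO knowledge_file (knowledge_id, file_id) VALUES (?, ?)",
            [(knowledge_id, fid) for fid in orphans],
        )
        con.commit()
        return orphans
    finally:
        con.close()
```

Run only while Open WebUI is stopped, since writing to webui.db under a live instance risks lock contention.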

Additional Information

Root cause (code):

  • routers/knowledge.py ~L407: add_file_to_knowledge_by_id — the if not file.data
    guard correctly rejects unprocessed files, but there is no mechanism to retry or
    defer the linking once processing completes
  • constants.py ~L106: FILE_NOT_PROCESSED error message

No user-facing workaround exists. Files must be added one at a time and the
user must wait for each to fully process before dropping the next.

Suggested fixes (pick one):

  • Option 1 — Frontend polling (minimal backend change): After POST /api/v1/files/
    returns, poll GET /api/v1/files/{id} until data.status == "completed" before
    calling POST /knowledge/{id}/file/add.

  • Option 2 — Backend deferred linking: When file/add is called but the file
    is still pending, enqueue the link creation as a background task that fires once
    the file transitions to completed.

  • Option 3 — Sync endpoint: Add POST /api/v1/knowledge/{id}/sync that finds all
    completed files not yet in knowledge_file for the collection and links them
    atomically — callable from the UI after a bulk upload session.
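
Option 1 amounts to a poll-until-completed loop before the file/add call. A minimal sketch of that retry logic (the get_status callable stands in for a GET /api/v1/files/{id} request; names and timing are illustrative, not the actual frontend code):

```python
import time
from typing import Callable

def wait_until_processed(
    get_status: Callable[[], str],
    timeout: float = 600.0,
    interval: float = 2.0,
) -> bool:
    """Poll get_status() until it returns 'completed' or the timeout expires.

    Returns True if processing finished in time; the caller would only
    issue POST /knowledge/{id}/file/add after a True result.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_status() == "completed":
            return True
        time.sleep(interval)
    return False
```

The timeout matters because docling can legitimately take minutes on large PDFs; on a False result the UI should surface an error instead of silently skipping the link, which is the current failure mode.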

GiteaMirror added the bug label 2026-05-05 23:07:14 -05:00
Author
Owner

@jannefleischer commented on GitHub (Mar 11, 2026):

Sorry, I am not that deep into the actual development of Open WebUI, so I had to rely on an LLM-driven analysis of my issue. The above is the result of that prompting (in GitHub Copilot). I had Claude write me a workaround script for the processed-but-unlinked files, in case anyone wants to use it.

relink_processed_files_to_knowledge.py: https://github.com/user-attachments/files/25896935/relink_processed_files_to_knowledge.py

Author
Owner

@jannefleischer commented on GitHub (Mar 11, 2026):

Also a disclaimer: I did the check of related issues with gpt5-mini; it could only process a small number of issues, so I hope I haven't missed a crucial one when sweeping over the results on my own.

This is from Copilot:


What I did

Searched open-webui/open-webui issues for terms in your report (docling, knowledge, EMPTY_CONTENT, FILE_NOT_PROCESSED, Ollama, 503, IndexError).
Reviewed the returned issues and compared them to the two race conditions you described (A: Ollama 503 → IndexError → silent drop; B: frontend calls file/add before docling finished → EMPTY_CONTENT and missing knowledge links).
Note: the GitHub search API returned 101 matching issues in total, but the API response is limited to 30 results. I was able to fetch the first page (30 results). You can view the full search in GitHub here: https://github.com/search?q=is%3Aissue+is%3Aopen+repo%3Aopen-webui%2Fopen-webui+docling+OR+knowledge+OR+EMPTY_CONTENT+OR+FILE_NOT_PROCESSED+OR+Ollama+503+OR+%22IndexError%3A+list+index+out+of+range%22&type=Issues

Quick answer

Parts of the problems you reported are covered by existing issues, but I did NOT find an issue that exactly matches the two-race-conditions scenario you documented (especially the Ollama 503 → IndexError → silent drop behavior).
Related/overlapping issues exist for:
    batch add / missing linking / duplicates / sync problems for knowledge collections,
    general docling/large-file/knowledge UI sync problems,
    various Ollama integration bugs (but none that explicitly describe model-unload-while-embedding → 503 → IndexError causing silent file drop).
Recommendation: open a new issue (or discussion) that links to your bugreport_race_condition_knowledge.md, references the related issues below, and calls out the two distinct root causes and your suggested fixes/workarounds. That will make it actionable for maintainers.

Most relevant issues I found (first page / sample)

issue #10679 — "Batch add file to knowledge doesn't check for existence"
https://github.com/open-webui/open-webui/issues/10679
Why relevant: batch add endpoint issues and duplicate handling — overlaps with your Bug B (batch behavior / batch endpoints not handling duplicates / processing state).
issue #19190 — "issue: Synchronization with Folder Leads to Empty Knowledge Base"
https://github.com/open-webui/open-webui/issues/19190
Why relevant: UI shows empty knowledge base after interrupted sync; describes backend/frontend state mismatch — conceptually close to your Bug B where processed files end up not linked and invisible.
issue #12087 — "issue: Browser Freezes When Pasting Large Content"
https://github.com/open-webui/open-webui/issues/12087
Why relevant: large-file processing / UI problems while docling processes large files — not identical but related operational context.
issue #14768 — "RAG_ALLOWED_FILE_EXTENSIONS is too rigid ..."
https://github.com/open-webui/open-webui/issues/14768
Why relevant: errors returned by add-to-knowledge flow for certain file types; indicates the file-processing / add-to-knowledge flow has corner cases.
issue #20896 — "Generation stops after tool call when routing Ollama through WebUI"
https://github.com/open-webui/open-webui/issues/20896
Why relevant: Ollama integration issues exist in the repo; helpful context but not the same 503 + silent-drop failure mode.
(There are many other open issues touching knowledge/sync/embedding/Ollama behavior — the search returned 101 results in total; the first page contains ~30 items. Use the search link above to see all.)

Interpretation / mapping to your two bugs

Bug B (EMPTY_CONTENT / file.add called before docling complete): Several existing issues describe knowledge sync, missing files in knowledge after sync, or batch endpoints misbehaving (e.g., #10679, #19190). Those issues overlap with the symptoms and the frontend/backend boundary you described, but I did not find a public issue that documents exactly the race condition where the frontend immediately calls POST /knowledge/{id}/file/add right after POST /files/ (before docling completes), returns 400 FILE_NOT_PROCESSED, and the file is later embedded but never linked. So while related work exists, your precise failure mode (leading to embeddings present but missing knowledge_link rows, and no UI error) appears to be unreported or at least not documented in one canonical issue.
Bug A (Ollama model unloaded → concurrent embedding requests → 503 → IndexError → silent drop): I did not find an issue that explicitly documents Ollama model TTL/unloading + parallel embedding batches causing 503 Service Unavailable responses which then lead to an IndexError and silent drop of the file. There are Ollama-related bugs, but not this exact chain (503 → IndexError → silent file drop in the knowledge pipeline). So this also appears to be new / not explicitly reported.

...

Author
Owner

@jannefleischer commented on GitHub (Mar 11, 2026):

I realise now that DOCLING_SERVE_MAX_SYNC_WAIT=2400 for docling-serve is a good mitigation for this issue; still, if I close the browser window early, I end up with unlinked results...

Nevertheless, I will close this for now.

Reference: github-starred/open-webui#58419