mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-06 19:08:59 -05:00
[GH-ISSUE #22573] issue: Files Not Linked to Knowledge Collection When Dropped Simultaneously (FILE_NOT_PROCESSED Race Condition)
#35282
Originally created by @jannefleischer on GitHub (Mar 11, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/22573
Disclaimer: The analysis of this issue has mainly been done by gpt5-mini and claude sonnet 4.6 :/
Check Existing Issues
Installation Method
Docker
Open WebUI Version
main branch, tested March 2026
Ollama Version (if applicable)
Not directly relevant, but Ollama used as embedding provider. 0.17.7
Operating System
Linux (WSL2 on Windows, Ubuntu-based).
Browser (if applicable)
Tested on Chrome and Firefox. Behavior is frontend-driven and browser-independent.
Confirmation
README.md.
Expected Behavior
When multiple files are drag-dropped onto a Knowledge Collection simultaneously,
all files should eventually be processed and linked to the collection. The UI should
either wait for each file to finish processing before linking, or link them
automatically once processing completes.
Actual Behavior
When 5+ files are drag-dropped simultaneously, only the first 1–2 files typically
appear in the Knowledge Collection after processing. The remaining files are uploaded
and physically processed by docling (embeddings are created in the vector store), but
they are never linked to the collection — they become permanently invisible.
No error is shown to the user. The upload progress indicator shows success for all
files.
Steps to Reproduce
Prerequisites:
Steps:
completed entries.
completed count in step 6.
Reload the Knowledge Collection in the UI. Missing files confirm the discrepancy.
What is happening internally:
The frontend fires POST /api/v1/knowledge/{id}/file/add immediately after POST /api/v1/files/ returns for each file, without waiting for docling to finish. At that point, file.data is null or {"status": "pending"}.
The backend endpoint checks (source: routers/knowledge.py ~L437):
The HTTP 400 is returned and silently ignored by the frontend. The knowledge_file row is never created. When docling later finishes and sets data.status = "completed", the embeddings exist in the vector store but no knowledge_file record is ever created, so the file is permanently orphaned.
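The sequence described above can be replayed in a small in-memory simulation. This is only a sketch and not Open WebUI's code: the FakeBackend class, its method names, and the dictionaries standing in for the file store and the knowledge_file table are all hypothetical, and the guard is reduced to "reject while data is still empty".

```python
from dataclasses import dataclass, field


@dataclass
class FakeBackend:
    """Hypothetical in-memory stand-in for the upload and link endpoints."""

    files: dict = field(default_factory=dict)          # file_id -> data (None until processed)
    knowledge_files: set = field(default_factory=set)  # ids that were successfully linked

    def upload(self, file_id: str) -> None:
        # The upload call returns before processing has finished.
        self.files[file_id] = None

    def finish_processing(self, file_id: str) -> None:
        # Stand-in for docling completing in the background.
        self.files[file_id] = {"status": "completed"}

    def add_to_knowledge(self, file_id: str) -> int:
        # Simplified version of the guard described in the issue:
        # a file without data is rejected with HTTP 400.
        if not self.files.get(file_id):
            return 400
        self.knowledge_files.add(file_id)
        return 200


def simulate_drop(backend: FakeBackend, file_ids: list) -> dict:
    """Replay the frontend order: upload, link immediately, ignore failures."""
    statuses = {}
    for fid in file_ids:
        backend.upload(fid)
        statuses[fid] = backend.add_to_knowledge(fid)  # fired too early
    for fid in file_ids:
        backend.finish_processing(fid)  # processing completes afterwards
    return statuses


backend = FakeBackend()
statuses = simulate_drop(backend, ["a.pdf", "b.pdf", "c.pdf"])

# Files that are processed but were never linked.
orphaned = [
    fid
    for fid, data in backend.files.items()
    if data and data["status"] == "completed" and fid not in backend.knowledge_files
]
print(statuses)
print(orphaned)
```

In this simplified model every file ends up orphaned because linking always runs before processing. In the real system the first one or two files can win the race, which matches the behavior reported above.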
Logs & Screenshots
Docker container logs (docker logs open-webui 2>&1 | grep "file/add"):
This appears once per dropped file (except for the 1–2 files lucky enough to have been fully processed before the file/add call).
SQLite state confirms orphaned files:
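A check for orphaned files could be sketched as follows. The schema here is an assumption made for illustration: a file table with a JSON text column named data, and a knowledge_file table with a file_id column. The real Open WebUI database layout may differ, and the function name find_orphaned_files is hypothetical. The demo runs against an in-memory database rather than a real webui.db.

```python
import json
import sqlite3


def find_orphaned_files(conn: sqlite3.Connection) -> list[str]:
    """Return ids of files marked completed that have no knowledge_file row."""
    rows = conn.execute(
        """
        SELECT f.id, f.data
        FROM file AS f
        LEFT JOIN knowledge_file AS kf ON kf.file_id = f.id
        WHERE kf.file_id IS NULL
        ORDER BY f.id
        """
    ).fetchall()
    orphaned = []
    for file_id, data in rows:
        # data is assumed to be a JSON object stored as text, or NULL.
        parsed = json.loads(data) if data else {}
        if parsed.get("status") == "completed":
            orphaned.append(file_id)
    return orphaned


# Demo against an in-memory database using the assumed schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE file (id TEXT PRIMARY KEY, data TEXT);
    CREATE TABLE knowledge_file (knowledge_id TEXT, file_id TEXT);
    """
)
conn.executemany(
    "INSERT INTO file (id, data) VALUES (?, ?)",
    [
        ("f1", json.dumps({"status": "completed"})),  # processed and linked
        ("f2", json.dumps({"status": "completed"})),  # processed, never linked
        ("f3", json.dumps({"status": "pending"})),    # still processing
        ("f4", None),                                 # no data yet
    ],
)
conn.execute("INSERT INTO knowledge_file VALUES ('kb1', 'f1')")
print(find_orphaned_files(conn))
```

Only f2 counts as orphaned in the demo data: f1 is linked, and f3 and f4 have not finished processing.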
Additional Information
Root cause (code):
routers/knowledge.py ~L407: add_file_to_knowledge_by_id. The "if not file.data" guard correctly rejects unprocessed files, but there is no mechanism to retry or defer the linking once processing completes.
constants.py ~L106: FILE_NOT_PROCESSED error message
No user-facing workaround exists. Files must be added one at a time, and the user must wait for each to fully process before dropping the next.
Suggested fixes (pick one):
Option 1, frontend polling (minimal backend change): After POST /api/v1/files/ returns, poll GET /api/v1/files/{id} until data.status == "completed" before calling POST /knowledge/{id}/file/add.
Option 2, backend deferred linking: When file/add is called but the file is still pending, enqueue the link creation as a background task that fires once the file transitions to completed.
Option 3, sync endpoint: Add POST /api/v1/knowledge/{id}/sync that finds all completed files not yet in knowledge_file for the collection and links them atomically, callable from the UI after a bulk upload session.
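The polling idea in Option 1 could look roughly like the sketch below. It is written in Python for illustration, although the real change would live in the frontend. The function link_when_processed and its get_file and link_file parameters are hypothetical: they stand in for calls to GET /api/v1/files/{id} and POST /knowledge/{id}/file/add, and are injected as callables so the logic can be exercised without a server. The timeout and interval defaults are arbitrary.

```python
import time
from typing import Callable


def link_when_processed(
    file_id: str,
    get_file: Callable[[str], dict],
    link_file: Callable[[str], object],
    timeout_s: float = 600.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the file reports status "completed", then link it.

    Returns True once the file was linked, False if the timeout expired first.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        # data may be missing or None while processing has not started.
        data = get_file(file_id).get("data") or {}
        if data.get("status") == "completed":
            link_file(file_id)
            return True
        sleep(interval_s)
    return False


# Demo with canned responses instead of real HTTP calls.
responses = iter(
    [
        {"data": None},
        {"data": {"status": "pending"}},
        {"data": {"status": "completed"}},
    ]
)
linked = []
ok = link_when_processed(
    "file-1",
    get_file=lambda _id: next(responses),
    link_file=linked.append,
    sleep=lambda _seconds: None,  # skip real waiting in the demo
)
print(ok, linked)
```

A client-side loop like this still loses the link if the browser tab is closed before processing finishes, which is why Options 2 and 3 move the responsibility to the backend.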
@jannefleischer commented on GitHub (Mar 11, 2026):
Sorry, I am not that deep into the actual development of openwebui, so I needed to rely on an LLM-driven analysis of my issue. The above is the result of that prompting (in GitHub Copilot). I had Claude write me a workaround script for the processed but unlinked files, in case anyone wants to use it.
relink_processed_files_to_knowledge.py
@jannefleischer commented on GitHub (Mar 11, 2026):
Another disclaimer: I checked for related issues with gpt5-mini, which could only process a small number of issues. I hope I haven't missed a crucial one when sweeping over the results on my own.
this is from copilot:
What I did
Quick answer
Most relevant issues I found (first page / sample)
Interpretation / mapping to your two bugs
...
@jannefleischer commented on GitHub (Mar 11, 2026):
I realise now that DOCLING_SERVE_MAX_SYNC_WAIT=2400 for docling-serve is a good remedy for this issue. Still, if I close the browser window early, I end up with unlinked results...
Nevertheless, I will close this for now.