mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-06 10:58:17 -05:00
[GH-ISSUE #18689] issue: Knowledge file access inconsistency — some files not accessible due to mismatched collection_name #34202
Originally created by @acwoo97 on GitHub (Oct 28, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/18689
Check Existing Issues
Installation Method
Git Clone
Open WebUI Version
0.6.30
Ollama Version (if applicable)
No response
Operating System
Mac
Browser (if applicable)
No response
Confirmation
Expected Behavior
All files added to a Knowledge Base with restricted group access should be retrievable when accessed by authorized users.
Actual Behavior
Some files within the same Knowledge Base cannot be accessed, even though group permissions are correctly configured.
Upon debugging, inaccessible files have collection_name values in the form of file-{uuid}, while accessible files correctly use the Knowledge Base’s ID as their collection name.
Steps to Reproduce
⚠️ Note: This issue occurs intermittently and may not always be reproducible.
Create a Knowledge Base with restricted access (specific user group).
Upload multiple files to the Knowledge Base using the UI.
Observe that some files are accessible, while others intermittently return “not found” or permission errors, even though access configuration is correct.
Check the database entries for the affected files — their collection_name values are file-{uuid} instead of the Knowledge Base ID.
Attempt to retrieve these files through the /api/v1/knowledge/{id}/file endpoint.
Only files with a collection_name that exactly matches the Knowledge Base ID are accessible.
Logs & Screenshots
The issue does not affect all users.
Admin users can access all files in the Knowledge Base without any errors.
For non-admin users with proper group permissions, the issue occurs only for specific files, not for all files in the same Knowledge Base.
When affected, those users see “Not found” or a permission-related error when trying to open the file, even though they have valid access rights.
Additional Information
From reviewing the codebase and runtime behavior, my current hypothesis (not yet confirmed) is as follows:
Files are uploaded first via POST /api/v1/files/, followed by a separate call to /api/v1/knowledge/{id}/file/add to associate them with a Knowledge Base.
The upload endpoint initiates vector database operations asynchronously in a background task.
If the Knowledge Base association (knowledge.add) completes before the background task finishes, the async job may overwrite the collection_name field with the default file-{uuid} format.
As a result, when the Knowledge Base later checks access permissions, it compares the file’s collection_name (which still has the file- prefix) against the Knowledge Base ID. This mismatch likely causes the file to appear inaccessible despite correct group permissions.
This is still a hypothesis based on observation and code tracing, and the exact sequence or locking behavior may differ depending on task timing.
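The hypothesized ordering can be modeled with a small asyncio simulation. Everything here is an assumption for illustration. `files` stands in for the file table, and `upload_file` and `add_to_knowledge` are hypothetical stand-ins for the two endpoints. The background job writing back `file-{uuid}` is exactly the suspected bug, not confirmed Open WebUI behavior, and the reporter later considers a race unlikely.

```python
import asyncio

# Hypothetical stand-in for the file table: file_id -> file metadata.
files: dict[str, dict] = {}


async def upload_file(file_id: str, embed_delay: float) -> asyncio.Task:
    """Model of POST /api/v1/files/: store the file, then embed in the background."""
    files[file_id] = {"collection_name": f"file-{file_id}"}

    async def background_embedding() -> None:
        await asyncio.sleep(embed_delay)  # vector DB work of variable duration
        # Suspected bug (assumption): the job writes the default name back.
        files[file_id]["collection_name"] = f"file-{file_id}"

    return asyncio.create_task(background_embedding())


async def add_to_knowledge(file_id: str, knowledge_id: str) -> None:
    """Model of /api/v1/knowledge/{id}/file/add."""
    files[file_id]["collection_name"] = knowledge_id


async def scenario(embed_delay: float) -> str:
    files.clear()
    task = await upload_file("abc", embed_delay)
    await asyncio.sleep(0.01)  # the client associates the file shortly after upload
    await add_to_knowledge("abc", "kb-1")
    await task  # wait for the background job to finish
    return files["abc"]["collection_name"]


print(asyncio.run(scenario(embed_delay=0.0)))   # kb-1: the job finished first
print(asyncio.run(scenario(embed_delay=0.05)))  # file-abc: the job overwrote the association
```

The final `collection_name` depends only on which write lands last, which is why such a bug would look intermittent.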
@rgaricano commented on GitHub (Oct 28, 2025):
I need to recheck this situation (I'm working on a related PR), but I suspect it is caused by the same file being added to more than one collection: the duplicate-file prevention combined with different user/collection permissions produces that behaviour.
@acwoo97 commented on GitHub (Oct 28, 2025):
@rgaricano
Thanks for checking this out!
For now, to get things working quickly, I’ve patched the permission check logic on my side — in addition to verifying knowledge_id matching, I also check whether the file’s collection_name (after removing the file- prefix, if present) exists in the Knowledge Base’s list of associated IDs.
This workaround seems to resolve the access issue temporarily.
I’ll keep digging into the root cause by tracing the code more closely, especially around how the async upload and knowledge association tasks interact.
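A minimal sketch of that workaround check follows. It assumes the Knowledge Base record exposes its ID as `id` and its file list as `data["file_ids"]`, as the thread's references to `knowledge.data.file_ids` suggest. The function name is hypothetical, and this is not the reporter's actual patch.

```python
def file_belongs_to_knowledge(file_meta: dict, knowledge: dict) -> bool:
    """Hypothetical workaround check: accept either a matching collection name
    or a collection name that, minus the "file-" prefix, is one of the KB's file IDs."""
    collection_name = file_meta.get("collection_name", "")
    if collection_name == knowledge["id"]:
        return True  # normal case: collection_name was updated to the KB ID
    # Fallback for files whose collection_name was left as file-{uuid}.
    file_id = collection_name.removeprefix("file-")  # Python 3.9+
    file_ids = (knowledge.get("data") or {}).get("file_ids") or []
    return file_id in file_ids


kb = {"id": "kb-1", "data": {"file_ids": ["abc"]}}
print(file_belongs_to_knowledge({"collection_name": "kb-1"}, kb))      # True
print(file_belongs_to_knowledge({"collection_name": "file-abc"}, kb))  # True
print(file_belongs_to_knowledge({"collection_name": "file-xyz"}, kb))  # False
```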
@rgaricano commented on GitHub (Oct 28, 2025):
@acwoo97
I did a quick check, and the problem seems to be that the file metadata stores only the last collection the file was uploaded to (if the file was already uploaded, it is added to the new collection and its metadata is overwritten).
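The last-write-wins effect can be reproduced in a few lines. This is a simplified model under stated assumptions. `FileRecord`, `add_file_to_knowledge` and the `readable_kbs` set (the KBs a user may read through group permissions) are hypothetical stand-ins, not Open WebUI's models.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    id: str
    meta: dict = field(default_factory=dict)


def add_file_to_knowledge(record: FileRecord, knowledge_id: str,
                          kb_file_ids: dict[str, list[str]]) -> None:
    """The KB side keeps a list, but the file side has a single slot."""
    kb_file_ids.setdefault(knowledge_id, []).append(record.id)
    record.meta["collection_name"] = knowledge_id  # overwrites the previous KB


def has_access_to_file(record: FileRecord, readable_kbs: set[str]) -> bool:
    """Single-field check: only the most recent collection is considered."""
    return record.meta.get("collection_name") in readable_kbs


kb_file_ids: dict[str, list[str]] = {}
doc = FileRecord(id="f1")
add_file_to_knowledge(doc, "kb-a", kb_file_ids)
add_file_to_knowledge(doc, "kb-b", kb_file_ids)

print(doc.meta["collection_name"])        # kb-b
print("f1" in kb_file_ids["kb-a"])        # True: kb-a still lists the file
print(has_access_to_file(doc, {"kb-a"}))  # False: a kb-a reader is denied
```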
When a file is added to multiple Knowledge Bases, its file.meta only stores the last collection_name it was added to. This creates a critical access control issue because has_access_to_file() uses this single collection_name to determine permissions.
During RAG document retrieval, the system does check Knowledge Base permissions correctly through get_sources_from_items(): it verifies that the user has access to the Knowledge Base being queried before retrieving documents.
The problem arises when the file is accessed through endpoints like /api/v1/files/{id}, which rely on the single collection_name metadata field.
Some ways to solve it:
None of these is really "easy" to patch; they all need a deeper implementation. Option 2 seems the most robust (although its implementation is also more complex):
knowledge.data.file_ids arrays.
If you want to try it, I've left the junction table implementation for reference:
Draft of Implementation of Option 2 (Junction Table) to fix the multi-collection file access control issue.
Step 1: Create the Junction Table Model
First, create a new model file backend/open_webui/models/file_knowledge.py:
Step 2: Update add_file_to_knowledge_by_id()
Modify backend/open_webui/routers/knowledge.py to insert junction records:
Step 3: Update remove_file_from_knowledge_by_id()
Modify the removal logic to delete junction records and only delete files when no longer referenced:
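A minimal sketch of the reference-counted removal this step describes follows. It uses the standard-library sqlite3 module. The `file` and `file_knowledge` table layouts and the function name are assumptions, and the real change would presumably go through Open WebUI's SQLAlchemy models instead.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory schema: files plus an assumed file_knowledge junction table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE file (id TEXT PRIMARY KEY);
        CREATE TABLE file_knowledge (
            file_id TEXT NOT NULL,
            knowledge_id TEXT NOT NULL,
            PRIMARY KEY (file_id, knowledge_id)
        );
        """
    )
    return conn


def remove_file_from_knowledge(conn: sqlite3.Connection,
                               knowledge_id: str, file_id: str) -> bool:
    """Unlink a file from one KB; delete the file only when no KB references it.

    Returns True if the file row itself was deleted.
    """
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "DELETE FROM file_knowledge WHERE file_id = ? AND knowledge_id = ?",
            (file_id, knowledge_id),
        )
        (remaining,) = conn.execute(
            "SELECT COUNT(*) FROM file_knowledge WHERE file_id = ?", (file_id,)
        ).fetchone()
        if remaining == 0:
            conn.execute("DELETE FROM file WHERE id = ?", (file_id,))
            return True
    return False


conn = make_db()
with conn:
    conn.execute("INSERT INTO file (id) VALUES ('f1')")
    conn.executemany(
        "INSERT INTO file_knowledge (file_id, knowledge_id) VALUES (?, ?)",
        [("f1", "kb-a"), ("f1", "kb-b")],
    )
print(remove_file_from_knowledge(conn, "kb-a", "f1"))  # False: kb-b still uses it
print(remove_file_from_knowledge(conn, "kb-b", "f1"))  # True: last reference gone
```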
Step 4: Update has_access_to_file()
Rewrite the access control check in backend/open_webui/routers/files.py:
Step 5: Update reset_knowledge_by_id()
Clean up junction records when resetting a Knowledge Base:
Step 6: Database Migration Script
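A minimal sketch of the backfill this migration performs follows. It assumes `knowledge.data` is stored as a JSON text column containing `file_ids` and that the junction table has a composite primary key. It uses stdlib sqlite3 for illustration, whereas the real migration would presumably be an Alembic revision. The function name is hypothetical.

```python
import json
import sqlite3


def migrate_file_ids(conn: sqlite3.Connection) -> int:
    """Backfill file_knowledge from each knowledge row's JSON data.file_ids.

    Idempotent: existing (file_id, knowledge_id) pairs are skipped.
    Returns the number of junction rows inserted.
    """
    before = conn.total_changes
    rows = conn.execute("SELECT id, data FROM knowledge").fetchall()
    with conn:
        for knowledge_id, raw in rows:
            data = json.loads(raw) if raw else None
            for file_id in (data or {}).get("file_ids") or []:
                conn.execute(
                    "INSERT OR IGNORE INTO file_knowledge (file_id, knowledge_id) "
                    "VALUES (?, ?)",
                    (file_id, knowledge_id),
                )
    return conn.total_changes - before


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE knowledge (id TEXT PRIMARY KEY, data TEXT);
    CREATE TABLE file_knowledge (
        file_id TEXT NOT NULL,
        knowledge_id TEXT NOT NULL,
        PRIMARY KEY (file_id, knowledge_id)
    );
    INSERT INTO knowledge VALUES ('kb-a', '{"file_ids": ["f1", "f2"]}');
    INSERT INTO knowledge VALUES ('kb-b', '{"file_ids": ["f1"]}');
    INSERT INTO knowledge VALUES ('kb-c', NULL);
    """
)
print(migrate_file_ids(conn))  # 3
print(migrate_file_ids(conn))  # 0: re-running is a no-op
```

Keeping the backfill idempotent means a partially applied migration can simply be run again.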
Step 7: Update Knowledge Base Deletion
Modify the delete endpoint to clean up junction records:
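With the `ondelete='CASCADE'` constraint mentioned in the notes, this step becomes mostly declarative. A minimal sketch using stdlib sqlite3 with an assumed schema follows; the table and function names are hypothetical. SQLite only enforces foreign keys when `PRAGMA foreign_keys = ON` is set on the connection.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys (and thus ON DELETE CASCADE) only when enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE file (id TEXT PRIMARY KEY);
        CREATE TABLE knowledge (id TEXT PRIMARY KEY);
        CREATE TABLE file_knowledge (
            file_id TEXT NOT NULL REFERENCES file(id) ON DELETE CASCADE,
            knowledge_id TEXT NOT NULL REFERENCES knowledge(id) ON DELETE CASCADE,
            PRIMARY KEY (file_id, knowledge_id)
        );
        INSERT INTO file VALUES ('f1');
        INSERT INTO knowledge VALUES ('kb-a'), ('kb-b');
        INSERT INTO file_knowledge VALUES ('f1', 'kb-a'), ('f1', 'kb-b');
        """
    )
    return conn


def delete_knowledge(conn: sqlite3.Connection, knowledge_id: str) -> None:
    """Delete a KB; its junction rows go with it, the files themselves stay."""
    with conn:
        conn.execute("DELETE FROM knowledge WHERE id = ?", (knowledge_id,))


conn = make_db()
delete_knowledge(conn, "kb-a")
print(conn.execute("SELECT file_id, knowledge_id FROM file_knowledge").fetchall())
# [('f1', 'kb-b')]
print(conn.execute("SELECT COUNT(*) FROM file").fetchone()[0])  # 1
```

On a database where the cascade is not enforced, the endpoint would need an explicit `DELETE FROM file_knowledge WHERE knowledge_id = ?` before removing the Knowledge Base.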
Step 8: Update Batch File Processing
Modify the batch processing endpoint to insert junction records:
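A minimal sketch of the batch insert under the same assumed `file_knowledge` schema follows (stdlib sqlite3, hypothetical function name). Using `INSERT OR IGNORE` against a composite primary key is a design choice made here so that retried or duplicated batches do not fail.

```python
import sqlite3
from collections.abc import Iterable


def add_files_to_knowledge_batch(conn: sqlite3.Connection, knowledge_id: str,
                                 file_ids: Iterable[str]) -> int:
    """Link many files to one KB in a single transaction.

    INSERT OR IGNORE keeps the operation idempotent when a file is already
    linked (or appears twice in the batch). Returns the number of new links.
    """
    before = conn.total_changes
    with conn:
        conn.executemany(
            "INSERT OR IGNORE INTO file_knowledge (file_id, knowledge_id) VALUES (?, ?)",
            ((file_id, knowledge_id) for file_id in file_ids),
        )
    return conn.total_changes - before


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE file_knowledge (file_id TEXT NOT NULL, knowledge_id TEXT NOT NULL, "
    "PRIMARY KEY (file_id, knowledge_id))"
)
print(add_files_to_knowledge_batch(conn, "kb-a", ["f1", "f2", "f1"]))  # 2
print(add_files_to_knowledge_batch(conn, "kb-a", ["f2", "f3"]))        # 1
```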
Step 9: Testing the Implementation
After implementing these changes, test the following scenarios:
knowledge.data.file_ids are properly migrated to the junction table
file.meta.collection_name works for unmigrated data
Notes
This implementation provides a complete solution for the multi-collection file access control issue. The junction table approach properly models the many-to-many relationship between files and Knowledge Bases, ensuring that access control checks work correctly regardless of which Knowledge Base a file was most recently added to.
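The junction table removes the dependence on the most recent Knowledge Base. A minimal sketch of a junction-based access check illustrates this (stdlib sqlite3). The table layout, the function signature and the `readable_knowledge_ids` input are assumptions, not the code from the original comment. Owner and admin shortcuts are deliberately left out.

```python
import sqlite3


def has_access_to_file(conn: sqlite3.Connection, file_id: str,
                       readable_knowledge_ids: set[str]) -> bool:
    """Junction-table check: grant access if ANY KB linked to the file is readable.

    `readable_knowledge_ids` is assumed to be computed elsewhere from the user's
    group permissions; owner/admin shortcuts are omitted from this sketch.
    """
    if not readable_knowledge_ids:
        return False
    ids = list(readable_knowledge_ids)
    placeholders = ", ".join("?" for _ in ids)  # values stay parameterized
    row = conn.execute(
        f"SELECT 1 FROM file_knowledge WHERE file_id = ? "
        f"AND knowledge_id IN ({placeholders}) LIMIT 1",
        (file_id, *ids),
    ).fetchone()
    return row is not None


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE file_knowledge (file_id TEXT NOT NULL, knowledge_id TEXT NOT NULL, "
    "PRIMARY KEY (file_id, knowledge_id))"
)
conn.executemany(
    "INSERT INTO file_knowledge VALUES (?, ?)", [("f1", "kb-a"), ("f1", "kb-b")]
)
print(has_access_to_file(conn, "f1", {"kb-a"}))  # True, although kb-b was added last
print(has_access_to_file(conn, "f1", {"kb-c"}))  # False
```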
The migration script populates the junction table from existing knowledge.data.file_ids arrays, maintaining backward compatibility. The foreign key constraints with ondelete='CASCADE' ensure that junction records are automatically cleaned up when either a file or Knowledge Base is deleted.
@acwoo97 commented on GitHub (Oct 29, 2025):
@rgaricano
Thanks again for taking the time to look into this and for explaining everything in detail!
I’ll go through your suggestions and verify whether the behavior you described matches what I’m seeing on my side.
I might have misunderstood some parts of the flow — I actually didn’t realize that a file could be added to multiple Knowledge Bases. From what I observed, each file upload through the UI seemed to generate a new UUID, and the frontend used that UUID to call the /knowledge/add API. So I assumed even identical files were always created as new entries. I’ll review this part again to confirm how it really works.
Also, one interesting thing I noticed during debugging:
for the files that fail to load, not only does their metadata contain a collection_name with the file-{uuid} prefix, but that same UUID also appears in the Knowledge Base’s file_ids list.
That’s why my temporary workaround (as I mentioned earlier) — checking whether the file’s UUID exists in the Knowledge Base’s file_ids list — actually resolves the issue for now.
Lastly, it seems my earlier hypothesis about a possible race condition between the upload and knowledge-add APIs was incorrect. After reviewing the frontend code, I noticed it waits for the upload status before proceeding, so it’s unlikely to be a timing issue.
@acwoo97 commented on GitHub (Oct 29, 2025):
I think there might be another possible cause as well.
When calling knowledge/add, the process involves process_file, which internally calls Files.update_file_metadata_by_id().
If that function raises an exception, it only logs the error and simply returns None without propagating it.
So, if that step fails silently, the file would just keep its original file-{uuid} collection name from the initial upload.
I intentionally triggered a failure in that part, and it seems to reproduce the same behavior.
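The failure mode described above can be sketched as follows. The store class, the function names and the simulated error are hypothetical. Only the pattern (log the exception, return None, and a caller that does not check the result) is taken from the comment. A stricter caller that treats None as a failure would surface the problem instead of leaving the file in its `file-{uuid}` collection.

```python
import logging
from typing import Optional

log = logging.getLogger("sketch")


class FileStore:
    """Hypothetical in-memory stand-in for the Files table."""

    def __init__(self) -> None:
        self.meta: dict[str, dict] = {}
        self.fail_updates = False  # set to True to simulate a DB error

    def update_file_metadata_by_id(self, file_id: str, updates: dict) -> Optional[dict]:
        try:
            if self.fail_updates:
                raise RuntimeError("simulated database error")
            self.meta[file_id].update(updates)
            return self.meta[file_id]
        except Exception:
            log.exception("metadata update failed")  # logged, then swallowed
            return None


def process_file_lenient(store: FileStore, file_id: str, knowledge_id: str) -> None:
    # Mirrors the described behavior: the None result is ignored.
    store.update_file_metadata_by_id(file_id, {"collection_name": knowledge_id})


def process_file_strict(store: FileStore, file_id: str, knowledge_id: str) -> None:
    # Treats a None result as a failure instead of continuing silently.
    if store.update_file_metadata_by_id(file_id, {"collection_name": knowledge_id}) is None:
        raise RuntimeError(f"could not attach {file_id} to {knowledge_id}")


store = FileStore()
store.meta["abc"] = {"collection_name": "file-abc"}
store.fail_updates = True
process_file_lenient(store, "abc", "kb-1")
print(store.meta["abc"]["collection_name"])  # file-abc
```

Whether the real fix should raise, retry, or roll back the knowledge association is a design decision the thread leaves open.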
@athoik commented on GitHub (Nov 27, 2025):
Hi,
A simple fix also created here: https://github.com/open-webui/open-webui/pull/19523
It checks whether the file ID exists in the Knowledge Base's file IDs.
It became a draft because of https://github.com/open-webui/open-webui/pull/19278
It would be nice to consider applying the code in this thread on top of #19278.
@Classic298 commented on GitHub (Nov 27, 2025):
the knowledge file table migration is probably the blocker here
@tjbck commented on GitHub (Dec 2, 2025):
Should be addressed in dev with
9f6c91987f