Bug: Model fails to detect files in a private collection #6358

Closed
opened 2025-11-11 16:52:31 -06:00 by GiteaMirror · 8 comments

Originally created by @chayaziv on GitHub (Sep 10, 2025).

Check Existing Issues

  • I have searched for any existing and/or related issues.
  • I have searched for any existing and/or related discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Git Clone

Open WebUI Version

v0.6.27

Ollama Version (if applicable)

No response

Operating System

Windows 10

Browser (if applicable)

Chrome 100.0

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

  • Users should be able to ask questions about content in their private collection

  • When using #collection_name command or when the model is linked to the collection, the model should have access to the content

  • Questions like "How many sections are in the file?" should work with a private collection

  • The model should be able to retrieve and reference content from a private collection owned by the user

Actual Behavior

  • The model cannot find any relevant information from a private collection

  • Questions about file content return no results

  • The system only works for:

    • Admin users

    • Public collections

    • Collections where the user is a member of a group that has access permissions

Steps to Reproduce

  1. Create a new collection with USER permission level

  2. Set it to Private

  3. Upload files to the collection

  4. Attach the collection to the chat using the “#” selector or create a model and link it to the collection

  5. Try to ask questions about the content in chat:

    • "How many sections are in the file?"
  6. The model will not find any relevant information

Logs & Screenshots

Additional Information

Bug Description

When a user creates a private collection and tries to use it in chat, the model cannot access the content from the collection, even though:

  1. The user is the owner of the collection

  2. The collection is visible in the knowledge tab

  3. The collection is properly attached to the model

Root Cause Analysis

The issue is in backend/open_webui/retrieval/utils.py at lines 582-585:


if knowledge_base and (
    user.role == "admin"
    or has_access(user.id, "read", knowledge_base.access_control)  # Missing owner check
):

The has_access function doesn't include the owner in the permitted users list for private knowledge bases.

Comparison with Working Code

In backend/open_webui/models/knowledge.py (lines 171-174), the correct pattern is used:


if knowledge_base.user_id == user_id  # Owner check
or has_access(user_id, permission, knowledge_base.access_control)
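As a rough illustration of the listing-side pattern quoted above, the following self-contained sketch filters knowledge bases by owner or by access. The `KnowledgeBase` dataclass, the simplified `has_access`, and `list_for_user` are hypothetical stand-ins, not the actual Open WebUI code. Group membership is left out, and treating `access_control=None` as public is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KnowledgeBase:
    # Hypothetical stand-in for the real knowledge base model.
    id: str
    user_id: str  # owner of the collection
    access_control: Optional[dict] = None  # assumed: None means public


def has_access(user_id: str, permission: str, access_control: Optional[dict]) -> bool:
    # Simplified stand-in: only explicit user IDs are checked, groups are ignored.
    if access_control is None:
        return permission == "read"
    return user_id in access_control.get(permission, {}).get("user_ids", [])


def list_for_user(knowledge_bases, user_id: str, permission: str = "read"):
    # Owner check first, then the explicit access list.
    return [
        kb
        for kb in knowledge_bases
        if kb.user_id == user_id
        or has_access(user_id, permission, kb.access_control)
    ]


# A private collection with no users or groups granted access.
private_kb = KnowledgeBase(id="kb1", user_id="alice", access_control={})
visible = [kb.id for kb in list_for_user([private_kb], "alice")]
hidden = [kb.id for kb in list_for_user([private_kb], "bob")]
```

Under these assumptions the owner still sees the collection in the list, which would be consistent with it appearing in the knowledge tab even though the retrieval-side check rejects it.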

Proposed Fix

Add the missing owner check in backend/open_webui/retrieval/utils.py:


if knowledge_base and (
    user.role == "admin"
    or knowledge_base.user_id == user.id  # Add this line
    or has_access(user.id, "read", knowledge_base.access_control)
):
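To make the effect of the proposed line concrete, here is a minimal sketch that compares the check before and after the change. `User`, `KnowledgeBase`, `can_read_before`, `can_read_after`, and the simplified `has_access` are all hypothetical names introduced for illustration; the real `has_access` also resolves group membership, which is omitted here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    # Hypothetical stand-in for the real user model.
    id: str
    role: str = "user"


@dataclass
class KnowledgeBase:
    # Hypothetical stand-in for the real knowledge base model.
    user_id: str
    access_control: Optional[dict] = None


def has_access(user_id: str, permission: str, access_control: Optional[dict]) -> bool:
    # Simplified stand-in: checks explicit user IDs only, ignores groups.
    if access_control is None:
        return permission == "read"
    return user_id in access_control.get(permission, {}).get("user_ids", [])


def can_read_before(user: User, kb: Optional[KnowledgeBase]) -> bool:
    # Mirrors the condition quoted in the root cause analysis.
    return kb is not None and (
        user.role == "admin"
        or has_access(user.id, "read", kb.access_control)
    )


def can_read_after(user: User, kb: Optional[KnowledgeBase]) -> bool:
    # Same condition with the proposed owner check added.
    return kb is not None and (
        user.role == "admin"
        or kb.user_id == user.id
        or has_access(user.id, "read", kb.access_control)
    )


owner = User(id="alice")
stranger = User(id="bob")
admin = User(id="root", role="admin")
private_kb = KnowledgeBase(
    user_id="alice",
    access_control={"read": {"group_ids": [], "user_ids": []}},
)

owner_before = can_read_before(owner, private_kb)
owner_after = can_read_after(owner, private_kb)
stranger_after = can_read_after(stranger, private_kb)
admin_before = can_read_before(admin, private_kb)
```

In this sketch the owner is rejected by the original condition and accepted by the patched one, while other non-admin users remain locked out of the private collection.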

GiteaMirror added the bug label 2025-11-11 16:52:31 -06:00

@tjbck commented on GitHub (Sep 10, 2025):

"How many sections are in the file?" will not work, as metadata isn't injected into the context.


@chayaziv commented on GitHub (Sep 11, 2025):

Hi, just a quick clarification.

The example question "How many sections are in the file?" was not the best choice. But the bug itself is still valid, regardless of the exact phrasing.

The core problem is that the model has no access to the file content in a private collection (even when the user is the owner). As a result, it always responds with no information or says the data is missing.

Instead of the example question, take any other question, for example: "Which design library should be used according to the requirements in the file?"

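Attachment: React Homework Assignment Requirements.pdf (https://github.com/user-attachments/files/22281843/React.Homework.Assignment.Requirements.pdf)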

I would be happy to see the issue reopened.


@rgaricano commented on GitHub (Sep 11, 2025):

Tested as user and as admin (added a file to a collection with private access, then attached the collection with the # command):

[Five screenshots attached]


@chayaziv commented on GitHub (Sep 11, 2025):

If you define a group for the collection, it should indeed work, because defining a group includes the creator.

I'm talking about:

  • The collection is private.
  • It is created and used with user permission
  • No group is defined for the collection.

@rgaricano commented on GitHub (Sep 11, 2025):

I removed the group:

[Screenshot attached]

Same results for user and admin. I also tested creating the collection as private and with no group from the beginning.


@chayaziv commented on GitHub (Sep 11, 2025):

I need to check how this makes sense,
because I had the problem, and after I added the missing line in the code, everything worked!
Maybe I have a different issue.
I need to investigate, and it will take me a few days.
But in the meantime, how does this fit with what I see in the code, in black and white, as I demonstrated above?
I really want to understand,
so I would appreciate an answer.


@rgaricano commented on GitHub (Sep 11, 2025):

Regarding backend/open_webui/retrieval/utils.py at lines 582-585:

Yes, it is a check, but it sits inside an if condition; in that branch it retrieves the FULL FILE CONTENT, reading the content directly from the file...

The code in backend/open_webui/models/knowledge.py (lines 171-174) is inside the get_knowledge_bases_by_user_id function.


@chayaziv commented on GitHub (Sep 12, 2025):

You're right that the condition (lines 582–585) is inside a condition on the environment variable
BYPASS_EMBEDDING_AND_RETRIEVAL.
And now I understand why it works for you but not for me, because on my side the variable BYPASS_EMBEDDING_AND_RETRIEVAL = TRUE.

In that case, the problematic condition, the one missing the owner check, is the one that gets evaluated.

Please change the variable to TRUE and then again:

  • Log in with USER permission

  • Create a private collection without a group

  • Add the attached file to the collection

  • Add the collection to the chat with #

  • And ask: "Which design library should be used according to the instructions in the file?"

I’m curious to hear what happens.
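The interaction described in this comment can be sketched roughly as follows. This is a simplified model of the behavior discussed in the thread, not the actual code in retrieval/utils.py: `bypass_enabled`, `context_path`, the returned labels, and the way the flag is parsed are all assumptions made for illustration.

```python
from typing import Mapping


def bypass_enabled(env: Mapping[str, str]) -> bool:
    # Assumed parsing: a case-insensitive "true" enables the flag.
    value = env.get("BYPASS_EMBEDDING_AND_RETRIEVAL", "false")
    return value.strip().lower() == "true"


def context_path(
    env: Mapping[str, str],
    is_admin: bool,
    is_owner: bool,
    in_access_list: bool,
    owner_check: bool = False,
) -> str:
    if not bypass_enabled(env):
        # Assumed from the thread: the default path never reaches
        # the permission check at lines 582-585.
        return "vector_retrieval"
    # Bypass path: full file content is gated by the permission check.
    allowed = is_admin or (owner_check and is_owner) or in_access_list
    return "full_content" if allowed else "no_context"


default_env = {}
bypass_env = {"BYPASS_EMBEDDING_AND_RETRIEVAL": "TRUE"}

# A non-admin owner of a private collection with no group defined.
owner_default = context_path(default_env, False, True, False)
owner_bypass = context_path(bypass_env, False, True, False)
owner_bypass_fixed = context_path(bypass_env, False, True, False, owner_check=True)
```

If this model is accurate, it would explain why the problem does not reproduce with default settings and only appears once the bypass flag is enabled.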


Reference: github-starred/open-webui#6358