[GH-ISSUE #18689] issue: Knowledge file access inconsistency — some files not accessible due to mismatched collection_name #57339


GiteaMirror · 2026-05-05T20:51:59-05:00

GiteaMirror commented

2026-05-05 20:51:59 -05:00

Originally created by @acwoo97 on GitHub (Oct 28, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/18689

Check Existing Issues

I have searched for any existing and/or related issues.
I have searched for any existing and/or related discussions.
I am using the latest version of Open WebUI.

Installation Method

Git Clone

Open WebUI Version

0.6.30

Ollama Version (if applicable)

No response

Operating System

Mac

Browser (if applicable)

No response

Confirmation

I have read and followed all instructions in README.md.
I am using the latest version of both Open WebUI and Ollama.
I have included the browser console logs.
I have included the Docker container logs.
I have provided every relevant configuration, setting, and environment variable used in my setup.
I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
Start with the initial platform/version/OS and dependencies used,
Specify exact install/launch/configure commands,
List URLs visited, user input (incl. example values/emails/passwords if needed),
Describe all options and toggles enabled or changed,
Include any files or environmental changes,
Identify the expected and actual result at each stage,
Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

All files added to a Knowledge Base with restricted group access should be retrievable when accessed by authorized users.

Actual Behavior

Some files within the same Knowledge Base cannot be accessed, even though group permissions are correctly configured.
Upon debugging, inaccessible files have collection_name values in the form of file-{uuid}, while accessible files correctly use the Knowledge Base’s ID as their collection name.

Steps to Reproduce

⚠️ Note: This issue occurs intermittently and may not always be reproducible.

1. Create a Knowledge Base with restricted access (specific user group).
2. Upload multiple files to the Knowledge Base using the UI.
3. Observe that some files are accessible, while others intermittently return “not found” or permission errors, even though access configuration is correct.
4. Check the database entries for the affected files — their collection_name values are file-{uuid} instead of the Knowledge Base ID.
5. Attempt to retrieve these files through the /api/v1/knowledge/{id}/file endpoint.
6. Only files with a collection_name that exactly matches the Knowledge Base ID are accessible.
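
The database check in the steps above could be scripted once the file rows are exported. This is a minimal sketch, assuming each row has been loaded into a dict with `id` and `meta` keys; `find_mismatched_files` is a hypothetical helper and not part of Open WebUI.

```python
from typing import Iterable, List


def find_mismatched_files(knowledge_id: str, files: Iterable[dict]) -> List[str]:
    """Return IDs of files whose meta.collection_name is not the Knowledge Base ID."""
    mismatched = []
    for row in files:
        # meta may be missing or None for rows that were never processed.
        meta = row.get("meta") or {}
        if meta.get("collection_name") != knowledge_id:
            mismatched.append(row["id"])
    return mismatched
```

Running it over the files listed for one Knowledge Base would list the candidates that carry a file-{uuid} value or no value at all.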

Logs & Screenshots

The issue does not affect all users.
Admin users can access all files in the Knowledge Base without any errors.
For non-admin users with proper group permissions, the issue occurs only for specific files, not for all files in the same Knowledge Base.
When affected, those users see “Not found” or a permission-related error when trying to open the file, even though they have valid access rights.

Additional Information

From reviewing the codebase and runtime behavior, my current hypothesis (not yet confirmed) is as follows:

Files are uploaded first via POST /api/v1/files/, followed by a separate call to /api/v1/knowledge/{id}/file/add to associate them with a Knowledge Base.

The upload endpoint initiates vector database operations asynchronously in a background task.

If the Knowledge Base association (knowledge.add) completes before the background task finishes, the async job may overwrite the collection_name field with the default file-{uuid} format.

As a result, when the Knowledge Base later checks access permissions, it compares the file’s collection_name (which still has the file- prefix) against the Knowledge Base ID. This mismatch likely causes the file to appear inaccessible despite correct group permissions.

This is still a hypothesis based on observation and code tracing, and the exact sequence or locking behavior may differ depending on task timing.
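
The ordering described in this hypothesis can be illustrated with a small simulation. It is only a sketch: `file_meta`, `associate_with_knowledge`, and `finish_background_processing` are hypothetical stand-ins rather than Open WebUI functions, and the `guarded` flag shows one possible mitigation, not the project's actual fix.

```python
from typing import Dict

# Hypothetical in-memory stand-in for the file table (file_id -> meta dict).
file_meta: Dict[str, dict] = {}


def associate_with_knowledge(file_id: str, knowledge_id: str) -> None:
    """Simulate the Knowledge Base association writing its own ID."""
    file_meta.setdefault(file_id, {})["collection_name"] = knowledge_id


def finish_background_processing(file_id: str, guarded: bool = False) -> None:
    """Simulate the upload's background task writing the default name."""
    meta = file_meta.setdefault(file_id, {})
    if guarded and meta.get("collection_name"):
        # Assumed mitigation: never overwrite a value that is already set.
        return
    meta["collection_name"] = f"file-{file_id}"


def run(background_finishes_last: bool, guarded: bool) -> str:
    """Run both steps in the chosen order and return the final collection_name."""
    file_meta.clear()
    if background_finishes_last:
        associate_with_knowledge("abc", "kb-1")
        finish_background_processing("abc", guarded)
    else:
        finish_background_processing("abc", guarded)
        associate_with_knowledge("abc", "kb-1")
    return file_meta["abc"]["collection_name"]
```

With `background_finishes_last=True` and no guard, the final value is the file- default, which matches the state seen on the inaccessible files.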


GiteaMirror added the bug label 2026-05-05 20:51:59 -05:00

GiteaMirror closed this issue

2026-05-05 20:52:00 -05:00

GiteaMirror commented

2026-05-05 20:52:01 -05:00

@rgaricano commented on GitHub (Oct 28, 2025):

I have to recheck this situation (I'm working on a related PR), but I suspect that it is due to duplicate files being added to more than one collection; the duplicate file prevention, combined with different user/collection permissions, would then produce that behaviour.


GiteaMirror commented

2026-05-05 20:52:03 -05:00

@acwoo97 commented on GitHub (Oct 28, 2025):

@rgaricano
Thanks for checking this out!

For now, to get things working quickly, I’ve patched the permission check logic on my side — in addition to verifying knowledge_id matching, I also check whether the file’s collection_name (after removing the file- prefix, if present) exists in the Knowledge Base’s list of associated IDs.

This workaround seems to resolve the access issue temporarily.
I’ll keep digging into the root cause by tracing the code more closely, especially around how the async upload and knowledge association tasks interact.
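
The workaround described above might look roughly like the following. It is a sketch under assumptions: `file_belongs_to_knowledge` and its parameters are hypothetical names, and the real patch lives inside the permission check rather than in a standalone function.

```python
from typing import Iterable, Optional

FILE_PREFIX = "file-"


def file_belongs_to_knowledge(
    collection_name: Optional[str],
    knowledge_id: str,
    knowledge_file_ids: Iterable[str],
) -> bool:
    """Accept an exact Knowledge Base match, or a file-{uuid} value whose
    uuid is listed among the Knowledge Base's associated file IDs."""
    if not collection_name:
        return False
    if collection_name == knowledge_id:
        return True
    if collection_name.startswith(FILE_PREFIX):
        # Strip the default prefix before comparing against the ID list.
        collection_name = collection_name[len(FILE_PREFIX):]
    return collection_name in set(knowledge_file_ids)
```

This treats the symptom only: the stored collection_name stays wrong, so other code that reads it is still affected.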


GiteaMirror commented

2026-05-05 20:52:04 -05:00

@rgaricano commented on GitHub (Oct 28, 2025):

@acwoo97
I did a quick check, and the problem seems to come from the file metadata storing only the last collection the file was uploaded to (if the file is already uploaded, it is added to the collection and the file metadata is updated).

When a file is added to multiple Knowledge Bases, its file.meta only stores the last collection_name it was added to. This creates a critical access control issue because has_access_to_file() uses this single collection_name to determine permissions.

During RAG document retrieval, the system does check Knowledge Base permissions correctly through get_sources_from_items(). It verifies that the user has access to the Knowledge Base being queried before retrieving documents.
The problem arises when the file is accessed through endpoints like /api/v1/files/{id}, which rely on the single collection_name metadata field.
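
The failure mode can be reproduced in a few lines. This is a sketch with hypothetical stand-ins (`file_meta`, `add_to_knowledge`, `can_access`); it mimics a check that reads one stored collection_name and is not the actual has_access_to_file() code.

```python
from typing import Dict, Set

# Hypothetical stand-in for file.meta, keyed by file ID.
file_meta: Dict[str, dict] = {"doc-1": {}}


def add_to_knowledge(file_id: str, knowledge_id: str) -> None:
    """Record the Knowledge Base in file.meta, replacing any earlier value."""
    file_meta[file_id]["collection_name"] = knowledge_id


def can_access(file_id: str, accessible_kb_ids: Set[str]) -> bool:
    """Mimic a check that only looks at the single stored collection_name."""
    return file_meta[file_id].get("collection_name") in accessible_kb_ids


add_to_knowledge("doc-1", "kb-a")
add_to_knowledge("doc-1", "kb-b")  # overwrites "kb-a"

# A user who can only read kb-a is denied, although doc-1 was added to kb-a.
user_with_kb_a = can_access("doc-1", {"kb-a"})
user_with_kb_b = can_access("doc-1", {"kb-b"})
```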

Some ways to solve it:

1. Storing an array of collection_names in file metadata instead of a single value.
2. Creating a separate junction table to track file-to-Knowledge Base relationships.
3. Modifying has_access_to_file() to check all Knowledge Bases containing the file's content in the vector database.

Any of those is "easy" to patch, though each needs a deeper implementation. Option 2 seems the most robust (its implementation is also the most complex):

It properly models the many-to-many relationship.
Access checks are efficient SQL queries rather than vector DB lookups.
It maintains data integrity with foreign key constraints.
The migration path is clear: populate junction table from existing knowledge.data.file_ids arrays.
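
To show what an SQL-based access check over a junction table could look like, here is a minimal sketch using the standard-library sqlite3 module. The table layout follows the file_knowledge idea, but `user_can_access` and the sample IDs are assumptions for illustration only.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE file_knowledge ("
    " file_id TEXT NOT NULL,"
    " knowledge_id TEXT NOT NULL,"
    " UNIQUE (file_id, knowledge_id))"
)
conn.executemany(
    "INSERT INTO file_knowledge (file_id, knowledge_id) VALUES (?, ?)",
    [("doc-1", "kb-a"), ("doc-1", "kb-b"), ("doc-2", "kb-b")],
)


def user_can_access(file_id: str, accessible_kb_ids: List[str]) -> bool:
    """True if the file is linked to at least one Knowledge Base the user can read."""
    if not accessible_kb_ids:
        return False
    # One "?" per accessible Knowledge Base keeps the query parameterized.
    placeholders = ", ".join("?" for _ in accessible_kb_ids)
    row = conn.execute(
        "SELECT 1 FROM file_knowledge WHERE file_id = ? "
        f"AND knowledge_id IN ({placeholders}) LIMIT 1",
        [file_id, *accessible_kb_ids],
    ).fetchone()
    return row is not None
```

Because every relationship is a row, a file shared by several Knowledge Bases stays reachable through each of them.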

If you want to try it, I have left the junction table implementation below for reference:

Draft of Implementation of Option 2 (Junction Table) to fix the multi-collection file access control issue.

Step 1: Create the Junction Table Model

First, create a new model file backend/open_webui/models/file_knowledge.py:

import time
import uuid
from typing import Optional, List
from sqlalchemy import Column, String, BigInteger, ForeignKey, UniqueConstraint
from pydantic import BaseModel, ConfigDict

from open_webui.internal.db import Base, get_db

class FileKnowledge(Base):
    __tablename__ = "file_knowledge"

    id = Column(String, primary_key=True, default=lambda: str(uuid.uuid4()))
    file_id = Column(String, ForeignKey("file.id", ondelete="CASCADE"), nullable=False)
    knowledge_id = Column(String, ForeignKey("knowledge.id", ondelete="CASCADE"), nullable=False)
    created_at = Column(BigInteger)

    __table_args__ = (UniqueConstraint('file_id', 'knowledge_id', name='uix_file_knowledge'),)


class FileKnowledgeModel(BaseModel):
    model_config = ConfigDict(from_attributes=True)

    id: str
    file_id: str
    knowledge_id: str
    created_at: int


class FileKnowledgeTable:
    def insert_file_knowledge(
        self, file_id: str, knowledge_id: str
    ) -> Optional[FileKnowledgeModel]:
        with get_db() as db:
            # Check if relationship already exists
            existing = db.query(FileKnowledge).filter_by(
                file_id=file_id, knowledge_id=knowledge_id
            ).first()

            if existing:
                return FileKnowledgeModel.model_validate(existing)

            file_knowledge = FileKnowledgeModel(
                id=str(uuid.uuid4()),
                file_id=file_id,
                knowledge_id=knowledge_id,
                created_at=int(time.time())
            )

            try:
                result = FileKnowledge(**file_knowledge.model_dump())
                db.add(result)
                db.commit()
                db.refresh(result)
                return FileKnowledgeModel.model_validate(result)
            except Exception:
                # Roll back so the session is not left in a failed state.
                db.rollback()
                return None

    def get_knowledge_ids_by_file_id(self, file_id: str) -> List[str]:
        with get_db() as db:
            file_knowledges = db.query(FileKnowledge).filter_by(file_id=file_id).all()
            return [fk.knowledge_id for fk in file_knowledges]

    def get_file_ids_by_knowledge_id(self, knowledge_id: str) -> List[str]:
        with get_db() as db:
            file_knowledges = db.query(FileKnowledge).filter_by(knowledge_id=knowledge_id).all()
            return [fk.file_id for fk in file_knowledges]

    def delete_file_knowledge(self, file_id: str, knowledge_id: str) -> bool:
        with get_db() as db:
            result = db.query(FileKnowledge).filter_by(
                file_id=file_id, knowledge_id=knowledge_id
            ).delete()
            db.commit()
            return result > 0

    def delete_by_file_id(self, file_id: str) -> bool:
        with get_db() as db:
            result = db.query(FileKnowledge).filter_by(file_id=file_id).delete()
            db.commit()
            return result > 0

    def delete_by_knowledge_id(self, knowledge_id: str) -> bool:
        with get_db() as db:
            result = db.query(FileKnowledge).filter_by(knowledge_id=knowledge_id).delete()
            db.commit()
            return result > 0


FileKnowledges = FileKnowledgeTable()

Step 2: Update `add_file_to_knowledge_by_id()`

Modify backend/open_webui/routers/knowledge.py to insert junction records:

# Add import at top of file
from open_webui.models.file_knowledge import FileKnowledges

# Update the function
@router.post("/{id}/file/add", response_model=Optional[KnowledgeFilesResponse])
def add_file_to_knowledge_by_id(
    request: Request,
    id: str,
    form_data: KnowledgeFileIdForm,
    user=Depends(get_verified_user),
):
    knowledge = Knowledges.get_knowledge_by_id(id=id)

    if not knowledge:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail=ERROR_MESSAGES.NOT_FOUND,
        )

    if (
        knowledge.user_id != user.id
        and not has_access(user.id, "write", knowledge.access_control)
        and user.role != "admin"
    ):
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail=ERROR_MESSAGES.ACCESS_PROHIBITED,
        )

    file = Files.get_file_by_id(form_data.file_id)
    if not file:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail=ERROR_MESSAGES.NOT_FOUND,
        )
    if not file.data:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail=ERROR_MESSAGES.FILE_NOT_PROCESSED,
        )

    # Add content to the vector database
    try:
        process_file(
            request,
            ProcessFileForm(file_id=form_data.file_id, collection_name=id),
            user=user,
        )
    except Exception as e:
        log.debug(e)
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail=str(e),
        )

    if knowledge:
        data = knowledge.data or {}
        file_ids = data.get("file_ids", [])

        if form_data.file_id not in file_ids:
            file_ids.append(form_data.file_id)
            data["file_ids"] = file_ids

            knowledge = Knowledges.update_knowledge_data_by_id(id=id, data=data)

            # NEW: Insert junction table record
            FileKnowledges.insert_file_knowledge(
                file_id=form_data.file_id,
                knowledge_id=id
            )

            if knowledge:
                files = Files.get_file_metadatas_by_ids(file_ids)

                return KnowledgeFilesResponse(
                    **knowledge.model_dump(),
                    files=files,
                )
            else:
                raise HTTPException(
                    status_code=status.HTTP_400_BAD_REQUEST,
                    detail=ERROR_MESSAGES.DEFAULT("knowledge"),
                )
        else:
            raise HTTPException(
                status_code=status.HTTP_400_BAD_REQUEST,
                detail=ERROR_MESSAGES.DEFAULT("file_id"),
            )
    else:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail=ERROR_MESSAGES.NOT_FOUND,
        )

Step 3: Update `remove_file_from_knowledge_by_id()`

Modify the removal logic to delete junction records and only delete files when no longer referenced:

@router.post("/{id}/file/remove", response_model=Optional[KnowledgeFilesResponse])
def remove_file_from_knowledge_by_id(
    id: str,
    form_data: KnowledgeFileIdForm,
    delete_file: bool = Query(True),
    user=Depends(get_verified_user),
):
    knowledge = Knowledges.get_knowledge_by_id(id=id)
    if not knowledge:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail=ERROR_MESSAGES.NOT_FOUND,
        )

    if (
        knowledge.user_id != user.id
        and not has_access(user.id, "write", knowledge.access_control)
        and user.role != "admin"
    ):
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail=ERROR_MESSAGES.ACCESS_PROHIBITED,
        )

    file = Files.get_file_by_id(form_data.file_id)
    if not file:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail=ERROR_MESSAGES.NOT_FOUND,
        )

    # Remove content from the vector database
    try:
        VECTOR_DB_CLIENT.delete(
            collection_name=knowledge.id, filter={"file_id": form_data.file_id}
        )
    except Exception as e:
        log.debug("This was most likely caused by bypassing embedding processing")
        log.debug(e)
        pass

    # NEW: Delete junction table record
    FileKnowledges.delete_file_knowledge(
        file_id=form_data.file_id,
        knowledge_id=id
    )

    # NEW: Only delete file if it's not referenced by any other Knowledge Base
    if delete_file:
        remaining_knowledge_ids = FileKnowledges.get_knowledge_ids_by_file_id(form_data.file_id)

        if not remaining_knowledge_ids:
            try:
                # Remove the file's collection from vector database
                file_collection = f"file-{form_data.file_id}"
                if VECTOR_DB_CLIENT.has_collection(collection_name=file_collection):
                    VECTOR_DB_CLIENT.delete_collection(collection_name=file_collection)
            except Exception as e:
                log.debug("This was most likely caused by bypassing embedding processing")
                log.debug(e)
                pass

            # Delete file from database
            Files.delete_file_by_id(form_data.file_id)

    if knowledge:
        data = knowledge.data or {}
        file_ids = data.get("file_ids", [])

        if form_data.file_id in file_ids:
            file_ids.remove(form_data.file_id)
            data["file_ids"] = file_ids

            knowledge = Knowledges.update_knowledge_data_by_id(id=id, data=data)

            if knowledge:
                files = Files.get_file_metadatas_by_ids(file_ids)

                return KnowledgeFilesResponse(
                    **knowledge.model_dump(),
                    files=files,
                )
            else:
                raise HTTPException(
                    status_code=status.HTTP_400_BAD_REQUEST,
                    detail=ERROR_MESSAGES.DEFAULT("knowledge"),
                )
        else:
            raise HTTPException(
                status_code=status.HTTP_400_BAD_REQUEST,
                detail=ERROR_MESSAGES.DEFAULT("file_id"),
            )
    else:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail=ERROR_MESSAGES.NOT_FOUND,
        )

Step 4: Update `has_access_to_file()`

Rewrite the access control check in backend/open_webui/routers/files.py:

# Add import at top
from open_webui.models.file_knowledge import FileKnowledges

def has_access_to_file(
    file_id: Optional[str], access_type: str, user=Depends(get_verified_user)
) -> bool:
    file = Files.get_file_by_id(file_id)
    log.debug(f"Checking if user has {access_type} access to file")

    if not file:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail=ERROR_MESSAGES.NOT_FOUND,
        )

    # NEW: Get all Knowledge Bases this file belongs to via junction table
    knowledge_ids = FileKnowledges.get_knowledge_ids_by_file_id(file_id)

    if not knowledge_ids:
        # Fallback to old metadata-based check for backward compatibility
        knowledge_base_id = file.meta.get("collection_name") if file.meta else None
        if knowledge_base_id:
            knowledge_ids = [knowledge_base_id]
        else:
            return False

    # Check if user has access to any of these Knowledge Bases
    knowledge_bases = Knowledges.get_knowledge_bases_by_user_id(user.id, access_type)
    accessible_kb_ids = {kb.id for kb in knowledge_bases}

    return any(kb_id in accessible_kb_ids for kb_id in knowledge_ids)

Step 5: Update `reset_knowledge_by_id()`

Clean up junction records when resetting a Knowledge Base:

@router.post("/{id}/reset", response_model=Optional[KnowledgeResponse])
async def reset_knowledge_by_id(id: str, user=Depends(get_verified_user)):
    knowledge = Knowledges.get_knowledge_by_id(id=id)
    if not knowledge:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail=ERROR_MESSAGES.NOT_FOUND,
        )

    if (
        knowledge.user_id != user.id
        and not has_access(user.id, "write", knowledge.access_control)
        and user.role != "admin"
    ):
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail=ERROR_MESSAGES.ACCESS_PROHIBITED,
        )

    try:
        VECTOR_DB_CLIENT.delete_collection(collection_name=id)
    except Exception as e:
        log.debug(e)
        pass

    # NEW: Delete all junction records for this Knowledge Base
    FileKnowledges.delete_by_knowledge_id(id)

    knowledge = Knowledges.update_knowledge_data_by_id(id=id, data={"file_ids": []})

    return knowledge

Step 6: Database Migration Script

"""add file_knowledge junction table

Revision ID: xxxx
Revises: yyyy  # Replace with actual previous revision
Create Date: 2025-10-28

"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy import text
import time

revision = 'xxxx'
down_revision = 'yyyy'  # Replace with actual previous revision
branch_labels = None
depends_on = None


def upgrade():
    # Create the junction table
    op.create_table(
        'file_knowledge',
        sa.Column('id', sa.String(), nullable=False),
        sa.Column('file_id', sa.String(), nullable=False),
        sa.Column('knowledge_id', sa.String(), nullable=False),
        sa.Column('created_at', sa.BigInteger(), nullable=True),
        sa.ForeignKeyConstraint(['file_id'], ['file.id'], ondelete='CASCADE'),
        sa.ForeignKeyConstraint(['knowledge_id'], ['knowledge.id'], ondelete='CASCADE'),
        sa.PrimaryKeyConstraint('id'),
        sa.UniqueConstraint('file_id', 'knowledge_id', name='uix_file_knowledge')
    )

    # Migrate existing data from knowledge.data.file_ids to junction table
    conn = op.get_bind()

    # Get all knowledge bases with file_ids
    result = conn.execute(text("""
        SELECT id, data FROM knowledge WHERE data IS NOT NULL
    """))

    import json
    import uuid

    for row in result:
        knowledge_id = row[0]
        data = json.loads(row[1]) if isinstance(row[1], str) else row[1]

        if data and 'file_ids' in data:
            file_ids = data['file_ids']

            # Insert junction records for each file
            for file_id in file_ids:
                conn.execute(text("""
                    INSERT INTO file_knowledge (id, file_id, knowledge_id, created_at)
                    VALUES (:id, :file_id, :knowledge_id, :created_at)
                    ON CONFLICT (file_id, knowledge_id) DO NOTHING
                """), {
                    'id': str(uuid.uuid4()),
                    'file_id': file_id,
                    'knowledge_id': knowledge_id,
                    'created_at': int(time.time())
                })

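    # No explicit conn.commit() here: Alembic normally runs the migration in
    # its own transaction, and older SQLAlchemy connections lack commit().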


def downgrade():
    # Drop the junction table
    op.drop_table('file_knowledge')
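
The backfill loop in the migration could be split into a pure helper that is easier to test. The sketch below is an assumption-laden variant: `junction_pairs` is a hypothetical function, it de-duplicates in Python instead of relying on ON CONFLICT (which not every SQL dialect accepts), and it skips file IDs that no longer exist so a foreign key constraint would not reject them.

```python
import json
from typing import Iterable, List, Set, Tuple, Union


def junction_pairs(
    knowledge_rows: Iterable[Tuple[str, Union[str, dict, None]]],
    existing_file_ids: Set[str],
) -> List[Tuple[str, str]]:
    """Build unique (file_id, knowledge_id) pairs, skipping unknown files."""
    seen = set()
    pairs = []
    for knowledge_id, data in knowledge_rows:
        # The data column may arrive as a JSON string or as a decoded dict.
        if isinstance(data, str):
            data = json.loads(data)
        if not isinstance(data, dict):
            continue
        for file_id in data.get("file_ids") or []:
            pair = (file_id, knowledge_id)
            if file_id in existing_file_ids and pair not in seen:
                seen.add(pair)
                pairs.append(pair)
    return pairs
```

The returned pairs could then be inserted with a plain parameterized INSERT inside upgrade().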

Step 7: Update Knowledge Base Deletion

Modify the delete endpoint to clean up junction records:

# In backend/open_webui/routers/knowledge.py, add to delete endpoint

@router.delete("/{id}/delete", response_model=bool)
async def delete_knowledge_by_id(id: str, user=Depends(get_verified_user)):
    knowledge = Knowledges.get_knowledge_by_id(id=id)
    if not knowledge:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail=ERROR_MESSAGES.NOT_FOUND,
        )

    if (
        knowledge.user_id != user.id
        and not has_access(user.id, "write", knowledge.access_control)
        and user.role != "admin"
    ):
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail=ERROR_MESSAGES.ACCESS_PROHIBITED,
        )

    try:
        VECTOR_DB_CLIENT.delete_collection(collection_name=id)
    except Exception as e:
        log.debug(e)
        pass

    # NEW: Delete all junction records for this Knowledge Base
    FileKnowledges.delete_by_knowledge_id(id)

    result = Knowledges.delete_knowledge_by_id(id=id)
    return result

Step 8: Update Batch File Processing

Modify the batch processing endpoint to insert junction records:

# In backend/open_webui/routers/knowledge.py

@router.post("/{id}/file/batch/add", response_model=Optional[KnowledgeFilesResponse])
def add_files_to_knowledge_by_id(
    request: Request,
    id: str,
    form_data: BatchProcessFilesForm,
    user=Depends(get_verified_user),
):
    knowledge = Knowledges.get_knowledge_by_id(id=id)

    if not knowledge:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail=ERROR_MESSAGES.NOT_FOUND,
        )

    if (
        knowledge.user_id != user.id
        and not has_access(user.id, "write", knowledge.access_control)
        and user.role != "admin"
    ):
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail=ERROR_MESSAGES.ACCESS_PROHIBITED,
        )

    # Process files in batch
    try:
        process_files_batch(
            request,
            BatchProcessFilesForm(
                file_ids=form_data.file_ids,
                collection_name=id
            ),
            user=user,
        )
    except Exception as e:
        log.debug(e)
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail=str(e),
        )

    if knowledge:
        data = knowledge.data or {}
        file_ids = data.get("file_ids", [])

        # Add new file IDs
        for file_id in form_data.file_ids:
            if file_id not in file_ids:
                file_ids.append(file_id)

                # NEW: Insert junction table record
                FileKnowledges.insert_file_knowledge(
                    file_id=file_id,
                    knowledge_id=id
                )

        data["file_ids"] = file_ids
        knowledge = Knowledges.update_knowledge_data_by_id(id=id, data=data)

        if knowledge:
            files = Files.get_file_metadatas_by_ids(file_ids)
            return KnowledgeFilesResponse(
                **knowledge.model_dump(),
                files=files,
            )
        else:
            raise HTTPException(
                status_code=status.HTTP_400_BAD_REQUEST,
                detail=ERROR_MESSAGES.DEFAULT("knowledge"),
            )
    else:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail=ERROR_MESSAGES.NOT_FOUND,
        )

Step 9: Testing the Implementation

After implementing these changes, test the following scenarios:

Add file to multiple Knowledge Bases: Verify that junction records are created for each relationship
Access control: Confirm that users can access files through any Knowledge Base they have permissions for
File removal: Ensure files are only deleted when removed from all Knowledge Bases
Migration: Test that existing knowledge.data.file_ids are properly migrated to the junction table
Backward compatibility: Verify that the fallback to file.meta.collection_name works for unmigrated data
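
The first and third scenarios can be rehearsed without a database. This is a sketch only: `InMemoryJunction` is a hypothetical in-memory model of the junction table's intended behaviour, not the FileKnowledgeTable class from the draft.

```python
from typing import List, Set, Tuple


class InMemoryJunction:
    """Hypothetical in-memory model of file-to-Knowledge Base relationships."""

    def __init__(self) -> None:
        self._pairs: Set[Tuple[str, str]] = set()

    def link(self, file_id: str, knowledge_id: str) -> bool:
        """Add a relationship; return True only if it did not exist before."""
        before = len(self._pairs)
        self._pairs.add((file_id, knowledge_id))
        return len(self._pairs) > before

    def unlink(self, file_id: str, knowledge_id: str) -> bool:
        """Remove a relationship; return True when no Knowledge Base still
        references the file, meaning the file itself may be deleted."""
        self._pairs.discard((file_id, knowledge_id))
        return not any(f == file_id for f, _ in self._pairs)

    def knowledge_ids(self, file_id: str) -> List[str]:
        """List the Knowledge Bases that reference the file, sorted by ID."""
        return sorted(k for f, k in self._pairs if f == file_id)
```

A test against the real implementation would assert the same outcomes: a repeated add creates no second record, and the file survives until its last Knowledge Base link is removed.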

Notes

This implementation provides a complete solution for the multi-collection file access control issue. The junction table approach properly models the many-to-many relationship between files and Knowledge Bases, ensuring that access control checks work correctly regardless of which Knowledge Base a file was most recently added to.

The migration script populates the junction table from existing knowledge.data.file_ids arrays, maintaining backward compatibility. The foreign key constraints with ondelete='CASCADE' clean up junction records automatically when either a file or Knowledge Base is deleted, provided the database enforces foreign keys (SQLite, for example, only does so when PRAGMA foreign_keys is enabled on the connection).
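
Whether the cascade actually fires depends on foreign key enforcement, which is worth checking if the deployment uses SQLite. The following sketch uses the standard-library sqlite3 module with a simplified schema; `junction_rows_after_kb_delete` is a hypothetical helper written for this illustration.

```python
import sqlite3


def junction_rows_after_kb_delete(enforce_foreign_keys: bool) -> int:
    """Delete a Knowledge Base row and count the junction rows left behind."""
    conn = sqlite3.connect(":memory:")
    # Enforcement is a per-connection setting in SQLite; set it explicitly.
    conn.execute(
        f"PRAGMA foreign_keys = {'ON' if enforce_foreign_keys else 'OFF'}"
    )
    conn.execute("CREATE TABLE knowledge (id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE file_knowledge ("
        " file_id TEXT NOT NULL,"
        " knowledge_id TEXT NOT NULL"
        " REFERENCES knowledge(id) ON DELETE CASCADE)"
    )
    conn.execute("INSERT INTO knowledge (id) VALUES ('kb-a')")
    conn.execute("INSERT INTO file_knowledge VALUES ('doc-1', 'kb-a')")
    conn.execute("DELETE FROM knowledge WHERE id = 'kb-a'")
    count = conn.execute("SELECT COUNT(*) FROM file_knowledge").fetchone()[0]
    conn.close()
    return count
```

With enforcement off, the junction row is left orphaned, which is one reason the draft's explicit delete_by_knowledge_id() calls remain useful.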


GiteaMirror commented

2026-05-05 20:52:04 -05:00

@acwoo97 commented on GitHub (Oct 29, 2025):

@rgaricano
Thanks again for taking the time to look into this and for explaining everything in detail!
I’ll go through your suggestions and verify whether the behavior you described matches what I’m seeing on my side.

I might have misunderstood some parts of the flow — I actually didn’t realize that a file could be added to multiple Knowledge Bases. From what I observed, each file upload through the UI seemed to generate a new UUID, and the frontend used that UUID to call the /knowledge/add API. So I assumed even identical files were always created as new entries. I’ll review this part again to confirm how it really works.

Also, one interesting thing I noticed during debugging:
for the files that fail to load, not only does their metadata contain a collection_name with the file-{uuid} prefix, but that same UUID also appears in the Knowledge Base’s file_ids list.
That’s why my temporary workaround (as I mentioned earlier) — checking whether the file’s UUID exists in the Knowledge Base’s file_ids list — actually resolves the issue for now.

Lastly, it seems my earlier hypothesis about a possible race condition between the upload and knowledge-add APIs was incorrect. After reviewing the frontend code, I noticed it waits for the upload status before proceeding, so it’s unlikely to be a timing issue.
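The temporary workaround mentioned above (checking whether the file's UUID appears in a Knowledge Base's file_ids list) could look roughly like the sketch below. `KnowledgeBase` and `file_in_accessible_kb` are hypothetical names used for illustration; the real check would sit inside has_access_to_file() and use the project's own models.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class KnowledgeBase:
    """Hypothetical stand-in for a Knowledge Base record."""
    id: str
    data: Optional[dict] = None


def file_in_accessible_kb(file_id: str, accessible_kbs: Iterable[KnowledgeBase]) -> bool:
    """Grant access if any KB the user can reach lists this file ID."""
    for kb in accessible_kbs:
        file_ids = (kb.data or {}).get("file_ids", [])
        if file_id in file_ids:
            return True
    return False


# The failing case from this thread: the file's metadata still carries the
# upload-time collection name ("file-<uuid>"), which matches no KB id,
# yet the KB's file_ids list does contain the file.
file_id = "1234"
file_meta = {"collection_name": f"file-{file_id}"}
kbs = [KnowledgeBase(id="kb-a", data={"file_ids": [file_id]})]

metadata_check = file_meta["collection_name"] in {kb.id for kb in kbs}
workaround_check = file_in_accessible_kb(file_id, kbs)
```

Here the metadata-based check fails while the file_ids-based check succeeds, which matches the behavior reported for the inaccessible files.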


GiteaMirror commented

2026-05-05 20:52:06 -05:00

@acwoo97 commented on GitHub (Oct 29, 2025):

I think there might be another possible cause as well.
When calling knowledge/add, the process involves process_file, which internally calls Files.update_file_metadata_by_id().
If that function raises an exception, it only logs the error and simply returns None without propagating it.

So, if that step fails silently, the file would just keep its original file-{uuid} collection name from the initial upload.
I intentionally triggered a failure in that part, and it seems to reproduce the same behavior.
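To illustrate the silent-failure path, here is a small sketch under stated assumptions: `update_file_metadata`, the writer callables, and the two `add_to_knowledge_*` helpers are hypothetical and only mimic the described behavior (log the exception, return None). They are not the actual Files.update_file_metadata_by_id() or process_file code.

```python
import logging

log = logging.getLogger(__name__)


def broken_writer(store, file_id, meta):
    # Hypothetical failure, e.g. a database error during the metadata update.
    raise IOError("simulated write failure")


def good_writer(store, file_id, meta):
    store[file_id]["meta"].update(meta)


def update_file_metadata(store, file_id, meta, writer):
    """Mimics the reported behavior: log the exception and return None."""
    try:
        writer(store, file_id, meta)
        return store[file_id]
    except Exception as e:
        log.error("metadata update failed: %s", e)
        return None


def add_to_knowledge_lenient(store, file_id, knowledge_id, writer):
    # Ignores the return value, so a failed update goes unnoticed.
    update_file_metadata(store, file_id, {"collection_name": knowledge_id}, writer)
    return store[file_id]["meta"]["collection_name"]


def add_to_knowledge_strict(store, file_id, knowledge_id, writer):
    # Treats None as an error so the caller can surface it.
    updated = update_file_metadata(
        store, file_id, {"collection_name": knowledge_id}, writer
    )
    if updated is None:
        raise RuntimeError(f"metadata update failed for file {file_id}")
    return updated["meta"]["collection_name"]


store = {"f1": {"meta": {"collection_name": "file-f1"}}}
stale = add_to_knowledge_lenient(store, "f1", "kb-a", broken_writer)
```

With the lenient caller the file keeps its upload-time `file-f1` collection name and nothing signals the problem; the strict variant raises instead, which is one way to make the failure visible.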


GiteaMirror commented

2026-05-05 20:52:06 -05:00

@athoik commented on GitHub (Nov 27, 2025):

Hi,

A simple fix was also created here: https://github.com/open-webui/open-webui/pull/19523

It checks whether the file ID exists in the Knowledge Base's file IDs.

It became a draft because of https://github.com/open-webui/open-webui/pull/19278

It would be nice to consider applying the code from this thread to #19278.


GiteaMirror commented

2026-05-05 20:52:08 -05:00

@Classic298 commented on GitHub (Nov 27, 2025):

The knowledge file table migration is probably the blocker here.


GiteaMirror commented

2026-05-05 20:52:09 -05:00

@tjbck commented on GitHub (Dec 2, 2025):

Should be addressed in dev with 9f6c91987fcd03033186b3ab4a2f9be505856efe


Reference: github-starred/open-webui#57339
