[GH-ISSUE #20846] issue: Bug with pagination for search_files_by_id. Leads to duplicates being returned or files missing #57975

Closed
opened 2026-05-05 22:04:30 -05:00 by GiteaMirror · 2 comments

Originally created by @thomasmhofmann on GitHub (Jan 21, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/20846

Check Existing Issues

  • I have searched for any existing and/or related issues.
  • I have searched for any existing and/or related discussions.
  • I have also searched in the CLOSED issues AND CLOSED discussions and found no related items (your issue might already be addressed on the development branch!).
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v0.7.2

Ollama Version (if applicable)

No response

Operating System

Linux from official container image

Browser (if applicable)

No response

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

Bug Report: Non-Deterministic Pagination in Knowledge Base File Listing

Summary

The knowledge base file listing API (GET /api/v1/knowledge/{id}/files) returns non-deterministic results when paginating through files that share the same updated_at timestamp. This causes files to appear multiple times across pages or not appear at all.

Environment

  • OpenWebUI Version: 0.7.2
  • Database: PostgreSQL
  • API Endpoint: GET /api/v1/knowledge/{id}/files?page={N}

Expected Behavior

Each file should appear exactly once across all paginated results. The total count should match the number of unique files returned.

Actual Behavior

Example with 208 files in knowledge base:

  • API reports: total: 208
  • File lb-1603-v2.0.md (id: ab2efd9f-86a5-4aaa-8a02-9377f32da4d3) appears on both page 1 and page 2
  • File lb-448-v1.0.md (id: 1a1f1963-62ce-4404-b892-7fea3c5ca8da) never appears on any page
  • Only 207 unique files are returned across all pages

Database verification shows both files exist and are linked to the knowledge base:

SELECT f.id, f.filename, f.updated_at
FROM file f
JOIN knowledge_file kf ON f.id = kf.file_id
WHERE kf.knowledge_id = '7042a333-0132-4f75-8c98-8986f8d5e7a9'
  AND f.filename IN ('lb-448-v1.0.md', 'lb-1603-v2.0.md')
ORDER BY f.updated_at DESC, f.id ASC;

Result:

id                                   | filename          | updated_at
-------------------------------------|-------------------|------------
1a1f1963-62ce-4404-b892-7fea3c5ca8da | lb-448-v1.0.md    | 1768938808
ab2efd9f-86a5-4aaa-8a02-9377f32da4d3 | lb-1603-v2.0.md   | 1768938808

Both files have identical updated_at timestamps.

Root Cause

File: backend/open_webui/models/knowledge.py
Method: Knowledges.search_files_by_id()
Line: 463

query = query.order_by(File.updated_at.desc())

The query sorts by File.updated_at without a secondary sort key. When multiple files have the same updated_at value, the database returns them in undefined order. This order can vary between pagination requests, causing:

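  1. Duplicate entries: A file sorts into the last slot of page N when page N is fetched, then into the first slot of page N+1 when that page is fetched, so it is returned twice
  2. Missing entries: A file sorts into the first slot of page N+1 when page N is fetched, then into the last slot of page N when page N+1 is fetched, so OFFSET skips it on both requests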

This is a well-known database pagination anti-pattern when sorting by non-unique columns.
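
The effect can be reproduced without a database. The following is a minimal sketch, not Open WebUI code: `fetch_page` is a hypothetical stand-in for a query that sorts only by updated_at, and the `ties_reversed` flag imitates the database choosing a different order for tied rows on each request.

```python
from collections import Counter
from typing import List, Tuple

Row = Tuple[str, int]  # (file id, updated_at)


def fetch_page(rows: List[Row], page: int, page_size: int, ties_reversed: bool) -> List[str]:
    """Imitate ORDER BY updated_at DESC with LIMIT/OFFSET and no tiebreaker."""
    # The order of rows with equal updated_at is unspecified; model that by
    # letting the caller flip the tie order between requests.
    by_id = sorted(rows, key=lambda r: r[0], reverse=ties_reversed)
    # Stable sort: rows with the same updated_at keep the tie order chosen above.
    ordered = sorted(by_id, key=lambda r: -r[1])
    offset = (page - 1) * page_size
    return [file_id for file_id, _ in ordered[offset:offset + page_size]]


# Hypothetical data: "b" and "c" share the same updated_at value.
ROWS = [("a", 300), ("b", 200), ("c", 200), ("d", 100)]

page_1 = fetch_page(ROWS, page=1, page_size=2, ties_reversed=False)
page_2 = fetch_page(ROWS, page=2, page_size=2, ties_reversed=True)

seen = Counter(page_1 + page_2)
duplicates = sorted(fid for fid, n in seen.items() if n > 1)
missing = sorted({fid for fid, _ in ROWS} - set(seen))

print(page_1, page_2, duplicates, missing)
```

In this sketch "b" lands on both pages and "c" on neither, which is the same pattern as lb-1603-v2.0.md and lb-448-v1.0.md above.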

Steps to Reproduce

  1. Create a knowledge base with multiple files
  2. Ensure some files have identical updated_at timestamps (e.g., bulk upload)
  3. Query the API with pagination:
    • GET /api/v1/knowledge/{id}/files?page=1
    • GET /api/v1/knowledge/{id}/files?page=2
    • Continue through all pages
  4. Observe that:
    • Some files appear on multiple pages
    • Some files never appear in any page
    • Total count doesn't match unique files returned
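
The check in step 4 can be automated. This is a rough sketch under assumptions: `audit_pages` is a hypothetical helper, and it takes plain lists of file ids because the JSON shape of the API response is not shown in this report, so extracting the ids from each page is left to the caller.

```python
from collections import Counter
from typing import Dict, Iterable, List


def audit_pages(pages: Iterable[List[str]], reported_total: int) -> Dict[str, object]:
    """Compare the ids collected from all pages with the total the API reported."""
    counts = Counter(file_id for page in pages for file_id in page)
    return {
        "returned": sum(counts.values()),   # rows received, duplicates included
        "unique": len(counts),              # distinct file ids received
        "duplicates": sorted(fid for fid, n in counts.items() if n > 1),
        "total_matches": len(counts) == reported_total,
    }


# Made-up ids: "f3" is returned twice and one of the five files never shows up.
report = audit_pages([["f1", "f2", "f3"], ["f3", "f5"]], reported_total=5)
print(report)
```

A healthy listing has an empty "duplicates" list and "total_matches" set to True; the case reported here would give 207 unique ids against a reported total of 208.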

Logs & Screenshots

SELECT 
    f.id,
    f.filename,
    f.updated_at,
    kf.knowledge_id
FROM file f
JOIN knowledge_file kf ON f.id = kf.file_id
WHERE kf.knowledge_id = '7042a333-0132-4f75-8c98-8986f8d5e7a9'
  AND (f.filename LIKE '%1603%' OR f.filename LIKE '%448%')
ORDER BY f.updated_at DESC, f.id ASC;

Results in:

"1a1f1963-62ce-4404-b892-7fea3c5ca8da"	"lb-448-v1.0.md"	1768938808	"7042a333-0132-4f75-8c98-8986f8d5e7a9"
"ab2efd9f-86a5-4aaa-8a02-9377f32da4d3"	"lb-1603-v2.0.md"	1768938808	"7042a333-0132-4f75-8c98-8986f8d5e7a9"

API result is missing lb-448-v1.0.md but lb-1603-v2.0.md appears twice.

Additional Information

Proposed Fix

Add a secondary sort on a unique column (e.g., File.id) to ensure deterministic ordering:

query = query.order_by(File.updated_at.desc(), File.id.asc())

This should be applied to all order_by clauses in the method:

# Line 446-448
if direction == "asc":
    query = query.order_by(File.filename.asc(), File.id.asc())
else:
    query = query.order_by(File.filename.desc(), File.id.asc())

# Line 451-453
if direction == "asc":
    query = query.order_by(File.created_at.asc(), File.id.asc())
else:
    query = query.order_by(File.created_at.desc(), File.id.asc())

# Line 456-458
if direction == "asc":
    query = query.order_by(File.updated_at.asc(), File.id.asc())
else:
    query = query.order_by(File.updated_at.desc(), File.id.asc())

# Line 460 (default)
query = query.order_by(File.updated_at.desc(), File.id.asc())

# Line 463 (else branch)
query = query.order_by(File.updated_at.desc(), File.id.asc())
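
To illustrate why the tiebreaker is enough, here is a small standalone sketch. It is an assumption-laden stand-in, not the project's code: it uses Python's built-in sqlite3 module instead of PostgreSQL and SQLAlchemy, a reduced `file` table, and made-up ids, with every row sharing one updated_at value as after a bulk upload.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE file (id TEXT PRIMARY KEY, filename TEXT, updated_at INTEGER)")
# Ten rows with an identical updated_at, the worst case for the original query.
conn.executemany(
    "INSERT INTO file VALUES (?, ?, ?)",
    [(f"id-{n:02d}", f"doc-{n}.md", 1768938808) for n in range(10)],
)


def fetch_page(page: int, page_size: int = 3) -> list:
    """Return the ids on one page, ordered by updated_at with id as tiebreaker."""
    offset = (page - 1) * page_size
    rows = conn.execute(
        "SELECT id FROM file ORDER BY updated_at DESC, id ASC LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()
    return [row[0] for row in rows]


# Walk all pages until an empty one comes back.
collected = []
page = 1
while True:
    ids = fetch_page(page)
    if not ids:
        break
    collected.extend(ids)
    page += 1

print(len(collected), len(set(collected)))
```

Because id is unique, the combined sort key is unique as well, so the row order is fully determined and every row falls on exactly one page as long as the data does not change between requests.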

Impact

  • Affected Users: Anyone using knowledge bases with bulk-uploaded files or files with synchronized timestamps
  • Workaround: See "Client-Side Workaround" section below

Client-Side Workaround

Until the OpenWebUI API is fixed, clients can work around this issue by explicitly requesting a sort on a column whose values are unique in practice, using the filter parameters:

API Request:

GET /api/v1/knowledge/{id}/files?page=1&order_by=name&direction=asc

Filter Parameters:

  • order_by=name: Sort by filename (unique in most cases)
  • direction=asc: Ascending order for consistency

As long as filenames are unique, this gives deterministic ordering across paginated requests, preventing duplicates and missing files.

Implementation Example (Java with JAX-RS):

@GET
@Path("/{knowledgeId}/files")
JsonNode getKnowledgeBaseFiles(
    @HeaderParam("Authorization") String bearerToken,
    @PathParam("knowledgeId") String knowledgeId,
    @QueryParam("page") Integer page,
    @QueryParam("view_option") String viewOption,
    @QueryParam("order_by") String orderBy,        // Add this
    @QueryParam("direction") String direction      // Add this
);

// Usage
JsonNode response = client.getKnowledgeBaseFiles(
    authHeader, knowledgeId, page, "metadata", "name", "asc"
);

Limitations:

  • This workaround only works if filenames are unique within the knowledge base
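
For clients not written in Java, the same request can be put together in Python. This is a minimal sketch: `build_files_url` and the base URL are hypothetical, and only the path and the page, order_by and direction parameters are taken from the request shown above.

```python
from urllib.parse import quote, urlencode


def build_files_url(base_url: str, knowledge_id: str, page: int) -> str:
    """Build the file listing URL with an explicit sort by filename."""
    # order_by=name and direction=asc are the workaround parameters from this report.
    query = urlencode({"page": page, "order_by": "name", "direction": "asc"})
    path = f"/api/v1/knowledge/{quote(knowledge_id, safe='')}/files"
    return f"{base_url.rstrip('/')}{path}?{query}"


# "http://localhost:3000" is a placeholder for wherever the instance is reachable.
url = build_files_url("http://localhost:3000", "7042a333-0132-4f75-8c98-8986f8d5e7a9", 1)
print(url)
```

The limitation above still applies: two files with the same name can again swap places between requests.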

Additional Notes

This bug was introduced when pagination was added to the knowledge base file listing endpoint. Before pagination, the entire result set was returned in one query, so the non-deterministic ordering wasn't visible.

Commit that caused the issue:

https://github.com/open-webui/open-webui/commit/94a8439105f30203ea9d729787c9c5978f5c22a2

GiteaMirror added the bug label 2026-05-05 22:04:30 -05:00

@owui-terminator[bot] commented on GitHub (Jan 21, 2026):

🔍 Similar Issues Found

I found some existing issues that might be related to this one. Please check if any of these are duplicates or contain helpful solutions:

  1. #20641 issue: Web Search and Builtin Tools permissions break search
    by HenkieTenkie62 • Jan 13, 2026 • bug

  2. #19264 issue: Uploaded file hash remains in database even when OCR fails, causing false duplicate detection
    by flefevre • Nov 18, 2025 • bug

  3. #20552 issue: Retrieval: list index out of range
    by outis151 • Jan 10, 2026 • bug

  4. #20595 issue: "search_web" tool executed even when "Web Search" control disabled
    by SlavikCA • Jan 11, 2026 • bug

  5. #19429 issue: user list wrong count and less than 30 items per page
    by destination-one • Nov 24, 2025 • bug


💡 Tips:

  • If this is a duplicate, please consider closing this issue and adding any additional details to the existing one
  • If you found a solution in any of these issues, please share it here to help others

This comment was generated automatically by a bot. Please react with a 👍 if this comment was helpful, or a 👎 if it was not.


@tjbck commented on GitHub (Jan 21, 2026):

Addressed in dev!

Reference: github-starred/open-webui#57975