[PR #13353] [CLOSED] PR: **chore** Postgresql / ChromaDB Maintenance Script #23161

Closed
opened 2026-04-20 04:40:25 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/13353
Author: @spammenotinoz
Created: 4/30/2025
Status: Closed

Base: dev ← Head: dev


📝 Commits (1)

  • b004ee8 Postgresql/ChromaDB Cleanup

📊 Changes

1 file changed (+187 additions, -0 deletions)

View changed files

scripts/postgres_chroma.cleanup.py (+187 -0)

📄 Description

<html>

The intention is to reduce the size of the ChromaDB and PostgreSQL databases by cleaning up orphaned records.
Note: --delete-vectors is extremely resource intensive and should only be run as part of regular maintenance, not as a one-off remedial fix.

CHANGELOG ENTRY

Description

This Python script is designed to manage and clean up Postgres database entries, files, and vector store collections. It connects to a PostgreSQL database, examines stored chat and knowledge data, and optionally deletes unused files, database entries, and vector store collections to free up resources.

By default it does not delete or change anything.

Requirements / Dependencies

Python 3.x
psycopg2
psycopg2.extras
chromadb
psutil
json
argparse
os
sys
concurrent.futures

USAGE

python postgres_chroma.cleanup.py [options]

Optional Arguments

| Option | Description | Default | Example |
| -- | -- | -- | -- |
| --chroma-path | Path to the Chroma vector database directory. If not provided, defaults to the script directory | None | /path/to/vector_db |
| -b, --batch-chats | Number of chat entries to process per batch (adjust for performance/memory usage) | 10 | 50 |
| -l, --list-files | List files marked for deletion without executing deletions | False | N/A |
| --delete-files | Delete identified unused files from storage | False | N/A |
| --delete-db-entries | Delete database entries (in file table) that are unused | False | N/A |
| --delete-vectors | Delete vector store collections not associated with current data | False | N/A |
| --no-confirm | Skip confirmation prompts before deletion actions | False | N/A |
| --log-memory | Log memory usage at different steps in the script | False | N/A |
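
For illustration, the options listed above could be declared with argparse roughly as follows. This is a minimal sketch based only on the option names and defaults in this description, not the script's actual code; the function name `build_parser` and the help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative sketch only)."""
    parser = argparse.ArgumentParser(prog="postgres_chroma.cleanup.py")
    parser.add_argument("--chroma-path", default=None,
                        help="Path to the Chroma vector database directory")
    parser.add_argument("-b", "--batch-chats", type=int, default=10,
                        help="Number of chat entries to process per batch")
    parser.add_argument("-l", "--list-files", action="store_true",
                        help="List files marked for deletion without deleting")
    # Every destructive action is opt-in, so a bare run changes nothing.
    parser.add_argument("--delete-files", action="store_true",
                        help="Delete identified unused files from storage")
    parser.add_argument("--delete-db-entries", action="store_true",
                        help="Delete unused entries from the file table")
    parser.add_argument("--delete-vectors", action="store_true",
                        help="Delete vector collections not tied to current data")
    parser.add_argument("--no-confirm", action="store_true",
                        help="Skip confirmation prompts before deletions")
    parser.add_argument("--log-memory", action="store_true",
                        help="Log memory usage at key steps")
    return parser
```

Calling `build_parser().parse_args(["-b", "50", "--list-files"])` would yield a namespace with `batch_chats` set to 50 and all delete flags left at False.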

Script Functionality Breakdown

  1. Connects to PostgreSQL
    Uses provided database URL to connect.
    Reads file table for IDs.
    Reads knowledge table to extract knowledge IDs.
    Streams chat entries.
  2. Extracts IDs
    File IDs: All IDs from file table.
    Knowledge IDs: Extracted from knowledge table JSON data.
    Chat File IDs: Extracted from chat content by recursive search for file IDs.
  3. Checks for conflicts
    Ensures no overlapping IDs between knowledge and chat files (raises error if found).
  4. Determines files to delete
    Finds file IDs that are not associated with current knowledge or chat data.
    Reads files from the uploads directory and identifies each file by the filename prefix before the underscore (_).
    Lists which files are safe to delete.
  5. Works with Chroma vector store
    Lists existing vector collections.
    Identifies collections that are not associated with current data.
    Optionally deletes these collections.
  6. Deletes files and database entries
    Files: Deletes files from storage if --delete-files.
    Vector store collections: Deletes collections if --delete-vectors.
    Database Entries: Deletes records from file table if --delete-db-entries.
  7. Provides confirmation prompts (unless --no-confirm)
    Before deletion actions, prompts the user unless overridden.
  8. Logs memory usage
    Optionally logs used memory during key steps if --log-memory.
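
Steps 2 to 4 above can be sketched in plain Python. This is a hedged illustration, not the script's implementation: it assumes a chat payload is already parsed into dicts and lists, and that a string counts as a file ID when it matches an ID read from the file table. The names `extract_file_ids` and `find_orphans` are hypothetical.

```python
from typing import Any, Set


def extract_file_ids(node: Any, known_ids: Set[str]) -> Set[str]:
    """Recursively collect strings in a JSON-like structure that match known file IDs."""
    found: Set[str] = set()
    if isinstance(node, dict):
        for value in node.values():
            found |= extract_file_ids(value, known_ids)
    elif isinstance(node, list):
        for item in node:
            found |= extract_file_ids(item, known_ids)
    elif isinstance(node, str) and node in known_ids:
        # Assumption: an ID is "referenced" if it appears verbatim as a string value.
        found.add(node)
    return found


def find_orphans(file_ids: Set[str],
                 knowledge_ids: Set[str],
                 chat_file_ids: Set[str]) -> Set[str]:
    """Return file IDs referenced by neither knowledge nor chat data."""
    overlap = knowledge_ids & chat_file_ids
    if overlap:
        # Mirrors the conflict check in step 3: overlapping IDs raise an error.
        raise ValueError(f"IDs used by both knowledge and chats: {sorted(overlap)}")
    return file_ids - knowledge_ids - chat_file_ids
```

Using set operations keeps the orphan check independent of how many chats are streamed: each batch only has to add its IDs to `chat_file_ids` before the final difference is taken.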

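The file matching in step 4 and the opt-in deletion with confirmation in steps 6 and 7 might look like the sketch below. It assumes the ID is the part of the filename before the first underscore and that nothing is removed unless deletion is explicitly requested; `files_for_ids`, `delete_files`, and the `ask` parameter are hypothetical names introduced for illustration.

```python
import os
from typing import Callable, List, Set


def files_for_ids(upload_dir: str, orphan_ids: Set[str]) -> List[str]:
    """Return paths in upload_dir whose filename prefix (before '_') is an orphan ID."""
    matches: List[str] = []
    for name in sorted(os.listdir(upload_dir)):
        prefix = name.split("_", 1)[0]
        if prefix in orphan_ids:
            matches.append(os.path.join(upload_dir, name))
    return matches


def delete_files(paths: List[str],
                 delete: bool = False,
                 no_confirm: bool = False,
                 ask: Callable[[str], str] = input) -> int:
    """List candidate files; remove them only if delete is set and the user agrees."""
    for path in paths:
        print(f"candidate: {path}")
    if not delete or not paths:
        return 0  # default behaviour: report only, change nothing
    if not no_confirm:
        answer = ask(f"Delete {len(paths)} file(s)? [y/N] ")
        if answer.strip().lower() != "y":
            return 0
    for path in paths:
        os.remove(path)
    return len(paths)
```

Passing the prompt function as a parameter is a design choice for the sketch: it lets the confirmation step be exercised without an interactive terminal.
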
By submitting this pull request, I confirm that I have read and fully agree to the CONTRIBUTOR_LICENSE_AGREEMENT, and I am providing my contributions under its terms.

</html>


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-20 04:40:25 -05:00

Reference: github-starred/open-webui#23161