mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-11 00:13:40 -05:00
[PR #13353] [CLOSED] PR: **chore** Postgresql / ChromaDB Maintenance Script #23161
📋 Pull Request Information
Original PR: https://github.com/open-webui/open-webui/pull/13353
Author: @spammenotinoz
Created: 4/30/2025
Status: ❌ Closed
Base: dev ← Head: dev
📝 Commits (1)
b004ee8 Postgresql/ChromaDB Cleanup
📊 Changes
1 file changed (+187 additions, -0 deletions)
➕ scripts/postgres_chroma.cleanup.py (+187 -0)
📄 Description
The intention is to reduce the size of the ChromaDB and PostgreSQL databases by cleaning up orphaned records.
Note: --delete-vectors is extremely resource intensive and should only be used as part of regular maintenance, not as a one-off repair.
CHANGELOG ENTRY
Description
This Python script is designed to manage and clean up Postgres database entries, files, and vector store collections. It connects to a PostgreSQL database, examines stored chat and knowledge data, and optionally deletes unused files, database entries, and vector store collections to free up resources.
By default, it does not delete or change anything.
Requirements / Dependencies
Python 3.x
psycopg2
psycopg2.extras
chromadb
psutil
json
argparse
os
sys
concurrent.futures
USAGE
python postgres_chroma.cleanup.py [options]
Optional Arguments
| Option | Description | Default | Example |
| --- | --- | --- | --- |
| --chroma-path | Path to the Chroma vector database directory. If not provided, defaults to the script directory. | None | /path/to/vector_db |
| -b, --batch-chats | Number of chat entries to process per batch (adjust for performance/memory usage). | 10 | 50 |
| -l, --list-files | List files marked for deletion without executing deletions. | False | N/A |
| --delete-files | Delete identified unused files from storage. | False | N/A |
| --delete-db-entries | Delete database entries (in the file table) that are unused. | False | N/A |
| --delete-vectors | Delete vector store collections not associated with current data. | False | N/A |
| --no-confirm | Skip confirmation prompts before deletion actions. | False | N/A |
| --log-memory | Log memory usage at different steps in the script. | False | N/A |
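For illustration, the options listed above could be declared with argparse roughly as follows. This is a sketch reconstructed from the option list, not the script's actual code: the function names `build_parser` and `is_dry_run` are hypothetical, and how the database URL is supplied is not covered because the description does not say.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="postgres_chroma.cleanup.py")
    parser.add_argument("--chroma-path", default=None,
                        help="Path to the Chroma vector database directory")
    parser.add_argument("-b", "--batch-chats", type=int, default=10,
                        help="Number of chat entries to process per batch")
    parser.add_argument("-l", "--list-files", action="store_true",
                        help="List files marked for deletion without deleting")
    parser.add_argument("--delete-files", action="store_true",
                        help="Delete identified unused files from storage")
    parser.add_argument("--delete-db-entries", action="store_true",
                        help="Delete unused rows from the file table")
    parser.add_argument("--delete-vectors", action="store_true",
                        help="Delete vector collections not tied to current data")
    parser.add_argument("--no-confirm", action="store_true",
                        help="Skip confirmation prompts before deletions")
    parser.add_argument("--log-memory", action="store_true",
                        help="Log memory usage at key steps")
    return parser


def is_dry_run(args: argparse.Namespace) -> bool:
    """True when no deletion flag is set, matching the 'changes nothing by default' behavior."""
    return not (args.delete_files or args.delete_db_entries or args.delete_vectors)
```

With no arguments, `is_dry_run(build_parser().parse_args([]))` is true, which matches the statement that the script does not delete or change anything by default.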
Script Functionality Breakdown
Connects to PostgreSQL
Uses provided database URL to connect.
Reads file table for IDs.
Reads knowledge table to extract knowledge IDs.
Streams chat entries.
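A minimal sketch of streaming rows in batches, as controlled by --batch-chats, is shown below. It relies only on the DB-API `fetchmany` method, which psycopg2 cursors also provide. The helper names, the `chat` table layout, and the use of an in-memory SQLite database as a stand-in for PostgreSQL are assumptions made so the example runs on its own.

```python
import sqlite3
from typing import Any, Iterator, List, Tuple


def stream_batches(cursor: Any, batch_size: int = 10) -> Iterator[List[Tuple]]:
    """Yield lists of rows from an already executed DB-API cursor."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows


def demo_batch_sizes() -> List[int]:
    """Run the generator against a stand-in table and report each batch size."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical two-column layout; the real chat table is not described.
        conn.execute("CREATE TABLE chat (id TEXT PRIMARY KEY, chat TEXT)")
        conn.executemany(
            "INSERT INTO chat VALUES (?, ?)",
            [(f"chat-{i}", "{}") for i in range(25)],
        )
        cur = conn.execute("SELECT id, chat FROM chat ORDER BY id")
        return [len(batch) for batch in stream_batches(cur, batch_size=10)]
    finally:
        conn.close()
```

With psycopg2, a named (server-side) cursor keeps the result set on the server so that only one batch is held in memory at a time. Whether the script uses one is not stated in the description.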
Extracts IDs
File IDs: All IDs from file table.
Knowledge IDs: Extracted from knowledge table JSON data.
Chat File IDs: Extracted from chat content by recursive search for file IDs.
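The recursive search for file IDs in chat content might look like the sketch below. It assumes the chat payload is already parsed JSON (nested dicts and lists) and that a string counts as a file reference only when it exactly matches an ID from the file table. Both the function name and that matching rule are assumptions, since the description does not say how the script recognizes an ID.

```python
from typing import Any, Set


def find_referenced_ids(node: Any, known_ids: Set[str]) -> Set[str]:
    """Recursively collect every string value that exactly matches a known file ID."""
    found: Set[str] = set()
    if isinstance(node, dict):
        for value in node.values():
            found |= find_referenced_ids(value, known_ids)
    elif isinstance(node, (list, tuple)):
        for item in node:
            found |= find_referenced_ids(item, known_ids)
    elif isinstance(node, str) and node in known_ids:
        found.add(node)
    return found
```

Matching against the known ID set rather than against a pattern avoids treating unrelated identifiers (message IDs, model names) as files, at the cost of missing IDs that appear only as part of a longer string.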
Checks for conflicts
Ensures no overlapping IDs between knowledge and chat files (raises error if found).
Determines files to delete
Finds file IDs that are not associated with current knowledge or chat data.
Reads files from the uploads directory and identifies each file by the prefix before the underscore (_).
Lists which files are safe to delete.
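The conflict check and the orphan determination described above reduce to set operations. The sketch below uses hypothetical function names and assumes that an upload is named `<file id>_<original name>`, so the ID is the text before the first underscore. Names without an underscore are skipped rather than guessed at.

```python
from typing import Iterable, List, Set


def check_no_overlap(knowledge_file_ids: Set[str], chat_file_ids: Set[str]) -> None:
    """Raise if any file ID is claimed by both knowledge and chat data."""
    overlap = knowledge_file_ids & chat_file_ids
    if overlap:
        raise ValueError(f"IDs present in both knowledge and chat data: {sorted(overlap)}")


def find_orphaned_ids(all_file_ids: Set[str],
                      knowledge_file_ids: Set[str],
                      chat_file_ids: Set[str]) -> Set[str]:
    """Return file IDs referenced by neither knowledge nor chat data."""
    return all_file_ids - knowledge_file_ids - chat_file_ids


def files_for_ids(filenames: Iterable[str], orphaned_ids: Set[str]) -> List[str]:
    """Return upload filenames whose prefix before the first underscore is orphaned."""
    matches = []
    for name in filenames:
        prefix, separator, _rest = name.partition("_")
        if separator and prefix in orphaned_ids:
            matches.append(name)
    return sorted(matches)
```

In practice the filename list would come from something like `os.listdir()` on the uploads directory, and the returned list is what a listing option such as --list-files would print.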
Works with Chroma vector store
Lists existing vector collections.
Identifies collections that are not associated with current data.
Optionally deletes these collections.
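Identifying unassociated collections can be sketched as a comparison between the collection names that exist and the names that current data would produce. The naming convention used here (`file-<file id>` for per-file collections, the bare knowledge ID for knowledge collections) is an assumption about how Open WebUI names collections, and the function name is hypothetical. Verify both against the actual deployment before deleting anything.

```python
from typing import Iterable, List, Set


def orphaned_collections(collection_names: Iterable[str],
                         active_file_ids: Set[str],
                         knowledge_ids: Set[str]) -> List[str]:
    """Return collection names that no active file or knowledge base accounts for."""
    # Assumed convention: "file-<id>" per file, bare ID per knowledge base.
    expected = {f"file-{file_id}" for file_id in active_file_ids} | set(knowledge_ids)
    return sorted(set(collection_names) - expected)
```

With the chromadb package, the names would come from `chromadb.PersistentClient(path=...).list_collections()`, and each orphan would be removed with `client.delete_collection(name)`. Depending on the chromadb version, `list_collections()` returns either collection objects or plain names, so the name may need to be read from a `.name` attribute.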
Deletes files and database entries
Files: Deletes files from storage if --delete-files.
Vector store collections: Deletes collections if --delete-vectors.
Database Entries: Deletes records from file table if --delete-db-entries.
Provides confirmation prompts (unless --no-confirm)
Before deletion actions, prompts the user unless overridden.
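A confirmation gate of the kind described above could be as small as the following sketch. The function name, the prompt wording, and the accepted answers are assumptions. The `ask` parameter exists only so the prompt can be replaced in tests.

```python
from typing import Callable


def confirm(action: str, no_confirm: bool = False,
            ask: Callable[[str], str] = input) -> bool:
    """Return True if the destructive action may proceed."""
    if no_confirm:
        # --no-confirm was given: proceed without prompting.
        return True
    answer = ask(f"{action}? [y/N] ")
    # Anything other than an explicit yes is treated as a refusal.
    return answer.strip().lower() in {"y", "yes"}
```

Defaulting to "no" on empty or unrecognized input keeps an accidental Enter key press from triggering a deletion.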
Logs memory usage
Optionally logs used memory during key steps if --log-memory.
By submitting this pull request, I confirm that I have read and fully agree to the CONTRIBUTOR_LICENSE_AGREEMENT, and I am providing my contributions under its terms.
🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.