issue: Migrating from <= v0.6.27 to v0.6.28 with existing internal ChromaDB Vector Database #6390

Closed
opened 2025-11-11 16:53:35 -06:00 by GiteaMirror · 1 comment
Owner

Originally created by @runyournode on GitHub (Sep 12, 2025).

Check Existing Issues

  • I have searched for any existing and/or related issues.
  • I have searched for any existing and/or related discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v0.6.28

Ollama Version (if applicable)

n/a

Operating System

Ubuntu 22.04

Browser (if applicable)

Chrome

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

ChromaDB was upgraded from 0.6.3 to 1.0.20 in the latest owui release (v0.6.28), with little notice.

When updating owui with an existing default vector database (ChromaDB), there should be a migration script to allow a seamless transition.

Actual Behavior

The existing ChromaDB vector database is not compatible with the new version.
Any RAG chat will fail.
You cannot force a re-index of the ChromaDB (as the existing one will throw errors).

Steps to Reproduce

Simply update your OWUI to the latest version with an existing KB, and try a RAG model.
You can see the error in the owui logs.

Logs & Screenshots

You will most likely see an error like this:

chromadb.errors.InternalError: Error executing plan: Error sending backfill request to compactor: Error reading from metadata segment reader: error occurred while decoding column 0: mismatched types; Rust type u64 (as SQL type INTEGER) is not compatible with SQL type BLOB
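
The message says a column holds BLOB values where the new reader expects INTEGER. A minimal sketch for checking this with Python's built-in `sqlite3` module follows. It assumes the Chroma data is stored in a SQLite file; `storage_types` is a hypothetical helper, and the table and column names you pass in are your own guess (this report does not name them).

```python
import sqlite3
from collections import Counter


def storage_types(db_path: str, table: str, column: str) -> Counter:
    """Count the SQLite storage classes (integer, blob, ...) found in one column."""
    # Identifiers cannot be bound as SQL parameters, so only accept plain names.
    for name in (table, column):
        if not name.isidentifier():
            raise ValueError(f"unexpected identifier: {name!r}")
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            f'SELECT typeof("{column}"), COUNT(*) FROM "{table}" '
            f'GROUP BY typeof("{column}")'
        ).fetchall()
    finally:
        con.close()
    # rows is a list of (storage_class, count) pairs
    return Counter(dict(rows))
```

A result containing both `integer` and `blob` for the same column would be consistent with the type mismatch in the error. Run it against a copy of the database file, not the live one.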

Additional Information

Here is a draft of a migration script using the owui API.

Feel free to use it (at your own risk), or maybe improve it :-)
Eventually, it would be nice if the update could be done automatically when upgrading owui.

⚠️ You should at least update this script to save the memories and kb states before resetting them.
⚠️ You will need to edit your RAG models, as the new kbs will have new ids (edit / delete the associated kbs / add them again).
ℹ️ It is not an ideal script as we are reprocessing all the embeddings.
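
Because the recreated knowledge bases get new ids, it may help to record which old id became which new id before editing the models. A minimal sketch, assuming each kb is a dict with `id` and `name` keys and that names are unique; `build_kb_id_map` is a hypothetical helper, not part of Open WebUI:

```python
def build_kb_id_map(old_kbs: list[dict], new_kbs: list[dict]) -> dict[str, str]:
    """Map old knowledge-base ids to new ones by matching on the kb name."""
    new_id_by_name = {kb["name"]: kb["id"] for kb in new_kbs}
    id_map = {}
    for kb in old_kbs:
        new_id = new_id_by_name.get(kb["name"])
        if new_id is not None:  # skip kbs that were not recreated
            id_map[kb["id"]] = new_id
    return id_map
```

The resulting dict can be printed or saved and used as a checklist when re-attaching kbs to models.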

Also, FYI, I think each KB now creates a single collection in ChromaDB (in previous versions, each file produced its own collection).

"""
Migration script for Chroma:
 - get the initial kb/memories population
 - fully reset all kbs/memories/VectorDB
 - re-populate the memories (which automatically populates the associated VectorDB collection)
 - create the new kbs and re-populate them with the initial files (which automatically populates the associated VectorDB collections)

It will take some time, as the embeddings are processed again.
No backup is implemented; you might want to save kb_list and memory_list to persistent files in case the process fails at any point.

Tested on a migration from owui 0.6.26 (chromadb==0.6.3) to owui 0.6.28 (chromadb==1.0.20)

"""

import requests

OWUI_URL = 'http://localhost:8080'
OWUI_API_KEY = 'sk-hexkey'

headers = {
    'Authorization': f'Bearer {OWUI_API_KEY}',
    'Content-Type': 'application/json'
}


# Get the current state of kbs
kb_list = requests.get(
    url=f'{OWUI_URL}/api/v1/knowledge/',
    headers=headers,
).json()

# WARNING
# You might want to save the kb_list now in a persistent file as next steps will delete the kbs from owui
# WARNING

# Get the memories
memory_list = requests.get(
    url=f'{OWUI_URL}/api/v1/memories/',
    headers=headers,
).json()

# WARNING
# You might want to save the memory_list now in a persistent file as next steps will delete the memories from owui
# WARNING


# Delete the VectorDB (and most probably all the kbs as well)
j = requests.post(
    url=f'{OWUI_URL}/api/v1/retrieval/reset/db',
    headers=headers
).json()

# Delete the memories (from memory and VectorDB)
j = requests.post(
    url=f'{OWUI_URL}/api/v1/memories/reset',
    headers=headers
).json()

for memory in memory_list:
    # Use double quotes inside the f-string: reusing single quotes is a SyntaxError before Python 3.12
    j = requests.delete(
        url=f'{OWUI_URL}/api/v1/memories/{memory.get("id")}',
        headers=headers
    ).json()


# Re-populate memories; embeddings will be processed and the VectorDB populated internally
for memory in memory_list:
    requests.post(
        url=f'{OWUI_URL}/api/v1/memories/add',
        headers=headers,
        json={'content': memory.get('content')}
    )


# Recreate each kb and populate it with docs
for kb in kb_list:
    # Create a new empty kb
    kb_new = requests.post(
        url=f'{OWUI_URL}/api/v1/knowledge/create',
        headers=headers,
        json={
            "name": kb.get("name"),
            "description": kb.get("description"),

        }
    ).json()

    # Collect the ids of the files to add (a kb may have no files)
    file_ids = []
    for file in kb.get('files') or []:
        file_ids.append({"file_id": file.get('id')})
    # Re-populate the kb; embeddings will be processed and the VectorDB populated internally
    j = requests.post(
        url=f'{OWUI_URL}/api/v1/knowledge/{kb_new.get("id")}/files/batch/add',
        headers=headers,
        json=file_ids
    )

print('Done')
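
The warnings in the script suggest saving `kb_list` and `memory_list` before anything is deleted. A minimal sketch of that step, assuming both are plain JSON-serializable lists as returned by `.json()`; the helper names and file names are hypothetical:

```python
import json
from pathlib import Path


def save_snapshot(data, path: str) -> None:
    """Write API data to a JSON file so it survives a failed run."""
    text = json.dumps(data, indent=2, ensure_ascii=False)
    Path(path).write_text(text, encoding="utf-8")


def load_snapshot(path: str):
    """Read back a snapshot written by save_snapshot."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

In the script, `save_snapshot(kb_list, 'kb_list.json')` and `save_snapshot(memory_list, 'memory_list.json')` would go right after the two GET requests and before the reset calls. If a later step fails, `load_snapshot` restores the lists for a second attempt.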


GiteaMirror added the bug label 2025-11-11 16:53:35 -06:00
Author
Owner

@rgaricano commented on GitHub (Sep 13, 2025):

Related: https://github.com/chroma-core/chroma/issues/4217#issuecomment-2913976410


@runyournode: that isn't a migration script; it's a reset/reindex DB script.

The error suggests that the metadata storage format changed between versions, but (as far as I know) the metadata used when creating a ChromaDB collection was always {"hnsw:space": "cosine"}.

I think it's an isolated case. In any case, I'm leaving a conversion script to resolve possible schema incompatibilities between the old Python-based storage format and the new Rust-based storage format.

chromadm_conv_script.py (https://github.com/user-attachments/files/22310315/chromadm_conv_script.py)
Use:

# Run the conversion  
python convert_schema.py --old-dir ./old_chroma_data --new-dir ./new_chroma_data 
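
Since the conversion reads one directory and writes another, it seems prudent to run it on a copy of the Chroma data and not on the live directory. A minimal sketch using only the standard library; `copy_chroma_dir` is a hypothetical helper, and the paths you pass are whatever your setup uses:

```python
import shutil
from pathlib import Path


def copy_chroma_dir(src: str, dst: str) -> Path:
    """Copy the whole Chroma data directory; refuse to overwrite an existing copy."""
    src_path, dst_path = Path(src), Path(dst)
    if not src_path.is_dir():
        raise FileNotFoundError(f"no such directory: {src_path}")
    if dst_path.exists():
        raise FileExistsError(f"destination already exists: {dst_path}")
    shutil.copytree(src_path, dst_path)
    return dst_path
```

Stopping the Open WebUI container before copying is probably wise, so the database files are not written to mid-copy.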
Reference: github-starred/open-webui#6390