issue: Metadata lists and objects are incorrectly serialized to strings #5880

Closed
opened 2025-11-11 16:36:49 -06:00 by GiteaMirror · 1 comment

Originally created by @christian-hawk on GitHub (Jul 27, 2025).

Check Existing Issues

  • I have searched the existing issues and discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

0.6.18

Ollama Version (if applicable)

No response

Operating System

Ubuntu 22.04

Browser (if applicable)

No response

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

Python list and dict objects in metadata must be correctly serialized and stored as native JSON array and object types in the vector database. The str() conversion should be restricted to unsupported types like datetime.

For example, a list-valued metadata field such as {'headings': ['Header 1', 'Header 2']} must be stored in the vector database as a native JSON array: {"headings": ["Header 1", "Header 2"]}.

Actual Behavior

A metadata field that is a Python list (e.g., {'headings': ['Header 1', 'Header 2']}) is incorrectly stored in the vector database as a single string: {'headings': "['Header 1', 'Header 2']"}. This renders the metadata useless for structured filtering.
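
To illustrate why the stringified value is hard to use, here is a minimal sketch that only uses the Python standard library. The `metadata` dict mirrors the example above; nothing here touches Open WebUI or a real vector database.

```python
import ast
import json

metadata = {"headings": ["Header 1", "Header 2"]}

# This mirrors what str() does to a list value before it is stored.
stringified = str(metadata["headings"])

# The Python repr uses single quotes, so a JSON parser rejects it.
try:
    json.loads(stringified)
    is_valid_json = True
except json.JSONDecodeError:
    is_valid_json = False

# Getting the list back requires parsing the Python literal on the client side.
recovered = ast.literal_eval(stringified)
```

A consumer therefore has to know that the field holds a Python repr and parse it back before any per-item matching is possible.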

Steps to Reproduce

  1. Set TEXT_SPLITTER=markdown_header.
  2. Upload a Markdown file with headers.
  3. Using a vector database inspection tool, query any chunk from the resulting file-* collection.
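  4. Observe that list-valued metadata fields (e.g., headings) are stored as a single string instead of a JSON array.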

Logs & Screenshots

owui250726.log

Additional Information

Additional Finding

The root cause is probably a faulty loop around line 1245 in routers/retrieval.py that overzealously applies str() conversion to list and dict types.

    # ChromaDB does not like datetime formats
    # for meta-data so convert them to string.
    for metadata in metadatas:
        for key, value in metadata.items():
            if (
                isinstance(value, datetime)
                or isinstance(value, list)
                or isinstance(value, dict)
            ):
                metadata[key] = str(value)
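
One possible direction, shown only as a sketch: stringify `datetime` values unconditionally and leave lists and dicts alone unless the backend cannot store them. The helper name `sanitize_metadata` and the `flatten_nested` flag are hypothetical and do not exist in Open WebUI; whether a given vector database accepts native lists or nested dicts is an assumption that would need to be checked per backend.

```python
import json
from datetime import datetime


def sanitize_metadata(metadatas, flatten_nested=False):
    """Hypothetical helper: stringify only datetime values by default.

    When flatten_nested is True, lists and dicts are encoded with
    json.dumps, so backends limited to scalar values still receive a
    string that any JSON parser can read back.
    """
    for metadata in metadatas:
        for key, value in metadata.items():
            if isinstance(value, datetime):
                metadata[key] = str(value)
            elif flatten_nested and isinstance(value, (list, dict)):
                # default=str covers datetime values nested inside the structure.
                metadata[key] = json.dumps(value, default=str)
    return metadatas
```

With `flatten_nested=False`, `{'headings': ['Header 1', 'Header 2']}` passes through unchanged. With `flatten_nested=True`, the value becomes the JSON string `["Header 1", "Header 2"]`, which is still a string but one that can be parsed reliably.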

GiteaMirror added the bug label 2025-11-11 16:36:49 -06:00

@tjbck commented on GitHub (Jul 28, 2025):

Intended behaviour: some vector DBs do not play well with nested dicts.


Reference: github-starred/open-webui#5880