[PR #21931] [MERGED] feat: add MariaDB Vector backend #41974

Closed
opened 2026-04-25 14:02:41 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/21931
Author: @seulement55
Created: 2/27/2026
Status: Merged
Merged: 3/8/2026
Merged by: @tjbck

Base: dev ← Head: Add-support-mariadb-vector-capabilities


📝 Commits (1)

  • a67bb72 Add support for mariadb-vector as backing vector DB

📊 Changes

7 files changed (+653 additions, -0 deletions)

📝 Dockerfile (+1 -0)
📝 backend/open_webui/config.py (+72 -0)
➕ backend/open_webui/retrieval/vector/dbs/mariadb_vector.py (+570 -0)
📝 backend/open_webui/retrieval/vector/factory.py (+4 -0)
📝 backend/open_webui/retrieval/vector/type.py (+1 -0)
📝 backend/requirements.txt (+1 -0)
📝 pyproject.toml (+4 -0)

📄 Description

Pull Request Checklist

  • [x] Target branch: Verify that the pull request targets the dev branch. PRs targeting main will be immediately closed.
  • [x] Description: Provide a concise description of the changes made in this pull request down below.
  • [x] Changelog: Ensure a changelog entry following the format of Keep a Changelog is added at the bottom of the PR description.
  • [x] Documentation: Add docs in Open WebUI Docs Repository. Document user-facing behavior, environment variables, public APIs/interfaces, or deployment steps.
  • [x] Dependencies: Are there any new or upgraded dependencies? If so, explain why, update the changelog/docs, and include any compatibility notes. Actually run the code/function that uses updated library to ensure it doesn't crash.
  • [x] Testing: Perform manual tests to verify the implemented fix/feature works as intended AND does not break any other functionality. Include reproducible steps to demonstrate the issue before the fix. Test edge cases (URL encoding, HTML entities, types). Take this as an opportunity to make screenshots of the feature/fix and include them in the PR description.
  • [x] Agentic AI Code: Confirm this Pull Request is not written by any AI Agent or has at least gone through additional human review AND manual testing. If any AI Agent is the co-author of this PR, it may lead to immediate closure of the PR.
  • [x] Code review: Have you performed a self-review of your code, addressing any coding standard issues and ensuring adherence to the project's coding standards?
  • [x] Design & Architecture: Prefer smart defaults over adding new settings; use local state for ephemeral UI logic. Open a Discussion for major architectural or UX changes.
  • [x] Git Hygiene: Keep PRs atomic (one logical change). Clean up commits and rebase on dev to ensure no unrelated commits (e.g. from main) are included. Push updates to the existing PR branch instead of closing and reopening.
  • [x] Title Prefix: To clearly categorize this pull request, prefix the pull request title using one of the following:
    • feat: Introduces a new feature or enhancement to the codebase

Description

This PR adds MariaDB Vector as a supported vector database backend in Open-WebUI, enabling VECTOR_DB=mariadb-vector deployments. The implementation uses the official MariaDB connector (mariadb+mariadbconnector://...) to ensure correct parameter binding of VECTOR(n) values (float32 binary payload), and provides configurable index and pooling parameters.

This work addresses the request to support MariaDB as a vector database backend (see discussion https://github.com/open-webui/open-webui/discussions/21363).

Key changes

Adds a new mariadb-vector vector database backend, including configuration/env support and factory wiring, plus a MariaDB Vector client implementation that initializes the required schema and performs insert/upsert/search operations using the official MariaDB connector for correct VECTOR(n) float32 binding. Updates container and Python dependencies to include MariaDB driver support, and exposes tuning knobs for vector length, distance strategy, HNSW M, and connection pooling.
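
The insert/upsert/search operations described above might reduce to statements like the following. This is a minimal sketch rather than the PR's code: the table name `document_chunk`, the column names `id` and `embedding`, and both helper functions are hypothetical. `VEC_DISTANCE_COSINE` and `VEC_DISTANCE_EUCLIDEAN` are MariaDB's distance functions, and the `?` placeholders follow the qmark paramstyle mentioned in this PR.

```python
# Hypothetical names throughout: the table, columns, and helpers are
# illustrative assumptions, not taken from mariadb_vector.py.
DISTANCE_FUNCTIONS = {
    "cosine": "VEC_DISTANCE_COSINE",
    "euclidean": "VEC_DISTANCE_EUCLIDEAN",
}


def upsert_statement(table: str = "document_chunk") -> str:
    """Insert a row, or overwrite the embedding if the id already exists."""
    return (
        f"INSERT INTO {table} (id, embedding) VALUES (?, ?) "
        "ON DUPLICATE KEY UPDATE embedding = VALUES(embedding)"
    )


def search_statement(distance: str, table: str = "document_chunk") -> str:
    """Return rows ordered by distance to a query vector, closest first."""
    try:
        function = DISTANCE_FUNCTIONS[distance]
    except KeyError:
        raise ValueError(f"unsupported distance strategy: {distance!r}") from None
    return (
        f"SELECT id, {function}(embedding, ?) AS distance "
        f"FROM {table} ORDER BY distance LIMIT ?"
    )
```

Both statements take their parameters positionally: (id, packed vector) for the upsert and (packed query vector, row limit) for the search.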

Schema and compatibility notes

  • MariaDB Vector imposes constraints on primary keys used with VECTOR INDEX. This backend uses id VARCHAR(254) CHARACTER SET ascii COLLATE ascii_bin PRIMARY KEY to ensure the PK stays under the 256-byte limit and comparisons are stable.
  • The backend checks the existing DDL for VECTOR(n) mismatch and fails fast if the configured dimension differs from the stored column dimension.
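
The fail-fast dimension check could look roughly like this. It is a sketch under assumptions: the DDL string is taken to be the output of `SHOW CREATE TABLE`, and the function names, the sample table, and the error wording are hypothetical rather than copied from the PR.

```python
import re
from typing import Optional

# Matches a column type such as VECTOR(768) or vector(768). It does not match
# "VECTOR INDEX (...)" or "VECTOR KEY ...", because digits must follow the
# opening parenthesis directly after the VECTOR keyword.
_VECTOR_TYPE = re.compile(r"\bVECTOR\s*\(\s*(\d+)\s*\)", re.IGNORECASE)


def stored_vector_dimension(ddl: str) -> Optional[int]:
    """Return n from the first VECTOR(n) column in the DDL, or None if absent."""
    match = _VECTOR_TYPE.search(ddl)
    return int(match.group(1)) if match else None


def check_dimension(ddl: str, configured: int) -> None:
    """Raise if the stored column dimension differs from the configured one."""
    stored = stored_vector_dimension(ddl)
    if stored is not None and stored != configured:
        raise RuntimeError(
            f"VECTOR dimension mismatch: table has {stored}, "
            f"configuration requests {configured}"
        )


# Hypothetical DDL, shaped like SHOW CREATE TABLE output.
SAMPLE_DDL = """CREATE TABLE `document_chunk` (
  `id` varchar(254) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
  `embedding` vector(768) NOT NULL,
  PRIMARY KEY (`id`),
  VECTOR KEY `embedding` (`embedding`) `M`=12 `DISTANCE`=cosine
)"""

check_dimension(SAMPLE_DDL, 768)  # matching dimension: no error
```

Failing at startup avoids a later, harder-to-diagnose error when an embedding of the wrong length is inserted into an existing table.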

Dependencies

New dependency added:

  • mariadb==1.1.14 (Python driver), plus libmariadb-dev in the container image.

Rationale:

  • MariaDB Vector requires correct binding of VECTOR(n) values as float32 binary payloads; the official MariaDB connector provides the expected driver behavior and qmark paramstyle.
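
To make "float32 binary payload" concrete, the sketch below packs a Python list into consecutive 4-byte floats. The helper names are hypothetical, and little-endian byte order is an assumption here; the PR text only states that values are bound as float32 binary data.

```python
import struct
from typing import List, Sequence


def pack_float32(vector: Sequence[float]) -> bytes:
    """Pack a vector as consecutive little-endian float32 values (4 bytes each)."""
    return struct.pack(f"<{len(vector)}f", *vector)


def unpack_float32(payload: bytes) -> List[float]:
    """Inverse of pack_float32; rejects payloads that are not whole float32 values."""
    if len(payload) % 4 != 0:
        raise ValueError("payload length must be a multiple of 4 bytes")
    return list(struct.unpack(f"<{len(payload) // 4}f", payload))


payload = pack_float32([1.0, 0.5, -2.0])
```

A 768-dimensional embedding, as in the test configuration below, packs to 768 × 4 = 3072 bytes.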

Compatibility notes:

  • MARIADB_VECTOR_DB_URL must use mariadb+mariadbconnector://... (official driver).
  • MariaDB server must support VECTOR and VECTOR INDEX features.
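
The driver-scheme requirement lends itself to an early check. The following is a sketch only: the function name and error message are hypothetical, and the PR's own validation may differ.

```python
REQUIRED_SCHEME = "mariadb+mariadbconnector"


def validate_mariadb_vector_url(url: str) -> str:
    """Return the URL unchanged if it uses the official connector scheme."""
    scheme, separator, _rest = url.partition("://")
    if not separator or scheme.lower() != REQUIRED_SCHEME:
        raise ValueError(
            f"MARIADB_VECTOR_DB_URL must start with {REQUIRED_SCHEME}://, "
            f"got scheme {scheme!r}"
        )
    return url


validate_mariadb_vector_url(
    "mariadb+mariadbconnector://app:app@127.0.0.1:3306/openwebui"
)
```

Rejecting other schemes up front, for example `mysql+pymysql://`, surfaces a misconfiguration before any vector is bound through a driver that may encode it differently.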

Testing

Manual test (repro steps)

1) Build image

docker build -f Dockerfile -t open-webui:0.9.0-dev .

2) Start Open-WebUI (Postgres main DB + MariaDB Vector DB)

docker run -d \
  --network host \
  -e OLLAMA_BASE_URL="http://ollama-host.com:11434" \
  -e AIOHTTP_CLIENT_TIMEOUT=3600 \
  \
  -e ENABLE_PERSISTENT_CONFIG=False \
  -e DEFAULT_MODEL_PARAMS='{"num_ctx":32768,"temperature":0.1,"top_p":0.9}' \
  \
  -e RAG_EMBEDDING_ENGINE=ollama \
  -e RAG_OLLAMA_BASE_URL="http://ollama-host.com:11434" \
  -e RAG_EMBEDDING_MODEL=nomic-embed-text \
  -e RAG_EMBEDDING_BATCH_SIZE=64 \
  -e RAG_TEXT_SPLITTER=token \
  -e RAG_SYSTEM_CONTEXT=True \
  -e RAG_TOP_K=10 \
  -e RAG_TOP_K_RERANKER=10 \
  \
  -e CHUNK_SIZE=1024 \
  -e CHUNK_OVERLAP=128 \
  -e CHUNK_MIN_SIZE_TARGET=640 \
  -e ENABLE_ASYNC_EMBEDDING=true \
  -e RAG_EMBEDDING_CONCURRENT_REQUESTS=8 \
  -e ENABLE_RAG_HYBRID_SEARCH=true \
  \
  -e DATABASE_URL="postgres://app:app@127.0.0.1:5432/openwebui" \
  -e VECTOR_DB="mariadb-vector" \
  -e MARIADB_VECTOR_DB_URL="mariadb+mariadbconnector://app:app@127.0.0.1:3306/openwebui" \
  -e MARIADB_VECTOR_INITIALIZE_MAX_VECTOR_LENGTH=768 \
  -e MARIADB_VECTOR_DISTANCE_STRATEGY=cosine \
  -e MARIADB_VECTOR_INDEX_M=12 \
  \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  open-webui:0.9.0-dev
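
The three `MARIADB_VECTOR_*` tuning variables in the command above could map onto the schema roughly as follows. This is an illustrative sketch: the class, the function names, the table name `document_chunk`, and the column name `embedding` are hypothetical, and the fallback values simply mirror the command above rather than the backend's real defaults. The `id` column definition is the one given under "Schema and compatibility notes".

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class MariaDBVectorSettings:
    vector_length: int
    distance: str
    index_m: int


def load_settings(env: Mapping[str, str]) -> MariaDBVectorSettings:
    """Read the tuning knobs from an environment-like mapping."""
    # Placeholder fallbacks copied from the test command, not real defaults.
    length = int(env.get("MARIADB_VECTOR_INITIALIZE_MAX_VECTOR_LENGTH", "768"))
    distance = env.get("MARIADB_VECTOR_DISTANCE_STRATEGY", "cosine").strip().lower()
    index_m = int(env.get("MARIADB_VECTOR_INDEX_M", "12"))
    if distance not in ("cosine", "euclidean"):
        raise ValueError(f"unsupported distance strategy: {distance!r}")
    if length <= 0 or index_m <= 0:
        raise ValueError("vector length and index M must be positive")
    return MariaDBVectorSettings(length, distance, index_m)


def create_table_sql(settings: MariaDBVectorSettings,
                     table: str = "document_chunk") -> str:
    """Build a CREATE TABLE statement with a VECTOR column and VECTOR INDEX."""
    return (
        f"CREATE TABLE IF NOT EXISTS {table} (\n"
        "  id VARCHAR(254) CHARACTER SET ascii COLLATE ascii_bin PRIMARY KEY,\n"
        f"  embedding VECTOR({settings.vector_length}) NOT NULL,\n"
        f"  VECTOR INDEX (embedding) M={settings.index_m} "
        f"DISTANCE={settings.distance}\n"
        ")"
    )


settings = load_settings({
    "MARIADB_VECTOR_INITIALIZE_MAX_VECTOR_LENGTH": "768",
    "MARIADB_VECTOR_DISTANCE_STRATEGY": "cosine",
    "MARIADB_VECTOR_INDEX_M": "12",
})
ddl = create_table_sql(settings)
```

The vector length should match the embedding model's output dimension, since a `VECTOR(n)` column stores vectors of exactly that length.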

3) Configure RAG in the UI
Follow the official RAG tutorial: https://docs.openwebui.com/tutorials/tips/rag-tutorial/#setup

4) Validate ingestion + retrieval

  • Upload Open-WebUI docs as a Knowledge Base
  • Run RAG queries and verify that sources are returned and answers reference the correct KB chunks

Observed results

Similarity search results were compared between mariadb-vector and pgvector, using Open-WebUI documentation as the knowledge base and llama3.1:8b as the base model. Retrieval outcomes were similar in most cases, with no major differences observed.
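
The PR does not report a numeric metric for this comparison. One possible way to quantify "similar in most cases" is the overlap between the top-k chunk IDs each backend returns for the same query. The function and the chunk IDs below are hypothetical and are not measurements from this PR.

```python
from typing import Sequence


def overlap_at_k(first: Sequence[str], second: Sequence[str], k: int) -> float:
    """Fraction of the top-k positions whose IDs appear in both result lists."""
    if k <= 0:
        raise ValueError("k must be positive")
    shared = set(first[:k]) & set(second[:k])
    return len(shared) / k


# Made-up chunk IDs standing in for one query's results from each backend.
mariadb_ids = ["c1", "c2", "c3", "c4"]
pgvector_ids = ["c2", "c1", "c5", "c3"]

score = overlap_at_k(mariadb_ids, pgvector_ids, k=3)  # c1 and c2 are shared
```

Averaging this score over a fixed set of queries would give a single number to compare across backends. Both HNSW-style indexes are approximate, so a score somewhat below 1.0 is not by itself a sign of a bug.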

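Documentation

  • [x] I will add documentation to https://github.com/open-webui/docs (docs PR: https://github.com/open-webui/docs/pull/1125) covering:
    • VECTOR_DB=mariadb-vector setup
    • MARIADB_VECTOR_* environment variables
    • required driver scheme: mariadb+mariadbconnector://...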

Screenshots

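  • Knowledge Base ingestion
    Screenshot: https://github.com/user-attachments/assets/df5a5352-3827-444e-b554-3f0321e9bcae

  • Example RAG answers with sources.
    Screenshot: https://github.com/user-attachments/assets/58c5cdeb-231f-4c71-9578-9cd98e262032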


Changelog Entry

Description

  • Add MariaDB Vector as a supported vector database backend for RAG similarity search, allowing Open-WebUI deployments to store and query embeddings using MariaDB’s VECTOR and VECTOR INDEX features.

Added

  • Added mariadb-vector as a vector database backend for RAG retrieval, including new MARIADB_VECTOR_* configuration options and a MariaDB Vector client implementation for insert/upsert/search. Requires the official MariaDB connector scheme (mariadb+mariadbconnector://) for correct VECTOR(n) binding.

Contributor License Agreement

By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.

Note

Deleting the CLA section will lead to immediate closure of your PR and it will not be merged in.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-25 14:02:41 -05:00
Reference: github-starred/open-webui#41974