Mirror of https://github.com/open-webui/open-webui.git, synced 2026-05-06 10:58:17 -05:00
[GH-ISSUE #22319] feat: table-aware RAG ingestion for CSV, TSV, and Excel files #58363
Originally created by @salim4n on GitHub (Mar 6, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/22319
Check Existing Issues
Verify Feature Scope
Problem Description
When uploading CSV, TSV, or Excel files, the current RAG pipeline uses CSVLoader and UnstructuredExcelLoader, which split tabular data as plain text. This causes:
Desired Solution you'd like
Replace the default loaders with table-aware alternatives that:
- Are configurable via the TABLE_ROWS_PER_CHUNK environment variable (default: 1 row per chunk for precise retrieval).
Scope
- backend/open_webui/retrieval/loaders/table.py: new TableAwareCSVLoader and TableAwareExcelLoader
- backend/open_webui/retrieval/loaders/main.py: route CSV/TSV/Excel files to the new loaders
- backend/open_webui/routers/retrieval.py: bypass text splitting for table documents
- backend/open_webui/config.py: new TABLE_ROWS_PER_CHUNK config entry
Example
Before (CSVLoader):
Jean;Dupont;Par
is;30
Marie;Martin;Ly
on;25
After (TableAwareCSVLoader):
Columns: prenom | nom | ville | age
Row 0: Jean | Dupont | Paris | 30
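The "After" format can be sketched in Python as follows. This is a minimal, hypothetical sketch and not the implementation this issue refers to. It makes several assumptions:

- The "Columns:" line is repeated in every chunk.
- Plain dicts stand in for LangChain Document objects.
- The helper names (rows_per_chunk_from_env, table_to_chunks, split_for_indexing) and the is_table metadata flag are illustrative only.

The sketch reads TABLE_ROWS_PER_CHUNK with a default of 1 and groups rows into chunks. It also shows how a splitting step might pass table chunks through unchanged.

```python
import os
from typing import Callable, Dict, List, Sequence

# Plain dict standing in for a LangChain Document (assumption for self-containment).
Doc = Dict[str, object]


def rows_per_chunk_from_env(default: int = 1) -> int:
    # Read TABLE_ROWS_PER_CHUNK; fall back to `default` if unset, non-numeric, or < 1.
    raw = os.environ.get("TABLE_ROWS_PER_CHUNK", "")
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value >= 1 else default


def table_to_chunks(
    header: Sequence[str],
    rows: Sequence[Sequence[str]],
    rows_per_chunk: int = 1,
    source: str = "",
) -> List[Doc]:
    # Group rows into chunks; each chunk starts with the column header line.
    if rows_per_chunk < 1:
        raise ValueError("rows_per_chunk must be >= 1")
    columns_line = "Columns: " + " | ".join(header)
    chunks: List[Doc] = []
    for start in range(0, len(rows), rows_per_chunk):
        batch = rows[start : start + rows_per_chunk]
        lines = [columns_line]
        # Row numbers are 0-based across the whole table, as in the example above.
        lines += [f"Row {start + i}: " + " | ".join(row) for i, row in enumerate(batch)]
        chunks.append(
            {
                "page_content": "\n".join(lines),
                "metadata": {
                    "source": source,
                    "row_start": start,
                    "row_end": start + len(batch) - 1,
                    "is_table": True,  # hypothetical flag checked by split_for_indexing
                },
            }
        )
    return chunks


def split_for_indexing(docs: List[Doc], splitter: Callable[[Doc], List[Doc]]) -> List[Doc]:
    # Keep table chunks intact; send every other document through the text splitter.
    out: List[Doc] = []
    for doc in docs:
        metadata = doc.get("metadata") or {}
        if isinstance(metadata, dict) and metadata.get("is_table"):
            out.append(doc)
        else:
            out.extend(splitter(doc))
    return out
```

With the sample data, table_to_chunks(["prenom", "nom", "ville", "age"], [["Jean", "Dupont", "Paris", "30"]]) yields one chunk whose text is exactly the two "After" lines. Repeating the header costs a few tokens per chunk, but each retrieved row remains interpretable on its own.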
Tests
24 unit tests covering delimiter detection (comma, semicolon, tab, pipe), chunking, metadata, encoding fallback (latin-1), empty files, multi-sheet Excel, and invalid file handling.
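Delimiter detection and the latin-1 fallback could look roughly like the following, using only the standard library's csv.Sniffer. This is an assumption about the approach, not the code from the proposed PR. The function names and the 20-line sniffing sample are hypothetical choices.

```python
import csv
import io
from typing import List, Tuple

# Delimiters the tests mention: comma, semicolon, tab, pipe.
CANDIDATE_DELIMITERS = ",;\t|"


def decode_bytes(data: bytes) -> str:
    # Try UTF-8 (stripping a BOM if present), then fall back to latin-1.
    try:
        return data.decode("utf-8-sig")
    except UnicodeDecodeError:
        return data.decode("latin-1")


def detect_delimiter(sample: str) -> str:
    # Let csv.Sniffer choose among the candidates; default to comma if it cannot decide.
    try:
        return csv.Sniffer().sniff(sample, delimiters=CANDIDATE_DELIMITERS).delimiter
    except csv.Error:
        return ","


def read_table(data: bytes) -> Tuple[List[str], List[List[str]]]:
    # Return (header, rows); empty or whitespace-only input yields ([], []).
    text = decode_bytes(data)
    if not text.strip():
        return [], []
    # Sniff on whole lines only, so a truncated last line cannot skew the guess.
    sample = "\n".join(text.splitlines()[:20])
    delimiter = detect_delimiter(sample)
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    # Drop blank lines (csv.reader yields [] for them) and all-whitespace rows.
    records = [row for row in reader if any(cell.strip() for cell in row)]
    if not records:
        return [], []
    return records[0], records[1:]
```

Decoding as latin-1 never raises, because every byte maps to a code point. That makes it a safe last resort, but files in other 8-bit encodings such as cp1252 may render some characters incorrectly.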
I have a working implementation ready to submit as a PR if this is welcome.
Alternatives Considered
No response
Additional Context
No response