mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-06 10:58:17 -05:00
[PR #22320] [CLOSED] feat: table-aware RAG ingestion for CSV, TSV, and Excel files #65467
📋 Pull Request Information
Original PR: https://github.com/open-webui/open-webui/pull/22320
Author: @salim4n
Created: 3/6/2026
Status: ❌ Closed
Base: main ← Head: feat/table-aware-rag-ingestion
📝 Commits (3)
356aa37 feat: table-aware RAG ingestion for CSV, TSV, and Excel files
b720726 test: add unit tests for table-aware CSV and Excel loaders
3d72a8d refactor: change TABLE_ROWS_PER_CHUNK default from 5 to 1
📊 Changes
7 files changed (+626 additions, -54 deletions)
📝 backend/open_webui/config.py (+6 -0)
📝 backend/open_webui/retrieval/loaders/main.py (+35 -5)
➕ backend/open_webui/retrieval/loaders/table.py (+213 -0)
📝 backend/open_webui/routers/retrieval.py (+70 -49)
➕ backend/open_webui/test/retrieval/__init__.py (+0 -0)
➕ backend/open_webui/test/retrieval/loaders/__init__.py (+0 -0)
➕ backend/open_webui/test/retrieval/loaders/test_table.py (+302 -0)
📄 Description
Summary
Closes discussion #22319
Replace CSVLoader and UnstructuredExcelLoader with custom table-aware loaders that preserve row integrity for better RAG retrieval on tabular data.
TABLE_ROWS_PER_CHUNK env var / config (default: 1)
Changed files
backend/open_webui/retrieval/loaders/table.py: new TableAwareCSVLoader, TableAwareExcelLoader
backend/open_webui/retrieval/loaders/main.py: route CSV/TSV/Excel to new loaders
backend/open_webui/routers/retrieval.py: bypass text splitting for file_type="table" docs
backend/open_webui/config.py: add TABLE_ROWS_PER_CHUNK
Test plan
pytest)
TABLE_ROWS_PER_CHUNK config is respected
🤖 Generated with Claude Code
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
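For readers who have not opened the diff, the row-preserving idea in the summary can be sketched roughly as follows. This is an illustration only, not the code from backend/open_webui/retrieval/loaders/table.py: the helper names `chunk_table_rows` and `rows_per_chunk_from_env`, the "column: value" text layout, and the `row_start`/`row_end` metadata keys are all assumptions made for the example. Only the `TABLE_ROWS_PER_CHUNK` setting (default 1) and the `file_type="table"` marker come from the description above.

```python
import csv
import io
import os


def rows_per_chunk_from_env() -> int:
    """Read TABLE_ROWS_PER_CHUNK from the environment (default 1, per the PR text)."""
    return int(os.environ.get("TABLE_ROWS_PER_CHUNK", "1"))


def chunk_table_rows(csv_text: str, rows_per_chunk: int = 1, delimiter: str = ",") -> list[dict]:
    """Hypothetical helper: group whole rows into chunks, never splitting a row.

    Each row is rendered as "column: value" pairs so that a chunk stays
    readable without the header line. The layout is an assumption, not
    necessarily what the PR's loaders emit.
    """
    if rows_per_chunk < 1:
        raise ValueError("rows_per_chunk must be >= 1")

    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    rows = [row for row in reader if row]  # drop blank lines
    if not rows:
        return []

    header, body = rows[0], rows[1:]
    chunks = []
    for start in range(0, len(body), rows_per_chunk):
        group = body[start:start + rows_per_chunk]
        lines = [
            "; ".join(f"{col}: {val}" for col, val in zip(header, row))
            for row in group
        ]
        chunks.append({
            "text": "\n".join(lines),
            # file_type="table" is the marker the PR uses to bypass text splitting;
            # row_start/row_end are illustrative 1-based data-row indices.
            "metadata": {
                "file_type": "table",
                "row_start": start + 1,
                "row_end": start + len(group),
            },
        })
    return chunks


if __name__ == "__main__":
    sample = "name,qty\napple,3\npear,5\nplum,7\n"
    for chunk in chunk_table_rows(sample, rows_per_chunk=rows_per_chunk_from_env()):
        print(chunk)
```

A TSV file would go through the same helper with `delimiter="\t"`. Because each chunk already ends on a row boundary, a downstream step that sees `file_type == "table"` can skip the generic text splitter, which is the behaviour the routers/retrieval.py change is described as adding.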
🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.