mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-07 11:28:35 -05:00
[GH-ISSUE #21821] issue: Qdrant points payload text "eating" spaces #35110
Originally created by @Atrocraz on GitHub (Feb 24, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/21821
Installation Method
Docker
Open WebUI Version
0.8.5
Ollama Version (if applicable)
No response
Operating System
Ubuntu 22.04
Browser (if applicable)
No response
Expected Behavior
After loading a file into a knowledge base, the text in both the payload and the embeddings should remain mostly untouched and readable.
Actual Behavior
For some reason, the text field in the payload (and, I suppose, in the embeddings too) loses some spaces.
Steps to Reproduce
Load a .docx or .pdf file (I didn't test other formats) into a knowledge base via the OWUI GUI.
Logs & Screenshots
Qdrant point:

Docx file with the same chunk:

OWUI knowledge file:
Additional Information
I tried nomic-embed v1.5 and v2.0, but I doubt it's an embedding model problem, since the payload itself contains the corrupted text.
At the same time, the same text in the knowledge GUI remains unchanged.
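One way to confirm that only spaces are being lost (rather than characters being changed) is to compare the source chunk with the payload text while ignoring whitespace. The snippet below is a minimal sketch: the function name and the example strings are hypothetical, not taken from the screenshots, and the payload text would have to be copied out of Qdrant by hand.

```python
import re


def compare_ignoring_whitespace(source_text: str, payload_text: str) -> dict:
    """Report whether two strings differ only in whitespace."""
    def strip_ws(s: str) -> str:
        # Remove every whitespace character so only the visible text remains.
        return re.sub(r"\s+", "", s)

    return {
        # True when the non-whitespace characters are identical.
        "same_non_whitespace": strip_ws(source_text) == strip_ws(payload_text),
        # Word counts after splitting on whitespace; a lower payload count
        # suggests that words were fused together.
        "source_words": len(source_text.split()),
        "payload_words": len(payload_text.split()),
    }


# Hypothetical example strings for illustration only.
source = "The quick brown fox jumps over the lazy dog"
payload = "The quickbrown fox jumpsover the lazy dog"
report = compare_ignoring_whitespace(source, payload)
print(report)
```

If `same_non_whitespace` is true and the payload has fewer words, the damage is limited to dropped spaces, which points at the text extraction step rather than the embedding model.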
@Atrocraz commented on GitHub (Feb 24, 2026):
Well, I guess the thing is that, for whatever reason, even plain-text Word files are being processed via OCR/Tika/etc., which makes the initial text severely worse and hurts BM25 a lot.
Switching to .txt files fixes the problem; however, I wouldn't call it a long-term solution, because users will rarely upload .txt files.
Is there any other way to mitigate this problem?
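To illustrate why dropped spaces hurt BM25 in particular: keyword scoring depends on query terms matching whole tokens in the chunk, so two fused words match neither term. The sketch below assumes a plain lowercase whitespace tokenizer, which is a simplification and may not match the tokenizer actually used; the function name and strings are hypothetical.

```python
def matched_query_terms(query: str, chunk: str) -> set:
    """Return the query terms that appear as whole tokens in the chunk."""
    # Assumption: lowercase + whitespace split stands in for the real tokenizer.
    chunk_tokens = set(chunk.lower().split())
    return {term for term in query.lower().split() if term in chunk_tokens}


query = "lazy dog"
clean_chunk = "the quick brown fox jumps over the lazy dog"
damaged_chunk = "the quick brown fox jumps over the lazydog"  # space lost

clean_hits = matched_query_terms(query, clean_chunk)
damaged_hits = matched_query_terms(query, damaged_chunk)
print(clean_hits, damaged_hits)
```

With the clean chunk both query terms match; with the damaged chunk none do, so a term-based scorer has nothing to rank on even though the text looks almost identical to a human reader.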