Mirror of https://github.com/open-webui/open-webui.git
[GH-ISSUE #20558] issue: RAG knowledge file: after modify/update, the older data is still available #34752
Originally created by @n4gY1 on GitHub (Jan 10, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/20558
Check Existing Issues
Installation Method
Git Clone
Open WebUI Version
v0.7.1
Ollama Version (if applicable)
No response
Operating System
debian
Browser (if applicable)
No response
Confirmation
Expected Behavior
When I modify a text knowledge file, I expect the old data in the database to be overwritten. But it seems the old data remains in the database.
Actual Behavior
I created a knowledge file called Webshop.txt, in which I simply described how to access the webshop. Originally I wrote it like this: http://192.168.1.10:8080. Then, after a while, I changed the port: http://192.168.1.10:8090
There are two items in the database after this.
select * from embedding_fulltext_search_content;
122|az auto webshop elérése a http://192.168.1.10:8080 címen érhető el. ----> ORIGINAL DATA
123|az auto webshop elérése a http://192.168.1.10:8090 címen érhető el. ----> NEW, modified data
(Hungarian rows; both read "the auto webshop is accessible at the given address", differing only in the port.)
The problem is that a RAG search returns both results, which is wrong.
webshop.txt retrieval preview:
87.52% | az auto webshop elérése a http://192.168.1.10:8090 címen érhető el.
87.29% | az auto webshop elérése a http://192.168.1.10:8080 címen érhető el.
Steps to Reproduce
Upload a knowledge file.
Modify the knowledge file, updating the information.
A RAG search returns both the old data and the new, updated data.
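The duplicated-chunk state described in the steps above can be checked directly in SQLite. The sketch below is a toy reproduction (the table name comes from the report, but the schema here is simplified; it is not the actual Open WebUI schema): after the edit, a search for the old port still matches the stale row.

```python
import sqlite3

# Toy reproduction of the duplicated-chunk state from the report.
# Table name taken from the report; columns are simplified.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE embedding_fulltext_search_content (id INTEGER PRIMARY KEY, c0 TEXT)"
)
conn.executemany(
    "INSERT INTO embedding_fulltext_search_content (id, c0) VALUES (?, ?)",
    [
        (122, "the auto webshop is accessible at http://192.168.1.10:8080"),  # stale
        (123, "the auto webshop is accessible at http://192.168.1.10:8090"),  # current
    ],
)

# Both rows survived the edit, so a search for the OLD port still hits.
stale = conn.execute(
    "SELECT id FROM embedding_fulltext_search_content WHERE c0 LIKE '%:8080%'"
).fetchall()
print(stale)  # → [(122,)]
```

If the update had replaced the old chunks, this query would return no rows.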
Logs & Screenshots
(Same SQL output and retrieval preview as shown above.)
Additional Information
It would be good if, when the knowledge content is modified, the old data were removed and only the new data remained.
@owui-terminator[bot] commented on GitHub (Jan 10, 2026):
🔍 Similar Issues Found
I found some existing issues that might be related to this one. Please check if any of these are duplicates or contain helpful solutions:
#20033 issue: RAG not working when file uploaded through chat (by abhishek-paradkar-sndk, Dec 18, 2025) [bug]
#20236 issue: Select knowledge within Folders (by liucoj, Dec 29, 2025) [bug]
#19698 issue: .41 web based search and webpages - RAG - are not fixed (by frenzybiscuit, Dec 02, 2025) [bug]
#19281 issue: RAG Template applied with "Bypass Embedding and Retrieval" enabled (by lucyknada, Nov 19, 2025) [bug]
#19098 issue: Prompt & context duplication when RAG template is used (by matiboux, Nov 11, 2025) [bug]
#19752 issue: minor UI Bug: knowledge sharing (by mahenning, Dec 04, 2025) [bug, confirmed issue]
#14463 issue: regression on v0.6.12 with RAG (by bb-chris, May 29, 2025) [bug]
#17733 issue: RAG queries still generated even if all files are in full context mode (by Classic298, Sep 25, 2025) [bug]
#19701 issue: knowledge can not multiple upload file (by willy808, Dec 03, 2025) [bug]
This comment was generated automatically by a bot. Please react with a 👍 if this comment was helpful, or a 👎 if it was not.
@silentoplayz commented on GitHub (Jan 10, 2026):
Are you certain you didn't create a second file containing the new content of Webshop.txt? That sounds like the most likely explanation to me. It does not sound like you edited a file already in a knowledge base and saved the updated contents.
@n4gY1 commented on GitHub (Jan 11, 2026):
Yes, I'm sure. It was just this one knowledge file that I edited.
Another strange thing: since this is a test environment, not production, I deleted all the knowledge base files in the web interface, but they are still present in the "embedding_fulltext_search_content" table.
I will create a knowledge base file under the "BAC_IT" collection.
I'm adding a first_knowledge.txt file with the following content: "webshop url : http://192.168.1.1:8080". Strangely, it is added to the "embedding_fulltext_search_content" table twice, with ids 2 and 3, so it is duplicated already at upload time.
sqlite> select * from embedding_fulltext_search_content;
1|BAC_IT
bács rendszergazdák (Hungarian: "Bács system administrators")
2|webshop url : http://192.168.1.1:8080
3|webshop url : http://192.168.1.1:8080
After I edit this knowledge base file so that the port changes from 8080 to 8090, the row with id 2 disappears, but the row with id 3 still contains 8080.
sqlite> select * from embedding_fulltext_search_content;
1|BAC_IT
bács rendszergazdák
3|webshop url : http://192.168.1.1:8080
I ask the AI model what the webshop URL is, and it answers with the http://192.168.1.1:8080 address.
AI answer (gemma3:4b):
what is webshop urls?
The webshop URL is http://192.168.1.1:8080/
first_knowledge.txt
When I open the citation "first_knowledge.txt", I see "http://192.168.1.1_8080".
But if I go to the BAC_IT collection and open the first_knowledge.txt knowledge file, I see "http://192.168.1.1:8090".
8080 != 8090
Something is wrong with the knowledge base: a change in one place is not reflected in the other.
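The behavior the reporter expects is delete-then-insert upsert semantics: when a file is edited, every chunk belonging to that file is dropped before the new chunks are written, so no stale rows can survive. A minimal sketch of that pattern (hypothetical schema and function names; this is not the actual Open WebUI code):

```python
import sqlite3

def replace_file_chunks(conn, file_id, chunks):
    """Expected upsert semantics: delete every chunk belonging to the
    edited file, then insert the new ones, in a single transaction.
    (Hypothetical helper for illustration only.)"""
    with conn:  # sqlite3 connection as context manager = one transaction
        conn.execute("DELETE FROM chunks WHERE file_id = ?", (file_id,))
        conn.executemany(
            "INSERT INTO chunks (file_id, content) VALUES (?, ?)",
            [(file_id, c) for c in chunks],
        )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chunks (id INTEGER PRIMARY KEY, file_id TEXT, content TEXT)"
)
# First upload, then the edit: the old 8080 chunk must not survive.
replace_file_chunks(conn, "f1", ["webshop url : http://192.168.1.1:8080"])
replace_file_chunks(conn, "f1", ["webshop url : http://192.168.1.1:8090"])
rows = [r[0] for r in conn.execute("SELECT content FROM chunks").fetchall()]
print(rows)  # → ['webshop url : http://192.168.1.1:8090']
```

With this pattern, a second query for the old port after the edit returns nothing, which is exactly what the reporter is asking for.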
@0xPatryk commented on GitHub (Jan 12, 2026):
I have the same issue
For context, I uploaded the knowledge base files as Markdown. Due to a Docling setting, all of the images in the knowledge files came back base64-encoded. I stripped the images and edited the knowledge base files, but the images are still returned in the knowledge base response.
@mraaz97 commented on GitHub (Jan 20, 2026):
I can confirm this issue on my end. The preview shows the same content twice, with different information.
@n4gY1 commented on GitHub (Jan 30, 2026):
When I reindex the database, I get the following. Is this normal?
2026-01-30 16:51:02.358 | ERROR | open_webui.routers.retrieval:process_file:1792 - Duplicate content detected. Please provide unique content to proceed.
Traceback (most recent call last):
File "/usr/local/lib/python3.11/threading.py", line 1002, in _bootstrap
self._bootstrap_inner()
│ └ <function Thread._bootstrap_inner at 0x7f6393868ae0>
└ <WorkerThread(AnyIO worker thread, started 140062322824896)>
File "/usr/local/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
self.run()
│ └ <function WorkerThread.run at 0x7f6311bfb740>
└ <WorkerThread(AnyIO worker thread, started 140062322824896)>
File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 986, in run
result = context.run(func, *args)
│ │ │ └ ()
│ │ └ functools.partial(<function process_file at 0x7f6314638860>, <starlette.requests.Request object at 0x7f630f9831d0>, ProcessFi...
│ └ <method 'run' of '_contextvars.Context' objects>
└ <_contextvars.Context object at 0x7f62e7f57880>
File "/app/backend/open_webui/routers/retrieval.py", line 1750, in process_file
result = save_docs_to_vector_db(
└ <function save_docs_to_vector_db at 0x7f6314638680>
File "/app/backend/open_webui/routers/retrieval.py", line 1421, in save_docs_to_vector_db
raise ValueError(ERROR_MESSAGES.DUPLICATE_CONTENT)
│ └ <ERROR_MESSAGES.DUPLICATE_CONTENT: 'Duplicate content detected. Please provide unique content to proceed.'>
└ <enum 'ERROR_MESSAGES'>
ValueError: Duplicate content detected. Please provide unique content to proceed.
2026-01-30 16:51:02.367 | ERROR | open_webui.routers.knowledge:reindex_knowledge_files:317 - Error processing file alias.txt (ID: 4f93ac5d-7048-405c-9d24-3e0e9e6ac9df): 400: Duplicate content detected. Please provide unique content to proceed.
2026-01-30 16:51:02.368 | WARNING | open_webui.routers.knowledge:reindex_knowledge_files:329 - Failed to process 1 files in knowledge base b343177a-da14-4111-983d-db50b52297f5
2026-01-30 16:51:02.368 | WARNING | open_webui.routers.knowledge:reindex_knowledge_files:333 - File ID: 4f93ac5d-7048-405c-9d24-3e0e9e6ac9df, Error: 400: Duplicate content detected. Please provide unique content to proceed.
2026-01-30 16:51:02.481 | INFO | open_webui.routers.retrieval:save_docs_to_vector_db:1420 - Document with hash 72314fa705bd9704d86ffe80043739e2aca22a955a1fafb7b39515e78b378303 already exists
2026-01-30 16:51:02.482 | ERROR | open_webui.routers.retrieval:process_file:1792 - Duplicate content detected. Please provide unique content to proceed.
(traceback identical to the one above)
2026-01-30 16:51:02.490 | ERROR | open_webui.routers.knowledge:reindex_knowledge_files:317 - Error processing file Robotzsaru debug.txt (ID: 559e6f60-66b8-4e1a-876a-173db627d09e): 400: Duplicate content detected. Please provide unique content to proceed.
2026-01-30 16:51:02.581 | INFO | open_webui.routers.retrieval:save_docs_to_vector_db:1420 - Document with hash d4ad5df7083fabb3e194781cd4050e62aa71dad09a87cfe574513f5f507e76b5 already exists
2026-01-30 16:51:02.582 | ERROR | open_webui.routers.retrieval:process_file:1792 - Duplicate content detected. Please provide unique content to proceed.
(traceback identical to the one above)
2026-01-30 16:51:02.768 | ERROR | open_webui.routers.knowledge:reindex_knowledge_files:317 - Error processing file first_knowledge.txt (ID: 90a1b3d0-5e81-4db1-b36f-3b59a29bfd85): 400: Duplicate content detected. Please provide unique content to proceed.
2026-01-30 16:51:02.878 | INFO | open_webui.routers.retrieval:save_docs_to_vector_db:1420 - Document with hash 156846654a149e9c432cdaf622733550a9d26f3efa83c4556451e4f17f14487e already exists
2026-01-30 16:51:02.879 | ERROR | open_webui.routers.retrieval:process_file:1792 - Duplicate content detected. Please provide unique content to proceed.
(traceback identical to the one above)
2026-01-30 16:51:03.077 | ERROR | open_webui.routers.knowledge:reindex_knowledge_files:317 - Error processing file ermi_ai.pptx (ID: 98b3f7d0-2fea-43d5-938b-22aa613ac6e1): 400: Duplicate content detected. Please provide unique content to proceed.
2026-01-30 16:51:03.077 | WARNING | open_webui.routers.knowledge:reindex_knowledge_files:329 - Failed to process 3 files in knowledge base 6d2d5a1a-ff9f-42a7-a0c4-0f9ae1e9ae8a
2026-01-30 16:51:03.078 | WARNING | open_webui.routers.knowledge:reindex_knowledge_files:333 - File ID: 559e6f60-66b8-4e1a-876a-173db627d09e, Error: 400: Duplicate content detected. Please provide unique content to proceed.
2026-01-30 16:51:03.078 | WARNING | open_webui.routers.knowledge:reindex_knowledge_files:333 - File ID: 90a1b3d0-5e81-4db1-b36f-3b59a29bfd85, Error: 400: Duplicate content detected. Please provide unique content to proceed.
2026-01-30 16:51:03.078 | WARNING | open_webui.routers.knowledge:reindex_knowledge_files:333 - File ID: 98b3f7d0-2fea-43d5-938b-22aa613ac6e1, Error: 400: Duplicate content detected. Please provide unique content to proceed.
2026-01-30 16:51:03.078 | INFO | open_webui.routers.knowledge:reindex_knowledge_files:335 - Reindexing completed.
2026-01-30 16:51:03.081 | INFO | uvicorn.protocols.http.httptools_impl:send:483 - 192.168.1.50:58253 - "POST /api/v1/knowledge/reindex HTTP/1.1" 200
2026-01-30 16:51:37.715 | INFO | uvicorn.protocols.http.httptools_impl:send:483 - 192.168.1.50:55437 - "GET /_app/version.json HTTP/1.1" 200
2026-01-30 16:53:24.813 | INFO | uvicorn.protocols.http.httptools_impl:send:483 - 192.168.1.50:64675 - "GET /_app/version.json HTTP/1.1" 200
2026-01-30 16:53:38.277 | INFO | uvicorn.protocols.http.httptools_impl:send:483 - 192.168.1.120:57219 - "GET /_app/version.json HTTP/1.1" 304
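The logs above show the reindex tripping a duplicate-content guard: `save_docs_to_vector_db` logs "Document with hash … already exists" and raises `ValueError(ERROR_MESSAGES.DUPLICATE_CONTENT)`. The 64-hex-character hashes suggest a SHA-256 over the file content, though that is an inference from the log format, not confirmed from the source. A minimal sketch of such a guard, and of why a reindex of unchanged files trips it (hypothetical names, not the actual Open WebUI implementation):

```python
import hashlib

def content_hash(text: str) -> str:
    # 64-hex-char hashes in the logs suggest SHA-256 (an assumption).
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

seen: set[str] = set()  # stand-in for hashes already stored in the vector DB

def save_docs(text: str) -> bool:
    """Mimics the guard in the traceback: refuse content whose hash is
    already stored. During a reindex the old hashes are still present,
    so every unchanged file hits this check and fails with a 400."""
    h = content_hash(text)
    if h in seen:
        raise ValueError(
            "Duplicate content detected. Please provide unique content to proceed."
        )
    seen.add(h)
    return True

save_docs("webshop url : http://192.168.1.1:8090")      # first index: ok
try:
    save_docs("webshop url : http://192.168.1.1:8090")  # reindex, same file
except ValueError as e:
    print(e)  # → Duplicate content detected. Please provide unique content to proceed.
```

If the reindex cleared (or ignored) the previously stored hashes for the files being reprocessed, the errors in the log would not occur.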
@n4gY1 commented on GitHub (Feb 15, 2026):
The bug is also present in the new version (v0.8.1).
Question:
Knowledge: the "first_knowledge.txt" file; the current content, which I modified: "webshop url : http://192.168.1.1:8080".
Original content: "webshop url : http://192.168.1.1:8090"
I re-indexed the knowledge base, a day passed, and this response was received:
@Classic298 commented on GitHub (Feb 16, 2026):
This might fix the issue; testing needed, please:
https://github.com/open-webui/open-webui/pull/21495
@Classic298 commented on GitHub (Feb 16, 2026):
"testing needed" is actually wrong
testing required.
I cannot test it as i struggle to reproduce
if you guys can test and confirm this here, then it can get merged more quickly (or rather at all)
@Classic298 commented on GitHub (Feb 16, 2026):
0.8.3 is due to release soon, so now would be a good time for testing.
@Classic298 commented on GitHub (Feb 17, 2026):
Has anyone been able to test it yet?
@n4gY1 commented on GitHub (Feb 17, 2026):
https://youtu.be/UGoerVYpueA | YouTube demonstration
New version (0.8.3): create a new knowledge base, then update the knowledge and reindex the document database. Answers are duplicated (both the oldest and the newest data).
@mraaz97 commented on GitHub (Feb 19, 2026):
Hey @n4gY1, did you actually test PR #21495 from @Classic298? It seems you tested on 0.8.3, where the fix was obviously not yet merged. Thanks for your effort.
@Classic298 commented on GitHub (Feb 19, 2026):
Still waiting on testers. There are many reporters here, but is no one willing to test?
@mraaz97 commented on GitHub (Feb 19, 2026):
@Classic298 I have tested on my end and it is still not working for me. It just adds the new chunks but does not delete the old ones. For testing, I cloned your repo, checked out the fix branch, and built a Docker image locally.
@tjbck commented on GitHub (Mar 8, 2026):
Should be addressed in dev, let us know if the issue persists!