[GH-ISSUE #20558] issue: RAG Knowledge file when modify/update, older data is available #34752

Closed
opened 2026-04-25 08:53:49 -05:00 by GiteaMirror · 16 comments
Owner

Originally created by @n4gY1 on GitHub (Jan 10, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/20558

Check Existing Issues

  • I have searched for any existing and/or related issues.
  • I have searched for any existing and/or related discussions.
  • I have also searched in the CLOSED issues AND CLOSED discussions and found no related items (your issue might already be addressed on the development branch!).
  • I am using the latest version of Open WebUI.

Installation Method

Git Clone

Open WebUI Version

v0.7.1

Ollama Version (if applicable)

No response

Operating System

debian

Browser (if applicable)

No response

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

When I modify the text knowledge files, I thought the old data in the database would be overwritten. But it seems the old data remains in the database.

Actual Behavior

I created a knowledge file called Webshop.txt, where I just described how to access the webshop. Originally I wrote it like this: http://192.168.1.10:8080. Then after a while I changed the port to: http://192.168.1.10:8090

There are two items in the database after this.

select * from embedding_fulltext_search_content;
122|az auto webshop elérése a http://192.168.1.10:8080 címen érhető el. ----> ORIGINAL DATA
123|az auto webshop elérése a http://192.168.1.10:8090 címen érhető el. ----> NEW, modified data

The problem is that it returns both results (RAG) when searching. Which is not good :(


webshop.txt

Content
87.52%
az auto webshop elérése a http://192.168.1.10:8090 címen érhető el.
Content
87.29%
az auto webshop elérése a http://192.168.1.10:8080 címen érhető el.

Steps to Reproduce

Upload a knowledge file.
Modify the knowledge file to correct or change its data.
Run a RAG search: it returns both the old data and the updated data.

Logs & Screenshots

select * from embedding_fulltext_search_content;
122|az auto webshop elérése a http://192.168.1.10:8080 címen érhető el. ----> ORIGINAL DATA
123|az auto webshop elérése a http://192.168.1.10:8090 címen érhető el. ----> NEW, modified data

The problem is that it returns both results (RAG) when searching. Which is not good :(


webshop.txt

Content
87.52%
az auto webshop elérése a http://192.168.1.10:8090 címen érhető el.
Content
87.29%
az auto webshop elérése a http://192.168.1.10:8080 címen érhető el.

Additional Information

It would be good if, after modifying the knowledge content, only the new data remained and the old data was removed.
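The requested behavior amounts to "delete everything indexed for this file, then index the new content". Below is a minimal sketch of that idea using a toy in-memory index. The class `ToyChunkIndex` and its methods are hypothetical names for illustration only; this is not Open WebUI's storage code or its vector database API.

```python
class ToyChunkIndex:
    """In-memory stand-in for a chunk index (hypothetical, for illustration)."""

    def __init__(self):
        self._chunks = {}  # chunk_id -> (file_id, text)
        self._next_id = 1

    def replace_file(self, file_id, texts):
        # Drop every chunk that belongs to this file before adding the new
        # ones, so an edit cannot leave a stale copy behind.
        stale = [cid for cid, (fid, _) in self._chunks.items() if fid == file_id]
        for cid in stale:
            del self._chunks[cid]
        for text in texts:
            self._chunks[self._next_id] = (file_id, text)
            self._next_id += 1

    def search(self, needle):
        # Plain substring match; a real index would rank by similarity.
        return [text for _, text in self._chunks.values() if needle in text]


index = ToyChunkIndex()
index.replace_file("webshop.txt", ["webshop url: http://192.168.1.10:8080"])
index.replace_file("webshop.txt", ["webshop url: http://192.168.1.10:8090"])
print(index.search("webshop"))
```

With this approach, the second call leaves only the 8090 entry in the index, which is the outcome the report asks for.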

GiteaMirror added the bug label 2026-04-25 08:53:49 -05:00
Author
Owner

@owui-terminator[bot] commented on GitHub (Jan 10, 2026):

🔍 Similar Issues Found

I found some existing issues that might be related to this one. Please check if any of these are duplicates or contain helpful solutions:

  1. #20033 issue: RAG not working when file uploaded through chat
    by abhishek-paradkar-sndk • Dec 18, 2025 • bug

  2. #20236 issue: Select knowledge within Folders
    by liucoj • Dec 29, 2025 • bug

  3. #19698 issue: .41 web based search and webpages - RAG - are not fixed
    by frenzybiscuit • Dec 02, 2025 • bug

  4. #19281 issue: RAG Template applied with "Bypass Embedding and Retrieval" enabled
    by lucyknada • Nov 19, 2025 • bug

  5. #19098 issue: Prompt & context duplication when RAG template is used
    by matiboux • Nov 11, 2025 • bug

Show 4 more related issues
  1. #19752 issue: minor UI Bug: knowledge sharing
    by mahenning • Dec 04, 2025 • bug, confirmed issue

  2. #14463 issue: regression on v0.6.12 with RAG
    by bb-chris • May 29, 2025 • bug

  3. #17733 issue: RAG queries still generated even if all files are in full context mode
    by Classic298 • Sep 25, 2025 • bug

  4. #19701 issue: knowledge can not multiple upload file
    by willy808 • Dec 03, 2025 • bug


💡 Tips:

  • If this is a duplicate, please consider closing this issue and adding any additional details to the existing one
  • If you found a solution in any of these issues, please share it here to help others

This comment was generated automatically by a bot. Please react with a 👍 if this comment was helpful, or a 👎 if it was not.

Author
Owner

@silentoplayz commented on GitHub (Jan 10, 2026):

Are you certain you didn't create a second file containing the new content of Webshop.txt? To me, it sounds like this is most likely the case. It does not sound like you have edited a file already within a knowledgebase and saved the contents of the updated file.

Author
Owner

@n4gY1 commented on GitHub (Jan 11, 2026):

Yes, I'm sure. There was only this one knowledge file, and that is the one I edited.
Something else is strange: since this is a test environment and not production, I deleted all the knowledge base files in the web interface, but their content is still in the "embedding_fulltext_search_content" table.

To reproduce, I create a knowledge base file under the "BAC_IT" collection.
I add a first_knowledge.txt file with the following content: "webshop url: http://192.168.1.1:8080". Strangely, this is added to the "embedding_fulltext_search_content" table twice, with ids 2 and 3, so it is duplicated.

sqlite> select * from embedding_fulltext_search_content;
1|BAC_IT

bács rendszergazdák
2|webshop url : http://192.168.1.1:8080
3|webshop url : http://192.168.1.1:8080

After I edit this knowledge base file to change the port from 8080 to 8090, the row with id 2 disappears, but the row with id 3 still contains 8080.

sqlite> select * from embedding_fulltext_search_content;
1|BAC_IT

bács rendszergazdák
3|webshop url : http://192.168.1.1:8080

I ask the AI model what the webshop URL is, and it gives me the address http://192.168.1.1:8080

AI answer (gemma3:4b):
what is webshop urls?

gemma3:4b
The webshop URL is http://192.168.1.1:8080/
first_knowledge.txt

When I open the citation "first_knowledge.txt", it says "http://192.168.1.1:8080".

But if I go to the BAC_IT collection and open the first_knowledge.txt knowledge file, I see "http://192.168.1.1:8090".
8080 != 8090

Something goes wrong in the knowledge base when a file's content changes.
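The manual `select *` check above can be scripted. The sketch below scans every column of a table for a substring, so it does not depend on the real column names. The demo runs on an in-memory database; the column name `c0` in the demo table is an assumption made for the example and is not taken from the report. To run it against a real installation you would open the actual SQLite file instead, whose path is not given in this thread.

```python
import sqlite3


def rows_containing(conn, table, needle):
    """Return rows of `table` in which any column contains `needle`."""
    # Table names cannot be bound as parameters, so quote the identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    rows = conn.execute(f"SELECT * FROM {quoted}").fetchall()
    return [row for row in rows if any(needle in str(col) for col in row)]


# Demo on an in-memory database that mimics the rows from the report.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE embedding_fulltext_search_content (id INTEGER, c0 TEXT)"
)
conn.executemany(
    "INSERT INTO embedding_fulltext_search_content VALUES (?, ?)",
    [(1, "BAC_IT"), (3, "webshop url : http://192.168.1.1:8080")],
)

stale = rows_containing(conn, "embedding_fulltext_search_content", ":8080")
print(stale)
```

Any row returned for the old port after the file has been edited is a stale entry of the kind described here.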

Author
Owner

@0xPatryk commented on GitHub (Jan 12, 2026):

I have the same issue.

For context, I uploaded the knowledge base files as Markdown. Due to a Docling setting, all of the images were embedded in the knowledge files in base64-encoded format. I've since stripped the images and edited the knowledge base files, but the images are still returned in knowledge base responses.

Author
Owner

@mraaz97 commented on GitHub (Jan 20, 2026):

I can confirm this issue on my end. In the preview it is showing the same content with different information twice:

Image: https://github.com/user-attachments/assets/fd4642b7-5658-4d67-b20d-dd55f59aaaed
Author
Owner

@n4gY1 commented on GitHub (Jan 30, 2026):

When I reindex the database, I get the errors below.

Is this normal?

2026-01-30 16:51:02.358 | ERROR | open_webui.routers.retrieval:process_file:1792 - Duplicate content detected. Please provide unique content to proceed.
Traceback (most recent call last):

File "/usr/local/lib/python3.11/threading.py", line 1002, in _bootstrap
self._bootstrap_inner()
│ └ <function Thread._bootstrap_inner at 0x7f6393868ae0>
└ <WorkerThread(AnyIO worker thread, started 140062322824896)>
File "/usr/local/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
self.run()
│ └ <function WorkerThread.run at 0x7f6311bfb740>
└ <WorkerThread(AnyIO worker thread, started 140062322824896)>
File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 986, in run
result = context.run(func, *args)
│ │ │ └ ()
│ │ └ functools.partial(<function process_file at 0x7f6314638860>, <starlette.requests.Request object at 0x7f630f9831d0>, ProcessFi...
│ └ <method 'run' of '_contextvars.Context' objects>
└ <_contextvars.Context object at 0x7f62e7f57880>

File "/app/backend/open_webui/routers/retrieval.py", line 1789, in process_file
raise e
└ ValueError(<ERROR_MESSAGES.DUPLICATE_CONTENT: 'Duplicate content detected. Please provide unique content to proceed.'>)

File "/app/backend/open_webui/routers/retrieval.py", line 1750, in process_file
result = save_docs_to_vector_db(
└ <function save_docs_to_vector_db at 0x7f6314638680>

File "/app/backend/open_webui/routers/retrieval.py", line 1421, in save_docs_to_vector_db
raise ValueError(ERROR_MESSAGES.DUPLICATE_CONTENT)
│ └ <ERROR_MESSAGES.DUPLICATE_CONTENT: 'Duplicate content detected. Please provide unique content to proceed.'>
└ <enum 'ERROR_MESSAGES'>

ValueError: Duplicate content detected. Please provide unique content to proceed.
2026-01-30 16:51:02.367 | ERROR | open_webui.routers.knowledge:reindex_knowledge_files:317 - Error processing file alias.txt (ID: 4f93ac5d-7048-405c-9d24-3e0e9e6ac9df): 400: Duplicate content detected. Please provide unique content to proceed.
2026-01-30 16:51:02.368 | WARNING | open_webui.routers.knowledge:reindex_knowledge_files:329 - Failed to process 1 files in knowledge base b343177a-da14-4111-983d-db50b52297f5
2026-01-30 16:51:02.368 | WARNING | open_webui.routers.knowledge:reindex_knowledge_files:333 - File ID: 4f93ac5d-7048-405c-9d24-3e0e9e6ac9df, Error: 400: Duplicate content detected. Please provide unique content to proceed.
2026-01-30 16:51:02.481 | INFO | open_webui.routers.retrieval:save_docs_to_vector_db:1420 - Document with hash 72314fa705bd9704d86ffe80043739e2aca22a955a1fafb7b39515e78b378303 already exists
2026-01-30 16:51:02.482 | ERROR | open_webui.routers.retrieval:process_file:1792 - Duplicate content detected. Please provide unique content to proceed.
Traceback (most recent call last):

File "/usr/local/lib/python3.11/threading.py", line 1002, in _bootstrap
self._bootstrap_inner()
│ └ <function Thread._bootstrap_inner at 0x7f6393868ae0>
└ <WorkerThread(AnyIO worker thread, started 140062322824896)>
File "/usr/local/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
self.run()
│ └ <function WorkerThread.run at 0x7f6311bfb740>
└ <WorkerThread(AnyIO worker thread, started 140062322824896)>
File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 986, in run
result = context.run(func, *args)
│ │ │ └ ()
│ │ └ functools.partial(<function process_file at 0x7f6314638860>, <starlette.requests.Request object at 0x7f630f9831d0>, ProcessFi...
│ └ <method 'run' of '_contextvars.Context' objects>
└ <_contextvars.Context object at 0x7f630f98d600>

File "/app/backend/open_webui/routers/retrieval.py", line 1789, in process_file
raise e
└ ValueError(<ERROR_MESSAGES.DUPLICATE_CONTENT: 'Duplicate content detected. Please provide unique content to proceed.'>)

File "/app/backend/open_webui/routers/retrieval.py", line 1750, in process_file
result = save_docs_to_vector_db(
└ <function save_docs_to_vector_db at 0x7f6314638680>

File "/app/backend/open_webui/routers/retrieval.py", line 1421, in save_docs_to_vector_db
raise ValueError(ERROR_MESSAGES.DUPLICATE_CONTENT)
│ └ <ERROR_MESSAGES.DUPLICATE_CONTENT: 'Duplicate content detected. Please provide unique content to proceed.'>
└ <enum 'ERROR_MESSAGES'>

ValueError: Duplicate content detected. Please provide unique content to proceed.
2026-01-30 16:51:02.490 | ERROR | open_webui.routers.knowledge:reindex_knowledge_files:317 - Error processing file Robotzsaru debug.txt (ID: 559e6f60-66b8-4e1a-876a-173db627d09e): 400: Duplicate content detected. Please provide unique content to proceed.
2026-01-30 16:51:02.581 | INFO | open_webui.routers.retrieval:save_docs_to_vector_db:1420 - Document with hash d4ad5df7083fabb3e194781cd4050e62aa71dad09a87cfe574513f5f507e76b5 already exists
2026-01-30 16:51:02.582 | ERROR | open_webui.routers.retrieval:process_file:1792 - Duplicate content detected. Please provide unique content to proceed.
Traceback (most recent call last):

File "/usr/local/lib/python3.11/threading.py", line 1002, in _bootstrap
self._bootstrap_inner()
│ └ <function Thread._bootstrap_inner at 0x7f6393868ae0>
└ <WorkerThread(AnyIO worker thread, started 140062322824896)>
File "/usr/local/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
self.run()
│ └ <function WorkerThread.run at 0x7f6311bfb740>
└ <WorkerThread(AnyIO worker thread, started 140062322824896)>
File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 986, in run
result = context.run(func, *args)
│ │ │ └ ()
│ │ └ functools.partial(<function process_file at 0x7f6314638860>, <starlette.requests.Request object at 0x7f630f9831d0>, ProcessFi...
│ └ <method 'run' of '_contextvars.Context' objects>
└ <_contextvars.Context object at 0x7f62e7f05480>

File "/app/backend/open_webui/routers/retrieval.py", line 1789, in process_file
raise e
└ ValueError(<ERROR_MESSAGES.DUPLICATE_CONTENT: 'Duplicate content detected. Please provide unique content to proceed.'>)

File "/app/backend/open_webui/routers/retrieval.py", line 1750, in process_file
result = save_docs_to_vector_db(
└ <function save_docs_to_vector_db at 0x7f6314638680>

File "/app/backend/open_webui/routers/retrieval.py", line 1421, in save_docs_to_vector_db
raise ValueError(ERROR_MESSAGES.DUPLICATE_CONTENT)
│ └ <ERROR_MESSAGES.DUPLICATE_CONTENT: 'Duplicate content detected. Please provide unique content to proceed.'>
└ <enum 'ERROR_MESSAGES'>

ValueError: Duplicate content detected. Please provide unique content to proceed.
2026-01-30 16:51:02.768 | ERROR | open_webui.routers.knowledge:reindex_knowledge_files:317 - Error processing file first_knowledge.txt (ID: 90a1b3d0-5e81-4db1-b36f-3b59a29bfd85): 400: Duplicate content detected. Please provide unique content to proceed.
2026-01-30 16:51:02.878 | INFO | open_webui.routers.retrieval:save_docs_to_vector_db:1420 - Document with hash 156846654a149e9c432cdaf622733550a9d26f3efa83c4556451e4f17f14487e already exists
2026-01-30 16:51:02.879 | ERROR | open_webui.routers.retrieval:process_file:1792 - Duplicate content detected. Please provide unique content to proceed.
Traceback (most recent call last):

File "/usr/local/lib/python3.11/threading.py", line 1002, in _bootstrap
self._bootstrap_inner()
│ └ <function Thread._bootstrap_inner at 0x7f6393868ae0>
└ <WorkerThread(AnyIO worker thread, started 140062322824896)>
File "/usr/local/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
self.run()
│ └ <function WorkerThread.run at 0x7f6311bfb740>
└ <WorkerThread(AnyIO worker thread, started 140062322824896)>
File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 986, in run
result = context.run(func, *args)
│ │ │ └ ()
│ │ └ functools.partial(<function process_file at 0x7f6314638860>, <starlette.requests.Request object at 0x7f630f9831d0>, ProcessFi...
│ └ <method 'run' of '_contextvars.Context' objects>
└ <_contextvars.Context object at 0x7f630f9a8900>

File "/app/backend/open_webui/routers/retrieval.py", line 1789, in process_file
raise e
└ ValueError(<ERROR_MESSAGES.DUPLICATE_CONTENT: 'Duplicate content detected. Please provide unique content to proceed.'>)

File "/app/backend/open_webui/routers/retrieval.py", line 1750, in process_file
result = save_docs_to_vector_db(
└ <function save_docs_to_vector_db at 0x7f6314638680>

File "/app/backend/open_webui/routers/retrieval.py", line 1421, in save_docs_to_vector_db
raise ValueError(ERROR_MESSAGES.DUPLICATE_CONTENT)
│ └ <ERROR_MESSAGES.DUPLICATE_CONTENT: 'Duplicate content detected. Please provide unique content to proceed.'>
└ <enum 'ERROR_MESSAGES'>

ValueError: Duplicate content detected. Please provide unique content to proceed.
2026-01-30 16:51:03.077 | ERROR | open_webui.routers.knowledge:reindex_knowledge_files:317 - Error processing file ermi_ai.pptx (ID: 98b3f7d0-2fea-43d5-938b-22aa613ac6e1): 400: Duplicate content detected. Please provide unique content to proceed.
2026-01-30 16:51:03.077 | WARNING | open_webui.routers.knowledge:reindex_knowledge_files:329 - Failed to process 3 files in knowledge base 6d2d5a1a-ff9f-42a7-a0c4-0f9ae1e9ae8a
2026-01-30 16:51:03.078 | WARNING | open_webui.routers.knowledge:reindex_knowledge_files:333 - File ID: 559e6f60-66b8-4e1a-876a-173db627d09e, Error: 400: Duplicate content detected. Please provide unique content to proceed.
2026-01-30 16:51:03.078 | WARNING | open_webui.routers.knowledge:reindex_knowledge_files:333 - File ID: 90a1b3d0-5e81-4db1-b36f-3b59a29bfd85, Error: 400: Duplicate content detected. Please provide unique content to proceed.
2026-01-30 16:51:03.078 | WARNING | open_webui.routers.knowledge:reindex_knowledge_files:333 - File ID: 98b3f7d0-2fea-43d5-938b-22aa613ac6e1, Error: 400: Duplicate content detected. Please provide unique content to proceed.
2026-01-30 16:51:03.078 | INFO | open_webui.routers.knowledge:reindex_knowledge_files:335 - Reindexing completed.
2026-01-30 16:51:03.081 | INFO | uvicorn.protocols.http.httptools_impl:send:483 - 192.168.1.50:58253 - "POST /api/v1/knowledge/reindex HTTP/1.1" 200
2026-01-30 16:51:37.715 | INFO | uvicorn.protocols.http.httptools_impl:send:483 - 192.168.1.50:55437 - "GET /_app/version.json HTTP/1.1" 200
2026-01-30 16:53:24.813 | INFO | uvicorn.protocols.http.httptools_impl:send:483 - 192.168.1.50:64675 - "GET /_app/version.json HTTP/1.1" 200
2026-01-30 16:53:38.277 | INFO | uvicorn.protocols.http.httptools_impl:send:483 - 192.168.1.120:57219 - "GET /_app/version.json HTTP/1.1" 304
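The log lines "Document with hash ... already exists" followed by `ValueError: Duplicate content detected` suggest a content-hash check before saving. Below is a minimal sketch of such a check, not the actual `save_docs_to_vector_db` code. The 64-character hex hashes in the log are consistent with SHA-256, but that is an assumption; `save_if_new` and `DuplicateContentError` are hypothetical names.

```python
import hashlib


class DuplicateContentError(ValueError):
    """Raised when identical content is already stored (hypothetical name)."""


def content_hash(text):
    # SHA-256 is assumed here; the log only shows 64 hex characters.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def save_if_new(store, text):
    """Store `text` under its hash, rejecting content that is already present."""
    digest = content_hash(text)
    if digest in store:
        raise DuplicateContentError(f"Document with hash {digest} already exists")
    store[digest] = text
    return digest


store = {}
save_if_new(store, "webshop url : http://192.168.1.1:8080")
try:
    save_if_new(store, "webshop url : http://192.168.1.1:8080")
    rejected = False
except DuplicateContentError:
    rejected = True
print(rejected)
```

Under a check like this, reindexing a file whose previous entries were never cleared would be rejected as a duplicate, which is one possible reading of the errors above.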

2026-01-30 16:51:03.077 | WARNING | open_webui.routers.knowledge:reindex_knowledge_files:329 - Failed to process 3 files in knowledge base 6d2d5a1a-ff9f-42a7-a0c4-0f9ae1e9ae8a 2026-01-30 16:51:03.078 | WARNING | open_webui.routers.knowledge:reindex_knowledge_files:333 - File ID: 559e6f60-66b8-4e1a-876a-173db627d09e, Error: 400: Duplicate content detected. Please provide unique content to proceed. 2026-01-30 16:51:03.078 | WARNING | open_webui.routers.knowledge:reindex_knowledge_files:333 - File ID: 90a1b3d0-5e81-4db1-b36f-3b59a29bfd85, Error: 400: Duplicate content detected. Please provide unique content to proceed. 2026-01-30 16:51:03.078 | WARNING | open_webui.routers.knowledge:reindex_knowledge_files:333 - File ID: 98b3f7d0-2fea-43d5-938b-22aa613ac6e1, Error: 400: Duplicate content detected. Please provide unique content to proceed. 2026-01-30 16:51:03.078 | INFO | open_webui.routers.knowledge:reindex_knowledge_files:335 - Reindexing completed. 2026-01-30 16:51:03.081 | INFO | uvicorn.protocols.http.httptools_impl:send:483 - 192.168.1.50:58253 - "POST /api/v1/knowledge/reindex HTTP/1.1" 200 2026-01-30 16:51:37.715 | INFO | uvicorn.protocols.http.httptools_impl:send:483 - 192.168.1.50:55437 - "GET /_app/version.json HTTP/1.1" 200 2026-01-30 16:53:24.813 | INFO | uvicorn.protocols.http.httptools_impl:send:483 - 192.168.1.50:64675 - "GET /_app/version.json HTTP/1.1" 200 2026-01-30 16:53:38.277 | INFO | uvicorn.protocols.http.httptools_impl:send:483 - 192.168.1.120:57219 - "GET /_app/version.json HTTP/1.1" 304
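One plausible reading of the log above is that `save_docs_to_vector_db` rejects any document whose content hash is already stored, so reindexing a file that is still in the collection fails with "Duplicate content detected". The following is a minimal sketch of that kind of check, not Open WebUI's code. `HashIndexedStore`, `DuplicateContentError`, and `reindex` are hypothetical names, and SHA-256 over the text is an assumption.

```python
import hashlib


class DuplicateContentError(ValueError):
    """Raised when a document with the same content hash is already stored."""


class HashIndexedStore:
    """Toy in-memory store keyed by content hash (hypothetical, for illustration)."""

    def __init__(self):
        self._docs = {}  # content hash -> text

    def save(self, text: str) -> str:
        # Assumption: the hash is SHA-256 over the UTF-8 encoded text.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in self._docs:
            raise DuplicateContentError(f"Document with hash {digest} already exists")
        self._docs[digest] = text
        return digest

    def clear(self) -> None:
        self._docs.clear()


def reindex(store: HashIndexedStore, files: dict) -> dict:
    """Re-save every file, collecting failures instead of aborting the run."""
    failed = {}
    for name, text in files.items():
        try:
            store.save(text)
        except DuplicateContentError as exc:
            failed[name] = str(exc)
    return failed
```

In this toy model, a second `reindex` over unchanged files reports every file as failed unless the store is cleared first, which matches the shape of the warnings in the log.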

@n4gY1 commented on GitHub (Feb 15, 2026):

The bug is also present in the new version (v0.8.1).

Question:

Where is the webshop located?

Knowledge: the "first_knowledge.txt" file. Its current content, after my modification: "webshop url : http://192.168.1.1:8080".
Original content: "webshop url : http://192.168.1.1:8090"

I re-indexed the knowledge base, and a day later I received this response:

A webshop elérési címe http://192.168.1.1:8090/ vagy http://192.168.1.1:8080/
(first_knowledge.txt)

The webshop's access address is http://192.168.1.1:8090 or http://192.168.1.1:8080


@Classic298 commented on GitHub (Feb 16, 2026):

This might fix the issue; testing needed, please:

https://github.com/open-webui/open-webui/pull/21495


@Classic298 commented on GitHub (Feb 16, 2026):

"testing needed" is actually wrong
testing required.
I cannot test it myself, as I struggle to reproduce the issue.

if you guys can test and confirm this here, then it can get merged more quickly (or rather at all)


@Classic298 commented on GitHub (Feb 16, 2026):

0.8.3 will be released soon, so now would be a good time for testing.


@Classic298 commented on GitHub (Feb 17, 2026):

Has anyone been able to test it yet?


@n4gY1 commented on GitHub (Feb 17, 2026):

YouTube demonstration: https://youtu.be/UGoerVYpueA
On the new version (0.8.3): create a new knowledge base, then update the knowledge, then reindex the document database. The answers are duplicated (both the oldest and the newest data).


@mraaz97 commented on GitHub (Feb 19, 2026):

> https://youtu.be/UGoerVYpueA | youtube demonstration new version (0.8.3) , create new knowledge. Then update knowledge. Reindex document database. Answers duplicated (oldest and newest data)

Hey @n4gY1, did you actually test PR #21495 from @Classic298? It seems you tested on 0.8.3, where the fix was not yet merged. Thanks for your effort.


@Classic298 commented on GitHub (Feb 19, 2026):

Still waiting on testers. There are many reporters here, but no one is willing to test?


@mraaz97 commented on GitHub (Feb 19, 2026):

@Classic298 I have tested on my end and it is still not working for me. It just adds the new chunks but does not delete the old ones. What I did for testing: cloned your repo, checked out the fix branch, and built the Docker image locally.

Image
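The behaviour described here (new chunks added, old chunks never deleted) can be illustrated with a toy model. This is a sketch under stated assumptions, not the code from the PR or from Open WebUI: `ToyCollection`, `update_append_only`, and `update_replace` are hypothetical names, and keyword matching stands in for vector search.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    file_id: str
    text: str


@dataclass
class ToyCollection:
    """Hypothetical in-memory stand-in for a vector collection."""

    chunks: list = field(default_factory=list)

    def add(self, file_id, texts):
        self.chunks.extend(Chunk(file_id, t) for t in texts)

    def delete_by_file(self, file_id):
        # Drop every chunk that belongs to the given file.
        self.chunks = [c for c in self.chunks if c.file_id != file_id]

    def search(self, keyword):
        # Keyword match stands in for a similarity search.
        return [c.text for c in self.chunks if keyword in c.text]


def update_append_only(col, file_id, texts):
    """Buggy pattern: new chunks are added, stale chunks stay behind."""
    col.add(file_id, texts)


def update_replace(col, file_id, texts):
    """Replace pattern: remove the file's old chunks before adding new ones."""
    col.delete_by_file(file_id)
    col.add(file_id, texts)


OLD = "webshop url : http://192.168.1.1:8090"
NEW = "webshop url : http://192.168.1.1:8080"

stale = ToyCollection()
stale.add("first_knowledge.txt", [OLD])
update_append_only(stale, "first_knowledge.txt", [NEW])
stale_hits = stale.search("webshop")  # contains both the old and the new URL

fresh = ToyCollection()
fresh.add("first_knowledge.txt", [OLD])
update_replace(fresh, "first_knowledge.txt", [NEW])
fresh_hits = fresh.search("webshop")  # contains only the new URL
```

With the append-only update, retrieval returns both the old and the new line, which is consistent with the model answering "8090 or 8080". Deleting by file ID before re-inserting leaves only the current content.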

@tjbck commented on GitHub (Mar 8, 2026):

Should be addressed in dev, let us know if the issue persists!

Reference: github-starred/open-webui#34752