[PR #17938] [CLOSED] feat+FIX: Add website/webpage support to knowledge bases #40238

Closed
opened 2026-04-25 12:38:48 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/17938
Author: @Classic298
Created: 9/30/2025
Status: Closed

Base: dev ← Head: add-website-knowledgebase


📝 Commits (7)

  • b08fe86 Add website to knowledge base (#21)
  • 0ff6cd1 Update KnowledgeBase.svelte
  • 5cfc35d Update retrieval.py
  • e3e5135 Update retrieval.py
  • 199c0d2 Merge branch 'dev' into add-website-knowledgebase
  • c5d833d Merge branch 'dev' into add-website-knowledgebase
  • 5e40b67 remove whitespace

📊 Changes

3 files changed (+131 additions, -12 deletions)

View changed files

📝 backend/open_webui/routers/retrieval.py (+65 -12)
📝 src/lib/components/workspace/Knowledge/KnowledgeBase.svelte (+46 -0)
📝 src/lib/components/workspace/Knowledge/KnowledgeBase/AddContentMenu.svelte (+20 -0)

📄 Description

Pull Request Checklist

Before submitting, make sure you've checked the following:

  • Target branch: This pull request targets the dev branch.
  • Description: Added functionality to add websites and webpages directly to knowledge bases through the "Add from Website" option in the knowledge base UI.
  • Changelog: Changelog entry included below.
  • Documentation: Documentation updates may be needed for this new feature.
  • Dependencies: No new dependencies added.
  • Testing: Manually tested the feature with multiple websites.
  • Code review: Self-review completed.

Changelog Entry

Description

  1. FEAT: This PR adds the ability to add websites and webpages directly to knowledge bases. Users can now scrape and index web content without needing to manually download files first. This improves the workflow for building knowledge bases from online sources.
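  2. FIX: This PR also fixes an issue reported [here](https://github.com/open-webui/open-webui/issues/17088#issuecomment-3337386510) and [here](https://github.com/open-webui/open-webui/discussions/16345).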

Added

  • "Add from Website" option in knowledge base content menu
  • uploadWebHandler function to handle web scraping and file creation for knowledge bases
  • Backend support for creating file records from scraped web content
  • File ID now returned in processWeb response to enable proper knowledge base association
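
The flow these additions describe (scrape, create a file record, return its ID, associate it with a knowledge base) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the PR's code: `FileRecord`, `FILES`, `create_file_record_from_web`, `add_file_to_knowledge_base`, and the response shape are all hypothetical stand-ins, and the scraped text is passed in directly rather than fetched.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    """Hypothetical stand-in for a persistent file record."""
    id: str
    filename: str
    content: str
    meta: dict = field(default_factory=dict)


# Hypothetical in-memory "files table"; a real backend would use a database.
FILES: dict[str, FileRecord] = {}


def create_file_record_from_web(url: str, scraped_text: str) -> dict:
    """Persist scraped content as a file record and return its ID to the caller."""
    file_id = str(uuid.uuid4())
    FILES[file_id] = FileRecord(
        id=file_id,
        filename=url,
        content=scraped_text,
        meta={"source": url},
    )
    # Returning the ID is what lets the caller attach the record to a knowledge base.
    return {"status": True, "file_id": file_id, "filename": url}


def add_file_to_knowledge_base(knowledge_base: dict, file_id: str) -> bool:
    """Attach an existing file record to a knowledge base; report success as a boolean."""
    if file_id not in FILES:
        return False
    knowledge_base.setdefault("file_ids", []).append(file_id)
    return True
```

A caller would first create the record, then pass the returned `file_id` to `add_file_to_knowledge_base` and check the boolean result before reporting success.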

Changed

  • Modified process_web endpoint to create persistent file records with IDs
  • Updated save_docs_to_vector_db to skip duplicate content check when add=True (adding to existing knowledge bases)
  • Refactored process_web logic to differentiate between chat context (auto-save to vector DB) and knowledge base context (defer to process_file)
  • Modified addFileHandler to return boolean success status
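
The `add=True` behavior mentioned above can be illustrated with a small sketch. It assumes the duplicate check compares a content hash against entries already stored in the collection; that mechanism, along with `save_docs_sketch`, `DuplicateContentError`, and the dict-based store, is an assumption made for illustration and not the actual `save_docs_to_vector_db` implementation.

```python
import hashlib


class DuplicateContentError(ValueError):
    """Hypothetical error raised when identical content is already stored."""


def save_docs_sketch(store: dict, collection: str, texts: list[str], add: bool = False) -> int:
    """Store texts under a collection and return how many chunks were written.

    When add is False, identical content already present in the collection is
    rejected. When add is True (appending to an existing knowledge base), the
    duplicate check is skipped.
    """
    content_hash = hashlib.sha256("".join(texts).encode("utf-8")).hexdigest()
    entries = store.setdefault(collection, [])

    if not add and any(entry["hash"] == content_hash for entry in entries):
        raise DuplicateContentError("Duplicate content detected")

    entries.extend({"hash": content_hash, "text": text} for text in texts)
    return len(texts)
```

With this shape, a second call with the same texts raises unless `add=True` is passed, which mirrors the knowledge base case where re-adding the same website is intended to succeed.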

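Fixed

  • Related:
      • https://github.com/open-webui/open-webui/discussions/6118
      • https://github.com/open-webui/open-webui/issues/6202
      • https://github.com/open-webui/open-webui/issues/17088#issuecomment-3337386510
      • https://github.com/open-webui/open-webui/issues/18485
      • https://github.com/open-webui/open-webui/discussions/16345

"I upload files in the knowledgebase, even though I upload a new file with new content that is never uploaded, there is also this duplicate error, only one file can be uploaded to this knowledgebase."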

Testing conducted:

  1. Tested the feature in knowledgebase normally
  2. Tested adding multiple websites in knowledgebase
  3. Tested adding the same website multiple times in knowledgebase (works - which is intended in this case)
  4. Tested if adding websites in the chat still works - it does
  5. Tested if adding THE SAME website IN A NEW chat still works - it does
  6. Tested if adding THE SAME website multiple times in a new chat still works - it now works. This is probably not intended, but it was necessary to avoid the problem encountered in step 5: if you added a specific website in a chat and then created a new chat, the exact same website couldn't be added again, resulting in a duplicate content warning, [just like the user reported in issue 17088](https://github.com/open-webui/open-webui/issues/17088#issuecomment-3337386510).

Screenshots

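  • Screenshot 1: https://github.com/user-attachments/assets/16d54009-30d4-4d09-a65f-b676e2751f08
  • Screenshot 2: https://github.com/user-attachments/assets/14126eb8-b702-4be0-87e8-85d50af628a2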

Contributor License Agreement

By submitting this pull request, I confirm that I have read and fully agree to the [Contributor License Agreement (CLA)](https://github.com/open-webui/open-webui/blob/main/CONTRIBUTOR_LICENSE_AGREEMENT), and I am providing my contributions under its terms.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-25 12:38:48 -05:00
Reference: github-starred/open-webui#40238