feat: Allow clickable URLs as sources for documents. #4366

Closed
opened 2025-11-11 15:52:26 -06:00 by GiteaMirror · 3 comments
Owner

Originally created by @icsy7867 on GitHub (Mar 10, 2025).

Check Existing Issues

  • I have searched the existing issues and discussions.

Problem Description

Not sure where to start or the best place to ask this.

For my use case, I would love to be able to make the source ID of a document a clickable URL instead of some generic source ID.

I have written scrapers and wrappers that take my company's documents and knowledge sources and push them to an API to load into a Qdrant database. It was actually pretty easy to write. I simply used the Confluence API to return the HTML of every document in a space, and then I iteratively pushed this to a Qdrant DB via an API.

When I named the document, I was able to (with some modifications) name the document as a web URL, which, when a document matched, produced a clickable URL link instead of something like "something.txt". With many people's documents, information, and other items stored in various web databases and sources (like Confluence), it would be nice to reference these articles via their source links instead of a file name.

Desired Solution you'd like

There might be a better solution, but with the previous tool I made a few simple edits...

  1. When using the API to push a document into a RAG database, I took the source URL of the document I was uploading and URL-encoded it, so that the special characters and slashes would not interfere with the JSON formatting.

  2. When displaying the source name, I simply URL-decoded the source/file name. This worked well in the tool, because URL-decoding a string that is not URL-encoded generally just returns the same string (the exception being names that contain a literal % character).
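
The two steps above can be sketched roughly as follows. This is a minimal illustration, not code from Open WebUI: the helper names `encodeSourceName` and `decodeSourceName` and the example URL are hypothetical, and the try/catch fallback is an added assumption to cover names that are not valid percent-encoding.

```javascript
// Hypothetical helpers; the names are illustrative and not part of Open WebUI.

// Step 1: encode the source URL before using it as the document/file name.
function encodeSourceName(url) {
  return encodeURIComponent(url);
}

// Step 2: decode the stored name for display. decodeURIComponent throws a
// URIError on malformed input (for example "100%.txt"), so fall back to the
// raw name in that case.
function decodeSourceName(name) {
  try {
    return decodeURIComponent(name);
  } catch (err) {
    return name;
  }
}

const stored = encodeSourceName('https://wiki.example.com/display/DOCS/Setup Guide');
console.log(stored);                            // percent-encoded name
console.log(decodeSourceName(stored));          // original URL again
console.log(decodeSourceName('something.txt')); // plain names pass through unchanged
```

Plain file names survive the decode step untouched, which is why the same display code can serve both URL-named and ordinary documents.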

The link should be clickable. When using an AI model as an information hub, it would be nice if the model could reference the original source via URL, because, depending on the user-provided context, the information may or may not be completely accurate.
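
One possible way to decide whether a source name should be rendered as a link is to try parsing it as an absolute URL and accept only http(s). The sketch below is an assumption about how that check could look; `toHttpUrl` and `describeSource` are hypothetical names, not functions from the project.

```javascript
// Hypothetical helper: returns a normalized href when the name is an absolute
// http(s) URL, otherwise null. Other schemes (e.g. "javascript:") are rejected.
function toHttpUrl(name) {
  try {
    const url = new URL(name);
    return url.protocol === 'http:' || url.protocol === 'https:' ? url.href : null;
  } catch (err) {
    // new URL() throws for strings that are not absolute URLs, e.g. "something.txt".
    return null;
  }
}

// Hypothetical helper: builds the data a citation component would need.
function describeSource(name) {
  const href = toHttpUrl(name);
  return { label: name, href, clickable: href !== null };
}

console.log(describeSource('https://wiki.example.com/display/DOCS/Setup'));
console.log(describeSource('something.txt'));
```

A UI component could then render an anchor only when `clickable` is true and fall back to plain text for ordinary file names.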

Alternatives Considered

I am playing around with the API and formatting to see what is possible.

Additional Context

No response


@icsy7867 commented on GitHub (Mar 10, 2025):

For my initial API test, URL encoding the filename works:

Image

I think I could just edit the JavaScript here:
https://github.com/open-webui/open-webui/blob/d7bfa395b0672a21a41fb6706a4275673d339762/src/lib/components/chat/Messages/Citations.svelte#L120

and/or
https://github.com/open-webui/open-webui/blob/d7bfa395b0672a21a41fb6706a4275673d339762/src/lib/components/chat/Messages/Citations.svelte#L160

If I can just use JavaScript's decodeURIComponent, maybe that would work? I can try to build the container.

And this one...
https://github.com/open-webui/open-webui/blob/d7bfa395b0672a21a41fb6706a4275673d339762/src/lib/components/chat/Messages/CitationsModal.svelte#L101

Trying this out and building the container :D

Neat! Editing that last file worked for the modal! But the ID shown in the search did not change.

Image

Whoops! Forgot one...
https://github.com/open-webui/open-webui/blob/d7bfa395b0672a21a41fb6706a4275673d339762/src/lib/components/chat/Messages/Citations.svelte#L197

This worked!

Image

Image

Now the only remaining place is here:

Image

EDIT Found it...

https://github.com/open-webui/open-webui/blob/d7bfa395b0672a21a41fb6706a4275673d339762/src/lib/components/chat/MessageInput/Commands/Knowledge.svelte#L213

Image

I believe the last piece is here:
https://github.com/open-webui/open-webui/blob/d7bfa395b0672a21a41fb6706a4275673d339762/src/lib/components/common/FileItem.svelte#L85

But I have to switch gears and will try tonight :D

EDIT

It appears that was the correct file! After selecting the file, it appears correctly now:

Image


@linuxrrze commented on GitHub (Mar 11, 2025):

This made my day! Thank you for adding this feature!

I was also trying to commit external web scraping data into open-webui and found no way to make the citations show usable URLs.


@icsy7867 commented on GitHub (Mar 12, 2025):

Whoops... one more location. I can create another pull request, but it will most likely be tomorrow.

When a document is used as a reference, its name also needs to be decoded.

https://github.com/open-webui/open-webui/blob/b03fc97e287f31ad07bda896143959bc4413f7d2/src/lib/components/chat/Messages/Markdown/Source.svelte#L47


Reference: github-starred/open-webui#4366