feat: editable rag params #174

Closed
opened 2025-11-11 14:09:36 -06:00 by GiteaMirror · 18 comments
Owner

Originally created by @jukofyork on GitHub (Jan 12, 2024).

I've just been trying out the new Rag mode with some different PDFs and one test I did was add in the game rules for a board game called "Valor & Victory" and then quizzed several LLMs about the sequence of play.

They all seemed to get confused and completely miss out the "Move" phase and outright claimed it didn't exist!

I think the problem may lie with these LangChain parameters:

CHUNK_SIZE = 1500
CHUNK_OVERLAP = 100

From a quick read it seems the CHUNK_SIZE parameter is on the high side.

I'm not sure whether this was a deliberate decision or not, but it would be nice to let us control this.

I've no idea how to edit this in the Docker image either, or I would have happily tried a few different chunk sizes to see if there was any improvement on similar documents.
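For reference, here is a minimal pure-Python sketch of what these two parameters mean (character-based only; the real LangChain splitters additionally try to break on separators like newlines, which this sketch ignores):

```python
# Fixed-size chunking with overlap: each chunk starts
# (chunk_size - chunk_overlap) characters after the previous one.
def chunk(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

rules = "x" * 4000  # stand-in for a 4000-character rulebook section
chunks = chunk(rules, chunk_size=1500, chunk_overlap=100)
print(len(chunks), [len(c) for c in chunks])  # → 3 [1500, 1500, 1200]
```

With these defaults a 4000-character section becomes only three chunks, which is why a single misranked chunk can hide an entire rules phase from the model.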

GiteaMirror added the enhancement, good first issue, help wanted labels 2025-11-11 14:09:36 -06:00

@turnercore commented on GitHub (Jan 15, 2024):

In addition to this, it would be nice if there were an endpoint to add your own chunks, for chunking things manually. Savvy users could then easily write a script to handle RAG input however they want. Really good RAG integration is likely to have a lot of nuances that people will feel differently about, and it may be outside the scope of this project in the short term to handle all of them. Having an API endpoint that allows custom RAG upload handling seems like a good solution for now.
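As an illustration only, a chunk-upload request body for such an endpoint might look something like this (the collection name, field names, and rulebook phase names are all hypothetical, not an existing API):

```python
import json

# Hypothetical POST body for a "bring your own chunks" endpoint:
# pre-chunked text plus arbitrary metadata the retriever could filter on.
payload = {
    "collection": "valor_and_victory_rules",
    "chunks": [
        {
            "text": "Sequence of play: Rally, Move, Fire, Advance, Rout.",
            "metadata": {"tags": ["rules", "sequence"], "page": 4},
        },
    ],
}

# A real endpoint would receive this as JSON; round-tripping it here
# just shows the structure is plain serializable data.
decoded = json.loads(json.dumps(payload))
print(decoded["collection"])  # → valor_and_victory_rules
```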


@romainfd commented on GitHub (Jan 15, 2024):

I feel like this could be added "rather easily", as this is almost how it is built already in rag/main.py, with a POST /doc endpoint calling store_doc, which leverages store_data_in_vector_db to do the chunking. However, as you say, "really good RAG integration is likely to have a lot of nuances that a lot of people are going to feel differently about", and there are many LangChain retrievers and related logic around, so I'm not sure the upload part alone would be enough (for self-query or time-weighted retrieval, for instance). I'm wondering whether adding the option to register a custom query endpoint wouldn't be a more "scalable" option. That way, you could implement all the chunking, embedding, storing, and retrieving logic you want! But then you'd also need an endpoint to expose to Ollama Web UI the documents/collections you indexed, so they are available in the UI!
Happy to discuss it more and try to draft a PR if we can find a nice way to do it!


@turnercore commented on GitHub (Jan 16, 2024):

@romainfd we should figure it out. I haven't worked with Svelte before (mostly NextJS as far as web stuff goes), but I'd be happy to help with a PR. LLMs make it so much easier to jump into a new language anyway.

From what I'm hearing, you're suggesting the following:

  • a default rag built into webui, perhaps with some editable properties like CHUNK_SIZE, CHUNK_OVERLAP
  • in addition, the ability to register a custom query endpoint that would overwrite the default

The only other way I can see to handle this at scale would be a plugin system. I think the query endpoint built on top of a default would certainly be a quicker and simpler approach.

For a concrete example, I want to build an Obsidian plugin that would handle the RAG implementation for the vault, processing and chunking markdown files, including metadata like tags that you could give the AI to run a RAG query on a subset of the notes. If we did it this way, I guess what you'd do is run a web backend for the plugin that connects to a database, and give it an endpoint that webui can query for retrieval. Similarly to how it works with Ollama, webui would ask the endpoint for context given an input, and the endpoint would determine what to send back, which would be prepended to the user's query for the Ollama (or OpenAI) endpoint. Is that right?

I imagine there are many other custom use-cases people could have, and I think it would be awesome if the project allowed for them. Obviously it would be cool to have a system that you can customize within the app itself, but it might be a daunting and never-ending task to include every permutation of thing that people might want to do with RAG, especially considering how new the field is and the lack of best and established practices.


@happysalada commented on GitHub (Jan 18, 2024):

hey, just contributing my 2c.
I completely agree with having an external query endpoint, that would be the most flexible and simple approach for contributors. It doesn't help people who want something that "just works" out of the box, but there are so many subtleties in RAG that I'm not sure things will just work without tweaking anyway.

Another data point to add is that some databases (meilisearch) have launched mixed semantic and keyword based search, which I think will improve some results.

The question then becomes: what will be the shape of the HTTP query to the separate endpoint? In my experience, vector DBs (Qdrant and the rest) have slightly different interfaces. Abstracting them in one layer might be difficult.
Or maybe you just choose to support only a handful?


@turnercore commented on GitHub (Jan 19, 2024):

I imagine you would have a standard shape for the request and expected response, and if you wanted to write a program that could receive the request and give the response, that program would handle the database query differences. But what shape the request should have is still in question. Probably it should have something standard like 'query' with your search, and then a more open 'options' field or something. It could still be difficult to get the LLM to give you exactly what you want for the custom endpoint without knowing ahead of time all the things you might want.

Alternatively along with the custom endpoint, there could be a schema field or something to instruct the LLM to send the data you want. For example, you give it a custom endpoint and provide it a schema like { query: string, tags: string[], …other options… }
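To make the schema idea concrete, here is a hedged sketch of such a request shape in Python (the class and field names are illustrative, not an existing API):

```python
from dataclasses import dataclass, field, asdict

@dataclass
class RagQueryRequest:
    query: str                                      # the user's search text
    tags: list[str] = field(default_factory=list)   # optional metadata filters
    options: dict = field(default_factory=dict)     # free-form extras

req = RagQueryRequest(query="sequence of play", tags=["tabletop_rpg"])
print(asdict(req))
# → {'query': 'sequence of play', 'tags': ['tabletop_rpg'], 'options': {}}
```

A schema like this keeps the required part tiny (just `query`) while leaving room for endpoint-specific extensions in `tags` and `options`.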


@romainfd commented on GitHub (Jan 20, 2024):

That's exactly what I was suggesting (be able to customize a few parameters for the "normal" user and, for the power user, be able to register a custom endpoint to handle your own logic).
The challenge would indeed be the shape of the request/response. Perhaps we could investigate what the LangChain Custom Retriever allows? I think this approach offers a lot of flexibility, as it lets the power user define their entire load/split/embed/store/retrieve logic, so we don't have to handle all the current (and future) diversity in these fields.
I think a first approach with one endpoint taking just the query and returning a list of Documents (or even a string concatenation of them) could be interesting?
Then we could extend the request parameters, add another endpoint to list the stored documents in the web UI, ...
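As a framework-free sketch of the Retriever contract being discussed (take a query string, return a list of Documents with content and metadata; the class and naive keyword matching below are toy stand-ins, not LangChain's actual implementation):

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    """Mirrors LangChain's Document shape: text plus arbitrary metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)

class KeywordRetriever:
    """Toy retriever: naive keyword match over an in-memory store."""
    def __init__(self, docs: list[Document]):
        self.docs = docs

    def get_relevant_documents(self, query: str) -> list[Document]:
        terms = query.lower().split()
        return [d for d in self.docs
                if any(t in d.page_content.lower() for t in terms)]

store = [Document("The Move phase follows the Fire phase.", {"page": 4}),
         Document("Setup: place units on the map.", {"page": 1})]
results = KeywordRetriever(store).get_relevant_documents("move phase")
print([d.metadata for d in results])  # → [{'page': 4}]
```

Anything satisfying this one-method contract (semantic search, hybrid search, a remote service) could sit behind the custom endpoint without the UI needing to know.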


@turnercore commented on GitHub (Jan 20, 2024):

Using Langchain as the expected standard interface sounds like a great idea, it makes the most sense for people wanting to do complicated implementations. I imagine if you're using a custom endpoint then you would be expected to handle the load/split/embed/store/retrieve logic, but it would be nice to create the endpoints for it to talk back to webui to retrieve the data from there in case the user did not want to set up their own external store.

On one hand, I think the return to webui should be simple, like just a string, so people writing the LangChain portion know what to send back and how it will be handled by webui (I'm assuming the way it handles it is to insert the context string above the user's query, but let me know if I'm wrong about that assumption). However, I do see where you're going with the list of Documents. I think it would be very useful for webui to know which documents have been referenced, to provide links to the user so they can do further research. This citing of sources is often missing from current AI implementations, and I would really like to see it here.

What might be a way to do it is have the langchain (or whatever) endpoint return the context string, and also some optional metadata about the result, such as the list of documents or chunks referenced and optionally the location of where to find these documents. That way if the location was off server (like a url) you could get a link to the web page or document wherever it is, and if we added webui endpoint for listing/displaying the user's locally stored documents, then webui could display that document without leaving the ui.


One thing I am struggling to work out is that the way I want to use it involves document tags, to search the RAG database more specifically, but I'm not sure how to have the user or the AI generate the tags to search against. How do we allow these tags to get sent to the external RAG implementation? One way, I suppose, would be to have the external RAG handle figuring out the tags: webui just sends the user's query and asks for context, and when the RAG system gets a query it can use AI to determine the tags it would like to search the database for. This would let the user do nothing and have it handled intelligently, or put 'search using the tabletop_rpg tag' and the AI would probably pick up on that fine. I may have talked myself through that 🦆, but I'll leave it here in case the thought process helps and shows how flexible an external endpoint could be.
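A hypothetical response payload combining the simple context string with the optional citation metadata discussed above (every key here is illustrative, not an agreed format):

```python
import json

# Context string the UI can prepend to the prompt, plus optional
# source metadata it could render as citations/links.
response = {
    "context": "The sequence of play is described on page 4 of the rulebook.",
    "sources": [
        {"title": "Valor & Victory rulebook",
         "chunk_id": 12,
         "location": "https://example.com/rules.pdf#page=4"},
    ],
}
print(json.dumps(response, indent=2))
```

A client that only understands the simple form can ignore `sources`, while a richer UI can turn each entry into a clickable citation.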


@romainfd commented on GitHub (Jan 20, 2024):

The idea behind the list of Documents was also based on the LangChain Retriever signature. Currently, the query/ endpoint (backend/apps/rag/main.py) on Chroma returns a QueryResult, which is defined in chromadb/api/types.py.

Currently, it's indeed only at the end, in the front-end (src/routes/(app)/+page.svelte), that the retrieved results are merged (you can see the console.log when using it). Then it is just inserted as context in the prompt (src/lib/utils/rag/index.ts).

So the idea would just be to replace the query/ endpoint to use one targeting our own implementation but still returning the same results so nothing else needs to be changed.

The reason it is not listing the sources is that the current implementation works a bit differently: you must use '#' to list the documents you are going to search. Whereas I was thinking more about just sending the query, not letting the UI know which documents you have in your store, and searching the entire store. The two approaches are a bit different, so I'm not sure which is the correct way to go. We would need some feedback from the team, I guess.


I think it should indeed be handled on the service side, as it is a bit custom, but QueryResult contains metadata and other fields that could be used for that, I think.


@justinh-rahb commented on GitHub (Jan 24, 2024):

Could there be some useful snippets to lift from here?

https://github.com/nlmatics/nlm-ingestor


@jpgrace commented on GitHub (Jan 26, 2024):

My $0.02: I'd like to be able to search the documents, have a summary response generated, and then get links to the source material. One use case I'm imagining is a RAG corpus consisting of all the messages in a Slack channel. A user asks a question that has already been answered and gets back an answer immediately. The summary response would answer the user's question, and the link could point back to the original message in the Slack channel, adding more context to the answer.


@horiacristescu commented on GitHub (Feb 3, 2024):

I recently uploaded about 7000 of my own comments in a file, and when trying to find a reference it failed. Chunk size 1500 might explain the problem, because comments are not that long. I would have preferred chunking based on new lines or some other separator. As it stands, it can't find references in long chunks. Maybe I could cluster the input text by topic so chunks would be more focused.


@jukofyork commented on GitHub (Feb 3, 2024):

> I recently uploaded about 7000 of my own comments in a file, and when trying to find a reference it failed. Chunk size 1500 might explain the problem, because comments are not that long. I would have preferred chunking based on new lines or some other separator. As it stands, it can't find references in long chunks. Maybe I could cluster the input text by topic so chunks would be more focused.

Yeah, this is a bit like what I experienced in the OP, but without being able to experiment it's hard to say whether it's the chunk size or not; too many small chunks might even have the opposite effect?


@Nidvogr commented on GitHub (Feb 5, 2024):

It really depends on the model used as well, and whether it can process the extra context and find the "needle in the haystack" or not. Just changing the prompt might have a big effect on this, or using a model fine-tuned for RAG.

Having support for different embedding models would help as well, or just hardcoding them and then keeping up with the best new ones from time to time as an option (like choosing between minilm-l6-v2 or bge-large or whatnot).

Finding the chunks themselves based on cosine similarity, reranking, or semantic similarity over the model embeddings are typical options that can make sense to change as well.

Personally I use something like vectara and just use an endpoint like suggested. But having local embedding models and then support the use of RAG AND having it customizable would be great (but a lot of work).

I think just exposing the "number of chunks" and "chunk size" would be nice to begin with, and then maybe supporting editing the template for how the context gets integrated into the query to the LLM, since that can affect how effective the LLM can find the information a lot.
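For reference, a minimal sketch of the cosine-similarity ranking mentioned above (pure Python; the vectors are made up for illustration rather than produced by a real embedding model):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product over the product of magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" for a query and two stored chunks.
query_vec = [1.0, 0.0, 1.0]
chunk_vecs = {"move_phase": [0.9, 0.1, 0.8], "setup": [0.0, 1.0, 0.1]}

ranked = sorted(chunk_vecs,
                key=lambda k: cosine(query_vec, chunk_vecs[k]),
                reverse=True)
print(ranked)  # → ['move_phase', 'setup']
```

Swapping this ranking for a reranker or hybrid keyword+vector scoring is exactly the kind of knob the thread is asking to expose.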


@tjbck commented on GitHub (Feb 18, 2024):

You can now adjust RAG params from documents page! As for the external rag pipeline support, let's continue our discussion here: https://github.com/open-webui/pipelines/issues/15

![image](https://github.com/open-webui/open-webui/assets/25473318/3fa8666f-f237-4873-9d34-cf93326f5da6)


@jannikstdl commented on GitHub (Feb 18, 2024):

> Having support for different embedding models as well, or just hardcoding them and then having to keep up with the best new ones from time to time as an option (like choosing between minilm-l6-v2 or bge-large or whatnot.

Agree, I pushed a PR open-webui/open-webui#772 with the ability to change to a different sentence-transformer embedding model in the Dockerfile.
This might improve RAG for some (for me in German, since I use the multilingual model intfloat/e5-mistral-7b-instruct; maybe for @jukofyork's case as well).
You can try it once it's merged.

Based on the initial question of this issue:

I've just been trying out the new Rag mode with some different PDFs and one test I did was add in the game rules for a board game called "Valor & Victory" and then quizzed several LLMs about the sequence of play.

They all seemed to get confused and completely miss out the "Move" phase and outright claimed it didn't exist!

CHUNK_SIZE and CHUNK_OVERLAP could be affecting the quality, but many more parts make up good retrieval in RAG (like the embedding model and the RAG template).

Technically, CHUNK_SIZE is the size of the pieces the docs are split into and stored in the vector DB (and retrieved; in Open WebUI the top 4 best chunks are sent back), and CHUNK_OVERLAP is the size of the overlap between pieces, so the text isn't cut off abruptly and consecutive chunks stay connected.

But if the semantic search (done with the embedding model) over your RAG-triggering query doesn't capture what you actually want and retrieves the wrong chunks, you won't get the info you want out of the docs no matter what CHUNK_SIZE you set.

The same goes for the RAG template; this comes down to prompt engineering. Maybe try hardening the template and giving the LLM an instruction like "don't make up things; give me EXACTLY the info provided in the context". Yes, even caps lock helps sometimes, based on my testing.
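A minimal sketch of how CHUNK_SIZE and CHUNK_OVERLAP interact. This is a naive fixed-window splitter for illustration only; the real implementation goes through LangChain's splitter, which also tries to break on separators such as paragraph boundaries rather than at fixed offsets.

```python
def split_text(text: str, chunk_size: int = 1500, chunk_overlap: int = 100) -> list[str]:
    """Naive fixed-window splitter: each chunk is at most `chunk_size`
    characters, and consecutive chunks share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

print(split_text("abcdefghij", chunk_size=4, chunk_overlap=2))
# → ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

With the defaults (1500/100), a section like the "Move" phase from the original report can end up split across two chunks with only 100 shared characters, which is one way relevant rules text gets lost at retrieval time.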


@mkhludnev commented on GitHub (May 15, 2024):

vector dbs (qdrant

fyi open-webui/open-webui#2269


@sir3mat commented on GitHub (Sep 2, 2024):

@jannikstdl do CHUNK_SIZE and CHUNK_OVERLAP refer to token count or text character count in Open WebUI?


@jannikstdl commented on GitHub (Sep 3, 2024):

@jannikstdl do CHUNK_SIZE and CHUNK_OVERLAP refer to token count or text character count in Open WebUI?

No, it's exactly what it's named.

https://dev.to/peterabel/what-chunk-size-and-chunk-overlap-should-you-use-4338
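For what it's worth, LangChain's character-based text splitters measure length with `length_function=len` by default, i.e. in characters rather than tokens (a tokenizer-based length function can be passed in instead). A self-contained illustration of the distinction, without assuming anything about Open WebUI's own configuration:

```python
text = "The Move phase follows the Rally phase."

chars = len(text)           # character count: what a len-based splitter measures
tokens = len(text.split())  # whitespace words, a crude stand-in for tokens

print(chars, tokens)  # → 39 7
```

The two counts diverge by roughly a factor of 4-5 for English prose, so it matters which one a given CHUNK_SIZE is expressed in.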

Reference: github-starred/open-webui#174