[GH-ISSUE #12077] feat: Handling of a large number of knowledge base files #16459

Closed
opened 2026-04-19 22:22:23 -05:00 by GiteaMirror · 4 comments
Owner

Originally created by @setuin on GitHub (Mar 26, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/12077

Check Existing Issues

  • I have searched the existing issues and discussions.

Problem Description

I always feel frustrated when I have to upload a large number of PDF files to the knowledge base. It causes a lot of token waste, and I can't handle a large number of text files either.

Desired Solution

I'm wondering how to incorporate the processing of the knowledge base into the process of establishing it, especially when dealing with a large number of PDF text files. What would be the best way to handle it?

Alternatives Considered

What I'm thinking about is to interface with the vector database. In the knowledge base area, we can use a better vector model or establish a more appropriate database to compress the data, reduce the usage of tokens or adopt a better retrieval method. I'm not quite clear about whether this is handled at the user end or the client end.

Additional Context

No response


@mahenning commented on GitHub (Mar 27, 2025):

I'm not sure what you want here. What do you mean with "token waste" when uploading PDFs?

> In the knowledge base area, we can use a better vector model or establish a more appropriate database to compress the data

Do you want a separate vector database and embedding model for each knowledge? How should that "compress" the data?

> how to incorporate the processing of the knowledge base into the process of establishing it

I'm even more lost here, please rephrase and explain.

Maybe you can enlighten me with examples, but right now I have no idea what exactly you want, sorry.


@ivanbaldo commented on GitHub (Apr 14, 2025):

Three weeks and no clarification, should be clarified or closed...


@Mariano215 commented on GitHub (Aug 10, 2025):

I'm having a similar issue. If I upload many documents into a knowledge base, the query ends up blowing past the token limit because, it seems, the results returned from many documents are not ranked and limited to only the top choices. It appears to send EVERYTHING it finds as a potential match back to the LLM as context, and we get an error.

There should be some functionality that limits the RAG retrieval to only the top choices. Even better, allow for hybrid RAG, not just cosine similarity.
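What this comment is asking for amounts to ranking the retrieved chunks by similarity and keeping only the top k before building the LLM context. A minimal illustrative sketch (not Open WebUI's actual code; the function names and the `(text, embedding)` chunk format here are assumptions):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k_chunks(query_vec, chunks, k=3):
    # chunks: list of (text, embedding) pairs.
    # Score every chunk, sort descending, and keep only the k best,
    # so the context sent to the LLM stays bounded regardless of how
    # many documents are in the knowledge base.
    scored = [(cosine_similarity(query_vec, emb), text) for text, emb in chunks]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [text for _, text in scored[:k]]
```

With a fixed `k`, the context size no longer grows with the number of uploaded documents, which is exactly the failure mode described above.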


@mahenning commented on GitHub (Aug 11, 2025):

There is already an option for hybrid search with a reranker, and the Top K (for the embedding search) and Top K Reranker (for the reranker) settings limit the number of results. Note that this limit currently applies per query, and typically 3 queries are generated per user request in the chat to retrieve information. Maybe try a lower Top K Reranker value?
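The two-stage limit described above can be sketched as follows. This is an illustrative outline only, not Open WebUI's implementation; `search_fn` and `score_fn` are hypothetical stand-ins for the embedding search and the reranker model. The key point is the arithmetic: with roughly 3 generated queries per chat request, the context holds at most `3 * top_k_reranker` chunks.

```python
def rerank(query, candidates, top_k_reranker, score_fn):
    # Stage 2: re-score the embedding-search candidates with a (cross-encoder
    # style) reranker and keep only the best top_k_reranker of them.
    scored = sorted(candidates, key=lambda c: score_fn(query, c), reverse=True)
    return scored[:top_k_reranker]

def retrieve(queries, search_fn, score_fn, top_k=10, top_k_reranker=3):
    # Stage 1: each query pulls top_k candidates from the vector store;
    # Stage 2 trims each candidate set down to top_k_reranker.
    # Total context size is therefore at most len(queries) * top_k_reranker.
    context = []
    for q in queries:
        candidates = search_fn(q, top_k)
        context.extend(rerank(q, candidates, top_k_reranker, score_fn))
    return context
```

Lowering `top_k_reranker` is the cheapest lever here, since it directly scales the number of chunks placed into the prompt.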

Reference: github-starred/open-webui#16459