feat: Add support for ZIM files #1461

Closed
opened 2025-11-11 14:45:44 -06:00 by GiteaMirror · 0 comments

Originally created by @suncloudsmoon on GitHub (Jul 7, 2024).

Is your feature request related to a problem? Please describe.
Many LLMs hallucinate when asked about obscure topics.

Describe the solution you'd like
Plenty of research highlights [the benefits of using Wikipedia in RAG](https://github.com/stanford-oval/WikiChat) to reduce hallucinations and make LLM output more truthful. Since Wikipedia is distributed as a single file in the [ZIM format](https://wiki.openzim.org/wiki/OpenZIM), I propose that Open WebUI implement support for using ZIM files in RAG. One possible implementation route is to use the [Python binding of libzim](https://github.com/openzim/python-libzim) to search by title or run a full-text search (supported in some ZIM files) and feed the results into RAG.
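
A minimal sketch of that route, assuming python-libzim is installed and a ZIM file is available locally. The helper names (`html_to_text`, `search_zim`, `build_context`) and the character budget are hypothetical and are not part of Open WebUI. Only the `Archive`, `Searcher`, `Query` and `SuggestionSearcher` calls come from python-libzim. The sketch falls back to title suggestions when the archive has no full-text index.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Hypothetical helper: reduce an HTML article to plain text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def search_zim(zim_path, query_text, limit=3):
    """Hypothetical helper: return (title, plain_text) pairs from a ZIM file."""
    # Imported lazily so the pure-Python helpers work without libzim installed.
    from libzim.reader import Archive
    from libzim.search import Query, Searcher
    from libzim.suggestion import SuggestionSearcher

    archive = Archive(zim_path)
    if archive.has_fulltext_index:
        # Full-text search is only available in some ZIM files.
        search = Searcher(archive).search(Query().set_query(query_text))
    else:
        # Otherwise fall back to title-based suggestions.
        search = SuggestionSearcher(archive).suggest(query_text)

    results = []
    for path in search.getResults(0, limit):
        entry = archive.get_entry_by_path(path)
        item = entry.get_item()  # follows redirects to the real item
        if not item.mimetype.startswith("text/html"):
            continue
        html = bytes(item.content).decode("utf-8", errors="replace")
        results.append((entry.title, html_to_text(html)))
    return results


def build_context(passages, max_chars=2000):
    """Hypothetical helper: join passages into one context string,
    truncating so the passage text stays within max_chars."""
    parts = []
    remaining = max_chars
    for title, text in passages:
        if remaining <= 0:
            break
        snippet = text[:remaining]
        parts.append(f"[{title}]\n{snippet}")
        remaining -= len(snippet)
    return "\n\n".join(parts)
```

With a local archive, `build_context(search_zim("wikipedia.zim", "some obscure topic"))` would produce a block of text that could be prepended to the user's prompt; the file name here is a placeholder. A real integration would likely chunk and rank the article text rather than truncate it.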

Describe alternatives you've considered
An alternative is to use the web search capability in Open WebUI for RAG. The drawback of web search is that it doesn't work offline and it doesn't always have the best information on a given topic (potential for misinformation). Another alternative solution is to create a pipeline addon that implements the above functionality.

Additional context
N/A


Reference: github-starred/open-webui#1461