Quantized Embeddings #1220

Closed
opened 2025-11-11 14:40:25 -06:00 by GiteaMirror · 0 comments
Owner

Originally created by @snadeem1362 on GitHub (Jun 12, 2024).

Is your feature request related to a problem? Please describe.
Hello... we are using open-webui RAG for our company documents. We have almost 500 GB of data. Our system is generating embeddings, and this results in a very large Chroma DB file. It is very time-consuming and requires a lot of hardware as well.
We are also incorporating Wikipedia articles. Is there any built-in solution available so that we don't have to regenerate embeddings?
At the moment all-MiniLM-L6-v2 is generating a 12 GB Chroma DB from 1 GB of Wikipedia text.

Describe the solution you'd like
Can we add a feature to enable quantized embeddings? Binary quantization reduces the storage size and speeds up the retrieval process.
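
To make the request concrete, here is a minimal sketch of what binary quantization does, written in plain Python with no dependencies. It is not open-webui or Chroma code; the function names `binary_quantize`, `hamming` and `search` are hypothetical and only illustrate the idea. Each dimension is reduced to one bit (1 if the value is positive, otherwise 0), and retrieval ranks documents by Hamming distance between the packed codes. Assuming embeddings are stored as 32-bit floats, a 384-dimensional all-MiniLM-L6-v2 vector takes 1536 bytes, while its binary code takes 48 bytes.

```python
from typing import List, Sequence


def binary_quantize(vec: Sequence[float]) -> bytes:
    """Keep one bit per dimension (1 if value > 0, else 0), packed 8 per byte."""
    out = bytearray((len(vec) + 7) // 8)
    for i, value in enumerate(vec):
        if value > 0:
            # Most significant bit first within each byte.
            out[i // 8] |= 1 << (7 - i % 8)
    return bytes(out)


def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length codes."""
    x = int.from_bytes(a, "big") ^ int.from_bytes(b, "big")
    return bin(x).count("1")


def search(query: Sequence[float], codes: List[bytes], k: int) -> List[int]:
    """Return the indices of the k codes closest to the query (hypothetical helper)."""
    q = binary_quantize(query)
    ranked = sorted(range(len(codes)), key=lambda i: hamming(q, codes[i]))
    return ranked[:k]


if __name__ == "__main__":
    # Toy 4-dimensional "embeddings", for illustration only.
    docs = [[1.0, 1.0, -1.0, -1.0], [-1.0, -1.0, 1.0, 1.0], [1.0, 1.0, 1.0, -1.0]]
    codes = [binary_quantize(d) for d in docs]
    print(search([0.9, 0.8, -0.2, -0.7], codes, k=2))
```

Binary codes lose information, so a common design is to use them for a fast first pass and then re-rank a small candidate set with the original float vectors. Whether that trade-off is acceptable depends on the data and the embedding model.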


Reference: github-starred/open-webui#1220