[PR #772] [MERGED] feat: choose embedding model when using docker #7253

Closed
opened 2025-11-11 17:21:33 -06:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/772
Author: @jannikstdl
Created: 2/17/2024
Status: Merged
Merged: 2/19/2024
Merged by: @tjbck

Base: main ← Head: choose-embedding-model


📝 Commits (7)

  • 1846c1e choose embedding model when using docker
  • bc3dd34 collection query fix
  • 4b88e7e Merge branch 'main' into choose-embedding-model
  • 0cb0358 refac: more descriptive var names
  • acf9990 storing vectordb in project cache folder + device types
  • ab104d5 refac
  • 7c127c3 feat: dynamic embedding model load

📊 Changes

4 files changed (+87 additions, -17 deletions)

View changed files

📝 Dockerfile (+19 -4)
📝 backend/apps/audio/main.py (+1 -1)
📝 backend/apps/rag/main.py (+61 -11)
📝 backend/config.py (+6 -1)

📄 Description

Changes

  • Added the ability to change the default embedding model (all-MiniLM-L6-v2) via an ENV variable in the Dockerfile
  • You can now use any sentence-transformers embedding model, which can be found here: https://huggingface.co/models?library=sentence-transformers
  • The default model all-MiniLM-L6-v2 has comparatively low performance nowadays and supports only English.
  • all-MiniLM-L6-v2 is still the default; please read the important information down below!
  • You can change the model to intfloat/multilingual-e5-large, which is one of the most powerful embedding models available. (You can also use the instruct version, intfloat/multilingual-e5-large-instruct, which is smaller and almost as good as OpenAI's latest "text-embedding-3-large".)
  • Embedding models are preloaded in the Dockerfile (like the Whisper speech-to-text model) for "no-internet" support.
  • Deleted some unused imports in the RAG main.py
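A minimal sketch of the pattern described above. This is a hypothetical example, not the PR's exact code: the variable name is taken from the SENTENCE_TRANSFORMER_EMBED_MODEL identifier referenced in the Open Points below, and the preload command is an assumption.

```dockerfile
# Hypothetical sketch -- variable name and preload step may differ from the merged PR.
# Any sentence-transformers model id works; all-MiniLM-L6-v2 stays the default.
ENV SENTENCE_TRANSFORMER_EMBED_MODEL="intfloat/multilingual-e5-large"

# Download the model at build time so the container also works without
# internet access, mirroring how the Whisper model is preloaded.
RUN python -c "from sentence_transformers import SentenceTransformer; \
    SentenceTransformer('${SENTENCE_TRANSFORMER_EMBED_MODEL}')"
```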

Improvements
In my local testing (in German), the LLM's answers with intfloat/multilingual-e5-large as the embedding model were far more accurate for my questions.
The RAG load times were also much shorter; I don't know why, or whether that is the case for you too. Maybe you can test this and give feedback.

Open Points

  • Support not only sentence-transformers models (e.g. BERT)
  • Store the embedding model in /app/backend/data/cache
    For now the model dir is ~/.cache/torch/sentence_transformers, which is not in the PVC.
    I didn't find a parameter to change the location here: sentence_transformer_ef = embedding_functions.SentenceTransformerEmbeddingFunction(model_name=SENTENCE_TRANSFORMER_EMBED_MODEL)
    Instead of passing the model name as a string to be downloaded, you can also pass a path as model_name. Maybe this would be the way to go.
  • The preloading steps for the embedding model and the Whisper speech-to-text model have parameters to set the device type to cpu (our default for both) or cuda, which can improve performance on NVIDIA GPUs. We could also add an ENV variable so users can change this to their specific needs.
  • Maybe implement an "Auto-re-embed" when the embedding model changes
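One way to act on the cache-location point above is to preload the model into the project cache folder and hand SentenceTransformerEmbeddingFunction a local path when one exists. A minimal, hypothetical sketch of the path-resolution half (the helper name and the directory layout are assumptions, not the PR's code):

```python
import os

def resolve_embedding_model(model_name: str, cache_dir: str) -> str:
    """Return a local directory for a preloaded model if present,
    otherwise the Hugging Face model id so sentence-transformers
    downloads it on first use.

    Assumes models were preloaded into "<cache_dir>/<org>_<model>"
    (a hypothetical layout, not necessarily what the PR uses).
    """
    local_dir = os.path.join(cache_dir, model_name.replace("/", "_"))
    return local_dir if os.path.isdir(local_dir) else model_name
```

Either return value could then be passed as model_name, following the observation above that the parameter accepts a path as well as a model id.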

Important Information

If you already have documents stored under the /documents route, changing the embedding model will make the backend unable to read those files. So if you have a lot of docs being used for RAG, either leave the default model or re-embed your files.
I also mentioned this in a Dockerfile comment.
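The incompatibility above comes from the old vectors no longer matching the new model's output. A hypothetical sketch of the "auto-re-embed" idea from the Open Points, using output dimensionality as a cheap change signal (the function name, data shapes, and heuristic are all assumptions, not the PR's code):

```python
from typing import Callable, Dict, List

Vector = List[float]

def reembed_if_model_changed(
    chunks: Dict[str, str],          # chunk id -> raw text
    stored_dim: int,                 # dimensionality of the existing vectors
    embed: Callable[[str], Vector],  # active embedding function
) -> Dict[str, Vector]:
    """Re-embed all stored chunks when the active model's output
    dimensionality differs from the stored vectors; return the new
    id -> vector mapping, or {} if nothing needs re-embedding.

    Dimensionality is only a heuristic: two different models can share
    a dimension, so a real implementation should also persist and
    compare the configured model name.
    """
    if not chunks:
        return {}
    probe = embed(next(iter(chunks.values())))
    if len(probe) == stored_dim:
        return {}  # assume compatible; nothing to do
    return {cid: embed(text) for cid, text in chunks.items()}
```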

@tjbck I don't know if this Dockerfile step was ever used:

    # wget embedding model weight from alpine (does not exist from slim-buster)
    RUN wget "https://chroma-onnx-models.s3.amazonaws.com/all-MiniLM-L6-v2/onnx.tar.gz" -O - | \
        tar -xzf - -C /app

But with this update, all-MiniLM-L6-v2 is declared in the ENV, so it is obsolete imo.
If that is the case, deleting this RUN statement would save some build time and image size.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2025-11-11 17:21:33 -06:00
Reference: github-starred/open-webui#7253