[GH-ISSUE #6623] nvidia/NV-Embed-v2 support #4168

Open
opened 2026-04-12 15:05:37 -05:00 by GiteaMirror · 15 comments
Owner

Originally created by @youxiaoxing on GitHub (Sep 4, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6623

Can you support the NVIDIA/NV-Embed-v2 model?
https://huggingface.co/nvidia/NV-Embed-v2

GiteaMirror added the model label 2026-04-12 15:05:37 -05:00

@taowang1993 commented on GitHub (Sep 4, 2024):

+1

This is the number-one embedding model right now. I need it as well.


@nikhil-swamix commented on GitHub (Sep 4, 2024):

this should be priority


@b commented on GitHub (Sep 11, 2024):

```
$ ~/development/github/ollama/ollama/llm/llama.cpp/convert_hf_to_gguf.py --model-name NV-Embed-v2 --outfile NV-Embed-v2.gguf --dry-run .
INFO:hf-to-gguf:Loading model:
ERROR:hf-to-gguf:Model NVEmbedModel is not supported
```
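The converter decides support from the `architectures` field in the model's `config.json`; NV-Embed-v2 reports `NVEmbedModel`, which `convert_hf_to_gguf.py` has no mapping for. A minimal sketch of that kind of pre-flight check (the `SUPPORTED` set here is an illustrative subset, not the converter's actual registry):

```python
# Sketch of the check convert_hf_to_gguf.py effectively performs: it looks up
# the model class named in config.json against its table of known architectures.
# SUPPORTED is an illustrative subset, not the converter's real list.
config = {"architectures": ["NVEmbedModel"]}  # what nvidia/NV-Embed-v2 reports
SUPPORTED = {"LlamaForCausalLM", "MistralForCausalLM", "BertModel"}

arch = config["architectures"][0]
supported = arch in SUPPORTED
print(f"{arch}: {'ok' if supported else 'not supported'}")  # → NVEmbedModel: not supported
```

Until an `NVEmbedModel` mapping lands in the converter, the dry run above will fail at exactly this step.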

@utk7arsh commented on GitHub (Sep 11, 2024):

+1
Has anyone tried using its embeddings? I couldn't find any user engagement with it online, which is odd given it's been over 12 days since its release.


@sakthi-geek commented on GitHub (Sep 23, 2024):

Any update on when we can expect NV Embed V2 support?


@alew3 commented on GitHub (Oct 1, 2024):

+1


@balaji-2k1 commented on GitHub (Oct 10, 2024):

Does anyone have any idea how to use this embedding model? I can't load it even on my RTX 4090 (24 GB). Is there a way to quantize it and use it to embed documents?


@nikhil-swamix commented on GitHub (Oct 13, 2024):

Sad update:
it seems NVIDIA doesn't want to support this.
https://huggingface.co/nvidia/NV-Embed-v1/discussions/19


@smallstepman commented on GitHub (Oct 15, 2024):

That's for v1


@DangerousBerries commented on GitHub (Oct 31, 2024):

So there's no way to use it currently?


@smallstepman commented on GitHub (Oct 31, 2024):

You'd need to deploy it yourself. [NVIDIA requires specific (not up-to-date) versions of popular packages](https://huggingface.co/nvidia/NV-Embed-v2#2-required-packages), which, I suppose, could be the limiting factor in adopting their model in Ollama:

```
pip uninstall -y transformer-engine
pip install torch==2.2.0
pip install transformers==4.42.4
pip install flash-attn==2.2.0
pip install sentence-transformers==2.7.0
```
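Before attempting a local deployment against those pins, it can help to confirm the installed versions actually match. A small sketch (the `check_pins` helper and its injectable `get_version` parameter are mine, not from the model card; the pins themselves are the ones listed above):

```python
from importlib.metadata import version, PackageNotFoundError

# Pins from the NV-Embed-v2 model card, as quoted in the comment above.
pins = {
    "torch": "2.2.0",
    "transformers": "4.42.4",
    "flash-attn": "2.2.0",
    "sentence-transformers": "2.7.0",
}

def check_pins(pins, get_version=None):
    """Return {package: (wanted, found)} for every pin that is missing or mismatched."""
    get_version = get_version or (lambda name: version(name))
    mismatches = {}
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            found = None  # package not installed at all
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

An empty return value means the environment matches the card's requirements; anything else tells you which package to reinstall.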

@nikhil-swamix commented on GitHub (Nov 1, 2024):

Thanks for the guidance, @smallstepman and others. I appreciate the research by @nvidia,

but hell...

Let's open issues every day, one per member, on the related NVIDIA repos requesting the update. We purchased NVIDIA GPUs; any software/AI should be free to run if the hardware can run it (and it definitely can under Ollama). This is ultimate neglect of users.
These are the related repos:

  1. https://github.com/NVIDIA/GenerativeAIExamples | https://github.com/NVIDIA/GenerativeAIExamples/blob/main/docs/change-model.md
  2. https://huggingface.co/nvidia/NV-Embed-v2/discussions

I'll be opening one every day and linking back to this issue. I hope for your support (some might object to my strategy and call me a criminal, but people depend on me for better embeddings, and I'm an NVIDIA customer).

@sdy623 commented on GitHub (Nov 12, 2024):

> So there's no way to use it currently?

You can load the model with the fp16 option:
`AutoModel.from_pretrained("nvidia/NV-Embed-v2", trust_remote_code=True).half()`
It requires about 19 GB of GPU RAM.

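The ~19 GB figure lines up with back-of-the-envelope math. Assuming roughly 7.8B parameters (an approximation for the Mistral-7B-based NV-Embed-v2, not an official count), fp16 weights alone come to about 14.5 GiB, with the remainder going to activations and CUDA workspace overhead:

```python
params = 7.8e9        # approximate parameter count (assumption)
bytes_per_param = 2   # fp16 = 2 bytes per weight
weights_gib = params * bytes_per_param / 1024**3
print(f"fp16 weights: ~{weights_gib:.1f} GiB")  # → fp16 weights: ~14.5 GiB
```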

@mmesgarpour commented on GitHub (Nov 15, 2024):

I think it is NOT at the top of the priority list, because the model is released under a Creative Commons license (CC BY-NC-ND 4.0), which means you can't modify it in any way or use it commercially.


@rajhlinux commented on GitHub (Dec 15, 2025):

Any luck today? Is int8 quantization possible, and could it fit in the 12 GB VRAM of an RTX 3080 Ti? Thanks.
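Rough arithmetic only, not a tested configuration: at one byte per weight, the same ~7.8B parameters (an assumed count, as above) come to about 7.3 GiB, which would nominally fit on a 12 GB card, though activation memory and quantization overhead could still push it over:

```python
params = 7.8e9                        # approximate parameter count (assumption)
int8_weights_gib = params / 1024**3   # one byte per weight at int8
headroom_gib = 12 - int8_weights_gib  # naive headroom on a 12 GB card, ignoring activations
print(f"int8 weights: ~{int8_weights_gib:.1f} GiB, nominal headroom: ~{headroom_gib:.1f} GiB")
```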

Reference: github-starred/ollama#4168