[GH-ISSUE #834] Bring back the EMBED feature in the Modelfile #46910

Open
opened 2026-04-28 01:59:27 -05:00 by GiteaMirror · 18 comments
Owner

Originally created by @vividfog on GitHub (Oct 18, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/834

I appreciate the effort to keep the codebase simple; Ollama is second to none in its elegance. But removing the feature within a week was quick work, without much debate about whether and how people use it, whether it really isn't valuable, or whether on second thought it's a fantastic feature. I am going to miss this feature a lot and was highlighting it to others as a special Ollama treat. It was in daily use.

Related: #759 (feature removal), #501 (bug), #502 (documentation)

I'd like to bring some more viewpoints to this, as a heavy user who's tried everything I've gotten my hands on:

  1. **User experience in comparison to alternatives was great.** Ollama comes with an ecosystem of APIs and chatbots. With nothing else to install, Ollama was a one-liner RAG chatbot with multi-line support. Upstream clients needed zero configuration to get these benefits for free.
  2. **The alternatives are not good without plenty of developer effort** that regular people can't put in. Now users need to set up a client for this, and each of them is poor in its user experience in its own way. No match for Ollama out of the box. UX doesn't happen in a vacuum; it's measured against the alternatives. Ollama + any chatbot GUI + a dropdown to select a RAG model was all that was needed, but that's no longer possible.
  3. **The PrivateGPT example is not even close to a match.** I tried it, and I've tried them all, having built my own RAG routines at some scale for others. All else being equal, Ollama was actually the best no-bells-and-whistles RAG routine out there, ready to run in minutes with zero extra things to install and very few to learn. "Don't make me install new things" is an important UX perspective here.
  4. **Creating embeddings was a bit of extra work, but that's unavoidable if it's generic.** Again comparing to alternatives: every other method needs some work to create the embeddings too. Ollama's was easy, even if one can argue that "one line per embedding isn't elegant". Well, it is, in its simplicity. The rest is string manipulation.
  5. **It was instantly fast at runtime.** Embeddings took a while to create, but at runtime there is no delay; it's just as instant as without embeddings.
  6. **It turns out LLMs create totally usable embeddings.** Even if Llama2 or Mistral aren't embedding models on paper, they worked great in practice. I was using it daily with esoteric documents and it was fine. This was an issue in theory only.
  7. **Instead of outright deletion, it really needed just some cleanup, and not even immediately:** finding the root cause of why longer ingestions did not work as a single batch, and writing better documentation. That's it. Then it would have been fine to park it for a long time. Even without changes it was usable, and there are always issues in a sufficiently large codebase.
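The workflow the points above describe (embed documents once at ingestion time, then look up the nearest chunk at runtime) can be sketched in a few lines. This is a hedged illustration, not Ollama's actual implementation; `cosine` and `retrieve` are hypothetical helpers, and the embedding vectors stand in for whatever the EMBED directive would have produced:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, chunks):
    """Return the chunk text whose embedding is most similar to the query.

    `chunks` is a list of (text, embedding) pairs built once at ingestion
    time; at runtime only this lookup runs, which is why retrieval feels
    just as instant as running without embeddings (point 5 above).
    """
    return max(chunks, key=lambda c: cosine(query_vec, c[1]))[0]
```

The retrieved chunk is then prepended to the prompt before generation; the rest is string manipulation, as point 4 says.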

I'll write this as a new issue so it can be tracked; maybe there's more feedback. Please consider bringing it back. I'm going to stay parked on the v0.1.3 tag until new killer features come along. Thanks a lot for the great work! Please ask for community opinion with a clear issue headline before deprecating powerful capabilities in a breaking change, and give it a few weeks if it's not urgent.

Other thoughts and viewpoints welcome.

GiteaMirror added the feature request and feedback wanted labels 2026-04-28 02:00:11 -05:00

@BruceMacD commented on GitHub (Oct 18, 2023):

Thanks for the great feedback here. I'm going to make sure this gets seen by the rest of the maintainers as well.


@jmorganca commented on GitHub (Oct 19, 2023):

Wanted to echo @BruceMacD's comment! Thank you for opening this discussion (and for the thoughtful and heartwarming writeup). This is definitely something Ollama should make easy. Let's see how this feature can be brought back as the primitives improve (embedding models, GPU acceleration, etc.).


@CyrilPeponnet commented on GitHub (Oct 19, 2023):

Especially with proper embedding model support coming "soon" (https://github.com/ggerganov/llama.cpp/issues/2872), it would make the feature really useful.


@CyrilPeponnet commented on GitHub (Oct 26, 2023):

or we could just use https://github.com/go-skynet/go-bert.cpp for the embedding part.


@jtoy commented on GitHub (Nov 10, 2023):

I would love to see this back as well :)


@snowyu commented on GitHub (Nov 28, 2023):

In fact, go-bert.cpp is just a wrapper around the incomplete bert.cpp.

Recommended: [tokenizers-cpp](https://github.com/mlc-ai/tokenizers-cpp) is a better wrapper for HF's tokenizers.


@kjp-souza commented on GitHub (Dec 8, 2023):

@jmorganca, @BruceMacD, could you please explain what needs to be done to use this `/embed` API endpoint? I get this error now, but I could not find how to use the endpoint in the documentation:

```
2023/12/08 21:57:34 parser.go:59: WARNING: Unknown command: ​
Error: deprecated command: EMBED is no longer supported, use the /embed API endpoint instead
```

**Is there a similar command that substitutes `EMBED`?**
Thanks!!


@sandangel commented on GitHub (Dec 11, 2023):

Hi, I found this: https://github.com/ml-explore/mlx-examples/blob/main/bert/README.md. I think this has native support for Apple Silicon. Is it possible to replace the current llama.cpp with `mlx` for Mac M1?


@jmorganca commented on GitHub (Dec 24, 2023):

@sandangel thanks for the pointer. We are looking at ways to support BERT models and the MLX framework seems like a great fit for that.


@sampriti026 commented on GitHub (Dec 27, 2023):

Hey, if I want to use the generate-embeddings API with other embedding models from MTEB, is there any way I can do that? If yes, then how?


@BruceMacD commented on GitHub (Dec 27, 2023):

@sampriti026 Ollama has an endpoint to generate embeddings:
https://github.com/jmorganca/ollama/blob/main/docs/api.md#generate-embeddings

It sounds like you may be looking for embedding-specific models, which Ollama doesn't support yet. Support for BERT embedding models is tracked in #327.
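For anyone landing here: the endpoint linked above takes a plain HTTP POST. A minimal sketch, assuming a local Ollama server on the default port and the request/response shape from the linked docs (a `prompt` in, an `embedding` array out); `build_request` and `get_embedding` are hypothetical helper names:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embeddings"  # default local port

def build_request(model, prompt):
    """Build the JSON payload for the generate-embeddings endpoint."""
    return json.dumps({"model": model, "prompt": prompt}).encode("utf-8")

def get_embedding(model, prompt):
    """POST to a running Ollama server and return the embedding vector."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]
```

Usage would be e.g. `get_embedding("llama2", "some chunk of text")` with a model already pulled and the server running.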


@sampriti026 commented on GitHub (Dec 27, 2023):

@BruceMacD, unrelated to Ollama: what is the alternative for running the desired embedding models? Any experience? Also, I was wondering if I can take an embedding model of my choice, build it, and then run that model to generate embeddings.


@sandangel commented on GitHub (Dec 28, 2023):

If you're using Apple Silicon, a good alternative would be adding an API endpoint to https://github.com/ml-explore/mlx-examples/blob/main/bert/README.md. The endpoint can be similar to Ollama's OpenAI-compatible endpoint, depending on the framework you're using (LangChain, LlamaIndex, Haystack, etc.).


@espipj commented on GitHub (Dec 31, 2023):

This would be super useful


@chigkim commented on GitHub (Feb 10, 2024):

Does Ollama support any embedding models yet? If so, which ones, and where can I get them?


@sublimator commented on GitHub (Feb 23, 2024):

@chigkim
ICYMI:
https://ollama.com/library/nomic-embed-text
https://ollama.com/library/all-minilm


@vividfog commented on GitHub (Feb 23, 2024):

Nice, this is an excellent feature done well. Thank you to all contributors.


@qdrddr commented on GitHub (Jun 28, 2024):

Related to this CoreML feature.
https://github.com/ollama/ollama/issues/3898

Reference: github-starred/ollama#46910