[GH-ISSUE #2833] Running ollama on Hugging Face Spaces #1722

Closed
opened 2026-04-12 11:41:59 -05:00 by GiteaMirror · 7 comments

Originally created by @jbdatascience on GitHub (Feb 29, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2833

I want to run ollama on Hugging Face Spaces, because I run a Streamlit app there that must make use of an LLM and an embedding model served by Ollama. How can I do that?


@Fastidious commented on GitHub (Feb 29, 2024):

What do you mean "run ollama on Hugging Face Spaces"? Do you want to consume models available in Hugging Face Spaces (HFS)? Is that your aim? If so, then that's something not available out of the box in Ollama.

If you search The Tubes, you will find a few approaches to do so. The easiest way to use HFS models is by transforming models using GPT-Generated Unified Format (gguf). Not every model on HFS is a gguf model.


@jbdatascience commented on GitHub (Mar 1, 2024):

> What do you mean "run ollama on Hugging Face Spaces"? Do you want to consume models available in Hugging Face Spaces (HFS)? Is that your aim? If so, then that's something not available out of the box in Ollama.
>
> If you search The Tubes, you will find a few approaches to do so. The easiest way to use HFS models is by transforming models using GPT-Generated Unified Format (gguf). Not every model on HFS is a gguf model.

I guess you are right. I also came to the conclusion that I had overlooked the option of not using Ollama in the first place! Just using an LLM and an embedding model straight from HF is the natural (and obvious!) way to go!

My plan now is to modify the existing code to implement this.

Thanks for your response!


@pdevine commented on GitHub (May 17, 2024):

Hey @jbdatascience I think it's OK to close this? I think you could do it through Docker Spaces, but I'm not sure if that's the best way to do it as I'm not really familiar w/ Spaces. There are lots of other places to get GPU instances on the cloud where you can run Ollama.
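For anyone who does want to try the Docker Spaces route, a minimal sketch of a Space Dockerfile might look like the following. This is untested and full of assumptions: it uses the public `ollama/ollama` image, relies on the Space's `app_port` metadata being set to 7860, and uses Ollama's documented `OLLAMA_HOST` / `OLLAMA_MODELS` environment variables.

```dockerfile
# Sketch only: run Ollama inside a Hugging Face Docker Space.
# Assumes the Space README sets `app_port: 7860` so HF routes traffic here.
FROM ollama/ollama:latest

# Hugging Face Spaces expect the app to listen on port 7860 by default,
# so bind Ollama there instead of its default 11434.
ENV OLLAMA_HOST=0.0.0.0:7860

# Spaces run containers as a non-root user; point the model store
# at a writable location.
ENV OLLAMA_MODELS=/tmp/ollama-models

EXPOSE 7860
ENTRYPOINT ["ollama"]
CMD ["serve"]
```

Note that free Spaces are CPU-only and have limited disk, which may explain builds or startups that never finish with larger models.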


@jbdatascience commented on GitHub (Jun 14, 2024):

> Hey @jbdatascience I think it's OK to close this? I think you could do it through Docker Spaces, but I'm not sure if that's the best way to do it as I'm not really familiar w/ Spaces. There are lots of other places to get GPU instances on the cloud where you can run Ollama.

I am still searching for a solution!


@musarehmani291 commented on GitHub (Jun 14, 2024):

Hi @Fastidious and @pdevine,
I want to deploy Ollama to a Hugging Face Space. I used the official Dockerfile from the Ollama repo, but it keeps building and I could never get the Space running. I'm using the free tier of Hugging Face Spaces. Could you look into this issue?


@Amirjab21 commented on GitHub (Jun 24, 2024):

> > What do you mean "run ollama on Hugging Face Spaces"? Do you want to consume models available in Hugging Face Spaces (HFS)? Is that your aim? If so, then that's something not available out of the box in Ollama.
> >
> > If you search The Tubes, you will find a few approaches to do so. The easiest way to use HFS models is by transforming models using GPT-Generated Unified Format (gguf). Not every model on HFS is a gguf model.
>
> I guess you are right. I also came to the conclusion that I had overlooked the option of not using Ollama in the first place! Just using an LLM and an embedding model straight from HF is the natural (and obvious!) way to go!
>
> My plan now is to modify the existing code to implement this.
>
> Thanks for your response!

Did you manage to do this?


@myyim commented on GitHub (Jun 21, 2025):

This solution works for me:
https://medium.com/p/1f5d8f871887

If you use an embedding model, call the "embed" endpoint instead of "generate" on the client side.

For example,

curl http://localhost:11434/api/embed -d '{
  "model": "all-minilm",
  "input": "Why is the sky blue?"
}'
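The same call can be sketched from Python with only the standard library. This is a hedged example: the model name and local URL mirror the curl command above and are assumptions about your setup, and `embed`/`build_embed_request` are hypothetical helper names.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # adjust if Ollama listens elsewhere


def build_embed_request(model: str, text: str) -> dict:
    # /api/embed expects an "input" field (unlike /api/generate's "prompt").
    return {"model": model, "input": text}


def embed(model: str, text: str, base_url: str = OLLAMA_URL) -> list:
    """POST to Ollama's /api/embed and return the embedding vectors."""
    payload = json.dumps(build_embed_request(model, text)).encode()
    req = urllib.request.Request(
        f"{base_url}/api/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The response carries an "embeddings" field: one vector per input.
        return json.loads(resp.read())["embeddings"]


if __name__ == "__main__":
    print(embed("all-minilm", "Why is the sky blue?"))
```

Running this requires a reachable Ollama instance with the model already pulled (`ollama pull all-minilm`).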

Reference: github-starred/ollama#1722