[GH-ISSUE #957] How do I create a Docker image containing a model? #62503

Closed
opened 2026-05-03 09:15:27 -05:00 by GiteaMirror · 10 comments

Originally created by @flemzord on GitHub (Oct 31, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/957

Hello,

I use a Modelfile locally and would like to deploy the resulting model to production on a Kubernetes cluster, but I don't know how to proceed.
How can I create a Docker image containing Ollama and the model created from the Modelfile?

GiteaMirror added the question label 2026-05-03 09:15:27 -05:00

@pepperoni21 commented on GitHub (Nov 5, 2023):

You could simply create a Dockerfile based on Debian or something else, install Ollama using `curl https://ollama.ai/install.sh | sh`, then copy your Modelfile and create the model with `ollama create`.

@ashatch commented on GitHub (Dec 20, 2023):

I'm looking at the same problem. I'd like to add a model to a container without having Ollama running in the container. At the moment this will fail:

```Dockerfile
FROM ollama/ollama
COPY Modelfile .
RUN ollama create mymodel -f Modelfile
EXPOSE 11434
ENTRYPOINT ["/bin/ollama"]
CMD ["serve"]
```

because the Ollama server isn't running during the build (`Error: could not connect to ollama server, run ‘ollama serve’ to start it`). The same is true of `RUN ollama pull llama2`.

Is there a supported way of installing models without having Ollama running?

@styukovs commented on GitHub (Dec 20, 2023):

Hello,

I created the desired model locally (also in a container), then copied the contents of `.ollama/` into the image at `/root/.ollama/` for deployment.

```dockerfile
FROM ollama/ollama:0.1.17

COPY ./.ollama/ /root/.ollama/
```

Check that the model is available:

```bash
docker exec -it ollama-llama bash
root@d3187bcbee04:/# ollama list
NAME      	ID          	SIZE  	MODIFIED    
llama2:13b	376ead63f82c	7.4 GB	5 hours ago
```
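
One way the "pull once, then COPY" workflow above might be scripted is sketched below. It assumes Docker is available on the build host; the container name `ollama-prep`, the function names, and the `DRY_RUN` switch are illustrative and not part of the original comment, and the fixed five-second delay is only a guess at how long the server needs to start.

```shell
#!/bin/sh
# Sketch of the "pull once, then COPY" workflow described above.
# The container name `ollama-prep`, the function names, and the DRY_RUN
# switch are illustrative assumptions, not part of the original comment.

# Print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# prepare_model_dir MODEL [IMAGE]
# Fills ./.ollama on the host by pulling MODEL inside a throwaway container.
prepare_model_dir() {
    model="$1"
    image="${2:-ollama/ollama:0.1.17}"

    run mkdir -p .ollama &&
    # Start the server with the host directory mounted as its model store.
    run docker run -d --name ollama-prep \
        -v "$(pwd)/.ollama:/root/.ollama" "$image" &&
    # Fixed delay so the server is up before the pull; tune as needed.
    run sleep 5 &&
    run docker exec ollama-prep ollama pull "$model" &&
    # The pulled model now lives in ./.ollama; the container can go.
    run docker rm -f ollama-prep
}
```

With `./.ollama` populated, for example via `prepare_model_dir llama2:13b`, the two-line Dockerfile can be built with a plain `docker build`. Setting `DRY_RUN=1` first prints the Docker commands without running them.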

@BruceMacD commented on GitHub (Mar 11, 2024):

Thanks for the details of how you were able to accomplish this, @styukovs. Resolving this for now since there is a solution. If anyone has additional questions, please feel free to add them here.

@shebpamm commented on GitHub (Apr 23, 2024):

Not the cleanest, but I was able to pre-install a model using this RUN chain, which runs `ollama serve` for that single layer:

```dockerfile
FROM ollama/ollama:0.1.32

# Pre-Install llama2
RUN nohup bash -c "ollama serve &" && sleep 5 && ollama pull llama2
```

@rmonteseba commented on GitHub (May 11, 2024):

> Not the cleanest but I was able to pre-install a model using this RUN chain which will run `ollama serve` for that single layer:
>
> ```dockerfile
> FROM ollama/ollama:0.1.32
>
> # Pre-Install llama2
> RUN nohup bash -c "ollama serve &" && sleep 5 && ollama pull llama2
> ```

You could also consider using wait-for-it.sh to avoid the sleep statement, just as an improvement 😄
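
If adding another tool to the image is not desirable, the fixed sleep could also be replaced by a small polling loop. The sketch below is one possible shape, not a tested recipe: the function name `wait_until` is illustrative, and using `ollama list` as the readiness probe assumes that the command exits with an error while the server is still unreachable.

```shell
#!/bin/sh
# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND once per second until it succeeds or TIMEOUT_SECONDS elapse.
# Returns 0 on success and 1 on timeout. The function name is illustrative.
wait_until() {
    timeout="$1"
    shift
    elapsed=0
    until "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "timed out after ${timeout}s waiting for: $*" >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Intended use in a build script, replacing the fixed sleep
# (assumes `ollama list` fails until the server accepts connections):
#   ollama serve & wait_until 30 ollama list && ollama pull llama2
```

Unlike a fixed delay, the loop returns as soon as the probe succeeds and fails the build step with a clear message if the server never comes up.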

@fnacarellidev commented on GitHub (Oct 15, 2024):

> Not the cleanest but I was able to pre-install a model using this RUN chain which will run `ollama serve` for that single layer:
>
> ```dockerfile
> FROM ollama/ollama:0.1.32
>
> # Pre-Install llama2
> RUN nohup bash -c "ollama serve &" && sleep 5 && ollama pull llama2
> ```

Took that as inspiration and ended up with:

```dockerfile
RUN nohup bash -c "ollama serve &" && wait4x http http://127.0.0.1:11434 && ollama pull llava
```

[wait4x](https://github.com/atkrad/wait4x)

@Ghania-Sarwar commented on GitHub (Jan 31, 2025):

[GIN] 2025/01/31 - 20:28:18 | 404 | 2.940145ms | 172.19.0.3 | GET "/api/generate"

I am getting this log line, and because of it my Ollama setup is not working. It says the 404 errors indicate that some endpoints (/api/generate) are being requested but not found. What is this endpoint, and how do I provide it?
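
The log line above shows a GET request. As far as the Ollama API documentation describes it, `/api/generate` is a POST endpoint that takes a JSON body, so the request method is a likely cause of the 404. A minimal sketch of such a request follows; the base URL and both function names are assumptions, and the model name is borrowed from earlier in the thread.

```shell
#!/bin/sh
# Sketch of a POST to /api/generate. The base URL and function names are
# assumptions; adjust them to wherever the Ollama server is reachable.

# generate_body MODEL PROMPT
# Prints a minimal JSON request body. MODEL and PROMPT are inserted as-is,
# so they must not contain characters that need JSON escaping.
generate_body() {
    printf '{"model": "%s", "prompt": "%s", "stream": false}\n' "$1" "$2"
}

# generate MODEL PROMPT [BASE_URL]
# Sends the body with POST; the log above shows a GET to this path
# being answered with 404.
generate() {
    base_url="${3:-http://localhost:11434}"
    curl -sS -X POST "$base_url/api/generate" \
        -H 'Content-Type: application/json' \
        -d "$(generate_body "$1" "$2")"
}

# Example: generate llama2 "Why is the sky blue?"
```

The named model also has to be present on the server, for instance pre-installed in the image as described in the other comments.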

@nisseknudsen commented on GitHub (Apr 10, 2025):

I turned this into a multi-stage build so I can keep tweaking my actual image without worrying about accidentally invalidating the model cache layer. Plus, Docker BuildKit can run both stages in parallel until it hits the copy step.

```dockerfile
FROM ollama/ollama:0.6.5 AS builder

RUN ollama serve & \
    sleep 3 && \
    ollama pull llama3.2-vision

FROM ollama/ollama:0.6.5

# Custom stuff here

# Copies the model cache over
COPY --from=builder /root/.ollama /root/.ollama

# More custom stuff here

CMD ["serve"]
```

@leoncydsilva commented on GitHub (Apr 30, 2025):

This worked for me:

```dockerfile
FROM ollama/ollama:latest AS builder

# Start the Ollama server in the background, wait a fixed 10 seconds for it to start, then pull the model
RUN nohup ollama serve > /tmp/ollama.log 2>&1 & \
    sleep 10 && \
    ollama pull qwen2.5-coder:32b

# Final image
FROM ollama/ollama:latest

# Copy preloaded model cache
COPY --from=builder /root/.ollama /root/.ollama

# Expose Ollama port
EXPOSE 11434

# Run the Ollama server normally
CMD ["serve"]
```

Reference: github-starred/ollama#62503