[GH-ISSUE #7046] Loading Llama model to a Google Cloud Run Ollama Container through a Dockerfile #4472

Closed
opened 2026-04-12 15:23:52 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @waynemorphic on GitHub (Sep 30, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7046

What is the issue?

I have been trying to Dockerize Ollama and load the Llama3.1 model into a Google Cloud Run deployment. While Ollama itself runs as expected in Cloud Run, the model is not loaded: hitting `v1/models` returns a null result. I have a hacky workaround on Compute Engine, where I use an SSH connection to run the Dockerized image and then pull and run the model, but that will be neither cost-effective nor efficient in the long term. I would like help figuring out how to load LLMs into Ollama through a single Dockerfile deployed to Google Cloud Run, if that is possible. Here is my current Dockerfile:

```dockerfile
FROM ollama/ollama
WORKDIR /app
RUN apt-get update && apt-get install -y wget && apt-get install -y --no-install-recommends git curl
ENV DEBIAN_FRONTEND=noninteractive
ENV OLLAMA_KEEP_ALIVE=24h
EXPOSE 11434
VOLUME [ "./ollama/ollama:/root/.ollama" ]
ENTRYPOINT ["/bin/bash", "-c", "ollama serve & sleep 5 && ollama run llama3.1 && tail -f /dev/null"]
```

OS

Docker

GPU

No response

CPU

No response

Ollama version

No response

GiteaMirror added the docker, feature request labels 2026-04-12 15:23:52 -05:00

@rick-github commented on GitHub (Sep 30, 2024):

Are you trying to create an image that has embedded models? If so, do you want the available models to be static or do you want to be able to add models after deployment?

Static is straightforward:

```dockerfile
FROM ollama/ollama
ARG MODELS="nomic-embed-text:latest qwen2.5:0.5b-instruct"
ENV OLLAMA_KEEP_ALIVE=24h
RUN ollama serve & server=$! ; sleep 5 ; for m in $MODELS ; do ollama pull $m ; done ; kill $server
```

```console
$ docker build -f Dockerfile -t ollama-models .
$ docker run --rm -d --name ollama-models ollama-models
$ docker exec -it ollama-models ollama list
NAME                       ID              SIZE      MODIFIED
qwen2.5:0.5b-instruct      a8b0c5157701    397 MB    3 minutes ago
nomic-embed-text:latest    0a109f422b47    274 MB    8 minutes ago
$ docker exec -it ollama-models ollama run qwen2.5:0.5b-instruct hello
Hello! How can I assist you today?
```

Since rebuilding the image pulls the models from the Ollama library each time, you could instead archive the models locally, copy them into the image, and expand them during the build; see https://github.com/ollama/ollama/issues/6037#issuecomment-2255540441
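A sketch of that archive route (the archive name and its creation command are assumptions; e.g. on a machine that had already pulled the models, `tar -C ~/.ollama -czf models.tar.gz models`):

```dockerfile
FROM ollama/ollama
ENV OLLAMA_KEEP_ALIVE=24h
# models.tar.gz is assumed to contain the models/ directory (manifests/
# and blobs/) taken from an existing ~/.ollama on the build machine.
COPY models.tar.gz /tmp/models.tar.gz
RUN mkdir -p /root/.ollama \
    && tar -C /root/.ollama -xzf /tmp/models.tar.gz \
    && rm /tmp/models.tar.gz
```

This avoids re-downloading on every build, at the cost of keeping the archive next to the Dockerfile.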

Dynamic is harder. The `VOLUME` statement doesn't do what I think you are trying to do. It doesn't create a mapping from the host directory of `./ollama/ollama` to the container directory of `/root/.ollama`. What it does is create a path `/ollama/ollama:/root/.ollama` inside the container that maps to the `_data` directory for the container in `/var/lib/docker/volumes`. This directory doesn't survive a container restart, so even if you map the model directory correctly with `VOLUME [ "/root/.ollama" ]`, any models added will vanish when the container is restarted.

The canonical way to create a persistent volume is with the `--volume` argument (or the `volumes` stanza for compose). The problem with that is that it "hides" the contents of the directory in the container, so `--volume ./ollama/ollama:/root/.ollama` results in the pre-loaded models vanishing. So what you have to do is load the models into a staging area and then copy them into the model directory after the external volume has been mounted. Symlinks are used here to reduce disk usage, but it does mean that the host cannot access the pre-loaded models.
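As an aside, the `cp -as` used in the ENTRYPOINT below recreates the directory tree but replaces every file with an absolute symlink back to the source, so no file data is duplicated. A quick local demonstration (the paths are throwaway temp directories, not Ollama's real layout):

```shell
# Build a fake model store and "copy" it the way the ENTRYPOINT does.
set -e
tmp=$(mktemp -d)
mkdir -p "$tmp/models/blobs"
echo "weights" > "$tmp/models/blobs/sha256-abc"
mkdir "$tmp/.ollama"
# -a preserves the tree, -s makes symlinks instead of copying file data.
cp -as "$tmp/models" "$tmp/.ollama"
# The result is a symlink pointing back at the staging area:
readlink "$tmp/models/blobs/sha256-abc" || true
readlink "$tmp/.ollama/models/blobs/sha256-abc"
cat "$tmp/.ollama/models/blobs/sha256-abc"   # prints: weights
rm -rf "$tmp"
```

Note that `cp -s` requires an absolute source path, which is why the ENTRYPOINT uses `/root/models` rather than a relative path.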

```dockerfile
FROM ollama/ollama
ARG MODELS="nomic-embed-text:latest qwen2.5:0.5b-instruct"
ENV OLLAMA_KEEP_ALIVE=24h
RUN OLLAMA_MODELS=/root/models ollama serve & server=$! ; sleep 5 ; for m in $MODELS ; do ollama pull $m ; done ; kill $server
ENTRYPOINT [ "/bin/bash", "-c", "cp -as /root/models /root/.ollama ; exec /bin/ollama $0" ]
CMD [ "serve" ]
```
```console
$ docker run --rm -d -v ./ollama:/root/.ollama --name ollama-models ollama-models
$ docker exec -it ollama-models ollama list
NAME                      ID              SIZE      MODIFIED
qwen2.5:0.5b-instruct     a8b0c5157701    397 MB    7 minutes ago
nomic-embed-text:latest   0a109f422b47    274 MB    8 minutes ago
$ docker exec -it ollama-models ollama pull qwen:0.5b-text
$ docker exec -it ollama-models ollama list
NAME                      ID              SIZE      MODIFIED
qwen:0.5b-text            f92ac32068ca    394 MB    8 seconds ago
qwen2.5:0.5b-instruct     a8b0c5157701    397 MB    8 minutes ago
nomic-embed-text:latest   0a109f422b47    274 MB    8 minutes ago
$ docker stop ollama-models
ollama-models
$ docker run --rm -d -v ./ollama:/root/.ollama --name ollama-models ollama-models
1abe24f3067ba6b98bbdcaa869164866b0d4fe5d923162b67162cb263e043fa9
$ docker exec -it ollama-models ollama list
NAME                      ID              SIZE      MODIFIED
qwen:0.5b-text            f92ac32068ca    394 MB    27 seconds ago
qwen2.5:0.5b-instruct     a8b0c5157701    397 MB    8 minutes ago
nomic-embed-text:latest   0a109f422b47    274 MB    9 minutes ago
$ docker stop ollama-models
ollama-models
$ sudo rm -rf ollama/
$ docker run --rm -d -v ./ollama:/root/.ollama --name ollama-models ollama-models
bc3e9eb3e42ea2f694f79b86dc4d8833b0a921f1d6f7ed3ccfdf2941d3c7f846
$ docker exec -it ollama-models ollama list
NAME                      ID              SIZE      MODIFIED
qwen2.5:0.5b-instruct     a8b0c5157701    397 MB    16 minutes ago
nomic-embed-text:latest   0a109f422b47    274 MB    16 minutes ago
```
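For reference, the `volumes` stanza mentioned above would look roughly like this in a compose file (a sketch; the service name and host path are assumptions):

```yaml
services:
  ollama:
    image: ollama-models   # the image built above
    volumes:
      - ./ollama:/root/.ollama
    ports:
      - "11434:11434"
```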

If you want to pull models when the container starts:

```dockerfile
FROM ollama/ollama
ENV MODELS="nomic-embed-text:latest qwen2.5:0.5b-instruct"
ENV OLLAMA_KEEP_ALIVE=24h
ENTRYPOINT [ "/bin/bash", "-c", "(sleep 5 ; for m in $MODELS ; do ollama pull $m ; done) & exec /bin/ollama $0" ]
CMD [ "serve" ]
```

```console
$ docker run --rm -d -v ./ollama:/root/.ollama --name ollama-models ollama-models
607a0548e3dab65e93507de63e4d40fafd89331e442666bad06800bbec5999bf
# time passes
$ docker exec -it ollama-models ollama list
NAME                      ID              SIZE      MODIFIED
qwen2.5:0.5b-instruct     a8b0c5157701    397 MB    19 seconds ago
nomic-embed-text:latest   0a109f422b47    274 MB    43 seconds ago
$ docker stop ollama-models
ollama-models
$ docker run --rm -d -v ./ollama:/root/.ollama -e MODELS=qwen:0.5b-text --name ollama-models ollama-models
$ docker exec -it ollama-models ollama list
NAME                      ID              SIZE      MODIFIED
qwen:0.5b-text            f92ac32068ca    394 MB    25 seconds ago
qwen2.5:0.5b-instruct     a8b0c5157701    397 MB    4 minutes ago
nomic-embed-text:latest   0a109f422b47    274 MB    5 minutes ago
```
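The fixed `sleep 5` is a race: on a slow cold start the pulls can fire before the server is listening. If you prefer, the background subshell can poll until the server answers before pulling (a sketch; `ollama list` is used as a readiness probe since it fails until the server is up):

```dockerfile
FROM ollama/ollama
ENV MODELS="nomic-embed-text:latest qwen2.5:0.5b-instruct"
ENV OLLAMA_KEEP_ALIVE=24h
# Wait for the server to respond instead of sleeping a fixed interval.
ENTRYPOINT [ "/bin/bash", "-c", "(until ollama list >/dev/null 2>&1 ; do sleep 1 ; done ; for m in $MODELS ; do ollama pull $m ; done) & exec /bin/ollama $0" ]
CMD [ "serve" ]
```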

@dhiltgen commented on GitHub (Sep 30, 2024):

We're tracking this via #3369

Reference: github-starred/ollama#4472