[GH-ISSUE #6702] Problem Serving Custom LLAMA3 Using Google Cloud Run #4218

Closed
opened 2026-04-12 15:09:13 -05:00 by GiteaMirror · 16 comments
Owner

Originally created by @Oluwafemi-Jegede on GitHub (Sep 8, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6702

What is the issue?

I can run a custom LLAMA3 model locally using this docker config

```
FROM ollama/ollama:latest

COPY custom_llama.txt /App/custom_llama.txt

WORKDIR /App

RUN ollama serve & sleep 5 && ollama create ai-agent -f custom_llama.txt && ollama run ai-agent

EXPOSE 11434
```

However, when I deploy on GCP Cloud Run, I don't see any model running. `$URL/api/tags` returns `{"models":[]}`, but the homepage says `ollama is running`.

FYI: Custom model is LLAMA3:8B

OS

Docker

GPU

No response

CPU

No response

Ollama version

LLAMA3

GiteaMirror added the docker, question labels 2026-04-12 15:09:13 -05:00

@rick-github commented on GitHub (Sep 8, 2024):

What's in custom_llama.txt?


@Oluwafemi-Jegede commented on GitHub (Sep 8, 2024):

> What's in `custom_llama.txt`?

@rick-github

```
FROM llama3:8b

PARAMETER temperature 0.8
PARAMETER top_k 30
PARAMETER top_p 0.7

PARAMETER stop <|start_header_id|>
PARAMETER stop <|end_header_id|>
PARAMETER stop <|eot_id|>
PARAMETER stop <|reserved_special_token

TEMPLATE """
{{ if .System }}<|start_header_id|>system<|end_header_id|>
{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>
{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>
{{ .Response }}<|eot_id|>
"""

SYSTEM You are a bot that helps infer ........
```


@rick-github commented on GitHub (Sep 8, 2024):

Where is the llama3:8b model located?


@Oluwafemi-Jegede commented on GitHub (Sep 8, 2024):

I am not sure I understand what you mean, but shouldn't `FROM ollama/ollama:latest` in the Dockerfile already resolve that?


@rick-github commented on GitHub (Sep 8, 2024):

`FROM ollama/ollama:latest` just pulls the program, not any models. If you want to create a new model, you first need to pull the model you want to base your custom one on: `ollama pull llama3:8b`.


@Oluwafemi-Jegede commented on GitHub (Sep 8, 2024):

@rick-github Okay, thanks. So the Dockerfile should look like this?

```
FROM ollama/ollama:latest

COPY custom_llama.txt /App/custom_llama.txt

WORKDIR /App

RUN ollama serve & sleep 5 && ollama pull llama3:8b && ollama create ai-agent -f custom_llama.txt && ollama run ai-agent

EXPOSE 11434
```

I'm also curious how it runs locally without the Ollama application running in the background.


@rick-github commented on GitHub (Sep 8, 2024):

The RUN commands there only run during the container build process; the container automatically starts the Ollama server when it's instantiated, so when running locally it's just ready. The final `ollama run ai-agent` is unnecessary.


@rick-github commented on GitHub (Sep 8, 2024):

Note that the way you are doing this, every time you build the container, ollama will re-pull the model, which can be slow, error-prone, and hard on your bandwidth budget. It may be better to pull the model to your workspace just once, and then COPY the model into the container during the build process.
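
As an untested sketch of that approach: pull `llama3:8b` once on the build host, stage the populated model store into the build context (e.g. `cp -r ~/.ollama/models ./models`), and COPY it into the image instead of pulling during the build. The paths below assume the default store locations and are illustrative only:

```
FROM ollama/ollama:latest

# Copy a model store populated on the host with 'ollama pull llama3:8b' and
# staged into the build context as ./models (docker COPY cannot read files
# outside the build context).
COPY models /root/.ollama/models
COPY custom_llama.txt /App/custom_llama.txt

WORKDIR /App

# Only the 'create' step still needs the server running during the build.
RUN ollama serve & sleep 5 && ollama create ai-agent -f custom_llama.txt

EXPOSE 11434
```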


@Oluwafemi-Jegede commented on GitHub (Sep 8, 2024):

@rick-github Yeah, thanks for the suggestion; I'll try COPY to reduce overhead. I tried the Dockerfile below and I still cannot see any model on Cloud Run after adding the pull command:

```
FROM ollama/ollama:latest

COPY custom_llama.txt /App/custom_llama.txt

WORKDIR /App

RUN ollama serve & sleep 5 && ollama pull llama3:8b && ollama create ai-agent -f custom_llama.txt

EXPOSE 11434
```

`$URL/api/tags` => `{"models":[]}`


@rick-github commented on GitHub (Sep 8, 2024):

Worked locally for me. I don't have a GCP account so I can't test Cloud Run. Do you get any logs from the GCP attempt?

Build:

```
$ docker build -f Dockerfile -t 6702 --progress plain .
...
#8 0.180 2024/09/08 18:12:46 routes.go:1123: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
...
pulling manifest
#8 81.61 pulling 6a0746a1ec1a... 100% ▕████████████████▏ 4.7 GB
...
#8 81.61 success
...
#8 81.66 transferring model data
#8 81.66 using existing layer sha256:6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa
...
#8 81.66 success
...
#9 exporting layers 14.9s done
#9 writing image sha256:9c602d2c645c0ced9f6010250c0f7876d771c5634e799cfdcc6c335ed55fc4d6 done
#9 naming to docker.io/library/6702 done
#9 DONE 14.9s
```

Run:

```
$ docker run -d --name 6702 6702
4faf0f6f995e88003c274200f743a50615f4146f15e0965cdbed306e89f3c04a
$ docker exec -it 6702 bash
root@4faf0f6f995e:/App# ollama list
NAME            ID              SIZE    MODIFIED
ai-agent:latest 3f2762d3ecf4    4.7 GB  7 minutes ago
llama3:8b       365c0bd3c000    4.7 GB  7 minutes ago
root@4faf0f6f995e:/App# ollama run ai-agent:latest hello
Hello! I'm a bot designed to help infer information from text-based input. I can assist with tasks such as answering questions, summarizing content, and generating ideas. What would you like to talk about or ask?

root@4faf0f6f995e:/App#
```

@Oluwafemi-Jegede commented on GitHub (Sep 8, 2024):

Yeah, same here: it works perfectly locally for me, but when I move to the cloud it just shows `ollama is running`.


@rick-github commented on GitHub (Sep 8, 2024):

Are you running it in a VM instance in the cloud, or just the container with gcloud compute instances create-with-container?


@Oluwafemi-Jegede commented on GitHub (Sep 8, 2024):

So I am using Google Cloud Run, a managed container service for running workloads in the cloud, with no direct access to the VMs or Compute Engine instances.


@Oluwafemi-Jegede commented on GitHub (Sep 8, 2024):

Not ideal or my plan, but I created the model with two API requests:

`$URL/api/pull` => to pull llama3:8b
`$URL/api/create` (with the content of the model file) => to create the bot model

However, it would be nice to just run the container image, which contains all the config, and have it ready to serve.
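
For reference, those two requests can be sketched with curl; the service URL is a placeholder, and the `name`/`modelfile` fields match the `/api/pull` and `/api/create` request bodies as the API accepted them at the time. The commands are echoed rather than executed, so nothing is sent:

```shell
# Placeholder for the Cloud Run service URL; substitute your own.
URL="https://example-service.run.app"

# Echo the two requests rather than sending them; drop 'echo' to execute.
echo curl "$URL/api/pull" -d '{"name": "llama3:8b"}'
echo curl "$URL/api/create" -d '{"name": "ai-agent", "modelfile": "FROM llama3:8b"}'
```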


@rick-github commented on GitHub (Sep 8, 2024):

It's because the cloud-built container has `OLLAMA_MODELS=/home/.ollama/models` while the locally built container uses `OLLAMA_MODELS=/root/.ollama/models`. I'm not sure why; I assume the build or run process in GCP sets some environment variables (maybe `HOME`) that result in a different path for Ollama state. I don't know enough about GCP to fix this the right way, but a workaround is to set `HOME` in the Dockerfile:

```diff
--- Dockerfile.orig	2024-09-08 23:42:50.799039526 +0200
+++ Dockerfile	2024-09-08 23:34:44.897002700 +0200
@@ -5,6 +5,6 @@
 
 WORKDIR /App
 
-RUN ollama serve & sleep 5 && ollama pull llama3:8b && ollama create ai-agent -f custom_llama.txt
+RUN HOME=/home ollama serve & sleep 5 && ollama pull llama3:8b && ollama create ai-agent -f custom_llama.txt
 
 EXPOSE 11434
```

Build and deploy, and when the container starts it will see the models:

```
$ OLLAMA_HOST=https://test-123412341234.us-west1.run.app:443 ollama list
NAME           	ID          	SIZE  	MODIFIED       
ai-agent:latest	3f2762d3ecf4	4.7 GB	18 minutes ago	
llama3:8b      	365c0bd3c000	4.7 GB	18 minutes ago	
$ OLLAMA_HOST=https://test-123412341234.us-west1.run.app:443 ollama run ai-agent hello
Hello! I'm a bot that helps infer the meaning of text. You can provide me with some text, and I'll do my best to understand its meaning and provide you with relevant information or insights.

What would you like to talk about? Do you have any specific topics in mind, or would you like me to suggest some prompts to get us started?
```
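
An alternative, untested workaround along the same lines would be to pin the model store path explicitly with the documented `OLLAMA_MODELS` variable, so the build-time and runtime paths agree no matter what the platform sets `HOME` to:

```
FROM ollama/ollama:latest

# Pin the model store so the path used by RUN during 'docker build' matches
# the path the server uses at runtime, regardless of HOME.
ENV OLLAMA_MODELS=/root/.ollama/models

COPY custom_llama.txt /App/custom_llama.txt

WORKDIR /App

RUN ollama serve & sleep 5 && ollama pull llama3:8b && ollama create ai-agent -f custom_llama.txt

EXPOSE 11434
```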

@Oluwafemi-Jegede commented on GitHub (Sep 8, 2024):

Setting HOME fixed the issue. It would be interesting to know whether the same issue is observed with similar products from other cloud platforms (Azure, AWS), or if it's just GCP.

Thanks @rick-github

Reference: github-starred/ollama#4218