[GH-ISSUE #6702] Problem Serving Custom LLAMA3 Using Google Cloud Run #4218

Closed
opened 2026-04-12 15:09:13 -05:00 by GiteaMirror · 16 comments
Owner

Originally created by @Oluwafemi-Jegede on GitHub (Sep 8, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6702

What is the issue?

I can run a custom LLAMA3 model locally using this docker config

```
FROM ollama/ollama:latest

COPY custom_llama.txt /App/custom_llama.txt

WORKDIR /App

RUN ollama serve & sleep 5 && ollama create ai-agent -f custom_llama.txt && ollama run ai-agent

EXPOSE 11434
```

However, when I deploy on GCP Cloud Run, I don't see any model running. `$URL/api/tags` returns `{"models":[]}`, but the homepage says `ollama is running`.

FYI: Custom model is LLAMA3:8B

OS

Docker

GPU

No response

CPU

No response

Ollama version

LLAMA3

GiteaMirror added the docker, question labels 2026-04-12 15:09:13 -05:00

@rick-github commented on GitHub (Sep 8, 2024):

What's in custom_llama.txt?


@Oluwafemi-Jegede commented on GitHub (Sep 8, 2024):

> What's in `custom_llama.txt`?

@rick-github

```
FROM llama3:8b

PARAMETER temperature 0.8
PARAMETER top_k 30
PARAMETER top_p 0.7

PARAMETER stop <|start_header_id|>
PARAMETER stop <|end_header_id|>
PARAMETER stop <|eot_id|>
PARAMETER stop <|reserved_special_token

TEMPLATE """
{{ if .System }}<|start_header_id|>system<|end_header_id|>
{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>
{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>
{{ .Response }}<|eot_id|>
"""

SYSTEM You are a bot that helps infer ........
```


@rick-github commented on GitHub (Sep 8, 2024):

Where is the llama3:8b model located?


@Oluwafemi-Jegede commented on GitHub (Sep 8, 2024):

I am not sure I understand what you mean, but shouldn't `FROM ollama/ollama:latest` in the Dockerfile already resolve that?


@rick-github commented on GitHub (Sep 8, 2024):

`FROM ollama/ollama:latest` just pulls the program, not any models. If you want to create a new model, you first need to pull the model you want to base your custom one on: `ollama pull llama3:8b`.


@Oluwafemi-Jegede commented on GitHub (Sep 8, 2024):

@rick-github Okay, thanks. So the Dockerfile should look like this?

```
FROM ollama/ollama:latest

COPY custom_llama.txt /App/custom_llama.txt

WORKDIR /App

RUN ollama serve & sleep 5 && ollama pull llama3:8b && ollama create ai-agent -f custom_llama.txt && ollama run ai-agent

EXPOSE 11434
```

I'm also curious how it runs locally without the Ollama application running in the background.


@rick-github commented on GitHub (Sep 8, 2024):

The RUN commands there only run during the container build process; the container automatically starts the Ollama server when it's instantiated, so when running locally it's just ready. The final `ollama run ai-agent` is unnecessary.


@rick-github commented on GitHub (Sep 8, 2024):

Note that the way you are doing this, every time you build the container, ollama will re-pull the model, which can be slow, error-prone, and hard on your bandwidth budget. It may be better to pull the model to your workspace just once, and then COPY the model into the container during the build process.
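
As an untested sketch of that approach: pull `llama3:8b` once on the build host, stage the populated model store into the build context (e.g. `cp -r ~/.ollama/models ./models`), and COPY it into the image instead of pulling during the build. The paths below assume the default store locations and are illustrative only:

```
FROM ollama/ollama:latest

# Copy a model store populated on the host with 'ollama pull llama3:8b' and
# staged into the build context as ./models (docker COPY cannot read files
# outside the build context).
COPY models /root/.ollama/models
COPY custom_llama.txt /App/custom_llama.txt

WORKDIR /App

# Only the 'create' step still needs the server running during the build.
RUN ollama serve & sleep 5 && ollama create ai-agent -f custom_llama.txt

EXPOSE 11434
```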


@Oluwafemi-Jegede commented on GitHub (Sep 8, 2024):

@rick-github Yeah, thanks for the suggestion; I'll try COPY to reduce overhead. I tried the Dockerfile below and I still cannot see any model on Cloud Run after adding the pull command:

```
FROM ollama/ollama:latest

COPY custom_llama.txt /App/custom_llama.txt

WORKDIR /App

RUN ollama serve & sleep 5 && ollama pull llama3:8b && ollama create ai-agent -f custom_llama.txt

EXPOSE 11434
```

`$URL/api/tags` => `{"models":[]}`


@rick-github commented on GitHub (Sep 8, 2024):

Worked locally for me. I don't have a GCP account so I can't test Cloud Run. Do you get any logs from the GCP attempt?

Build:

```
$ docker build -f Dockerfile -t 6702 --progress plain .
...
#8 0.180 2024/09/08 18:12:46 routes.go:1123: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
...
pulling manifest
#8 81.61 pulling 6a0746a1ec1a... 100% ▕████████████████▏ 4.7 GB
...
#8 81.61 success
...
#8 81.66 transferring model data
#8 81.66 using existing layer sha256:6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa
...
#8 81.66 success
...
#9 exporting layers 14.9s done
#9 writing image sha256:9c602d2c645c0ced9f6010250c0f7876d771c5634e799cfdcc6c335ed55fc4d6 done
#9 naming to docker.io/library/6702 done
#9 DONE 14.9s
```

Run:

```
$ docker run -d --name 6702 6702
4faf0f6f995e88003c274200f743a50615f4146f15e0965cdbed306e89f3c04a
$ docker exec -it 6702 bash
root@4faf0f6f995e:/App# ollama list
NAME            ID              SIZE    MODIFIED
ai-agent:latest 3f2762d3ecf4    4.7 GB  7 minutes ago
llama3:8b       365c0bd3c000    4.7 GB  7 minutes ago
root@4faf0f6f995e:/App# ollama run ai-agent:latest hello
Hello! I'm a bot designed to help infer information from text-based input. I can assist with tasks such as answering questions, summarizing content, and generating ideas. What would you like to talk about or ask?

root@4faf0f6f995e:/App#
```

@Oluwafemi-Jegede commented on GitHub (Sep 8, 2024):

Yeah, same here: it works perfectly locally for me, but when I move to the cloud it just shows `ollama is running`.


@rick-github commented on GitHub (Sep 8, 2024):

Are you running it in a VM instance in the cloud, or just the container with gcloud compute instances create-with-container?


@Oluwafemi-Jegede commented on GitHub (Sep 8, 2024):

So I am using Google Cloud Run, a managed container service for running workloads in the cloud, with no direct access to the VMs or Compute Engine instances.


@Oluwafemi-Jegede commented on GitHub (Sep 8, 2024):

Not ideal or my plan, but I created the model with two API requests:

`$URL/api/pull` => to pull llama3:8b
`$URL/api/create` (with the content of the model file) => to create the bot model

However, it would be nice to just run the container image, which contains all the config, and have it ready to serve.
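
For reference, those two requests can be sketched with curl; the service URL is a placeholder, and the `name`/`modelfile` fields match the `/api/pull` and `/api/create` request bodies as the API accepted them at the time. The commands are echoed rather than executed, so nothing is sent:

```shell
# Placeholder for the Cloud Run service URL; substitute your own.
URL="https://example-service.run.app"

# Echo the two requests rather than sending them; drop 'echo' to execute.
echo curl "$URL/api/pull" -d '{"name": "llama3:8b"}'
echo curl "$URL/api/create" -d '{"name": "ai-agent", "modelfile": "FROM llama3:8b"}'
```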


@rick-github commented on GitHub (Sep 8, 2024):

It's because the cloud-built container has `OLLAMA_MODELS=/home/.ollama/models` while the locally built container uses `OLLAMA_MODELS=/root/.ollama/models`. I'm not sure why; I assume the build or run process in GCP sets some environment variables (maybe `HOME`) that result in a different path for Ollama state. I don't know enough about GCP to fix this the right way, but a workaround is to set `HOME` in the Dockerfile:

```diff
--- Dockerfile.orig	2024-09-08 23:42:50.799039526 +0200
+++ Dockerfile	2024-09-08 23:34:44.897002700 +0200
@@ -5,6 +5,6 @@
 
 WORKDIR /App
 
-RUN ollama serve & sleep 5 && ollama pull llama3:8b && ollama create ai-agent -f custom_llama.txt
+RUN HOME=/home ollama serve & sleep 5 && ollama pull llama3:8b && ollama create ai-agent -f custom_llama.txt
 
 EXPOSE 11434
```

Build and deploy, and when the container starts it will see the models:

```
$ OLLAMA_HOST=https://test-123412341234.us-west1.run.app:443 ollama list
NAME           	ID          	SIZE  	MODIFIED       
ai-agent:latest	3f2762d3ecf4	4.7 GB	18 minutes ago	
llama3:8b      	365c0bd3c000	4.7 GB	18 minutes ago	
$ OLLAMA_HOST=https://test-123412341234.us-west1.run.app:443 ollama run ai-agent hello
Hello! I'm a bot that helps infer the meaning of text. You can provide me with some text, and I'll do my best to understand its meaning and provide you with relevant information or insights.

What would you like to talk about? Do you have any specific topics in mind, or would you like me to suggest some prompts to get us started?
```
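
An alternative, untested workaround along the same lines would be to pin the model store path explicitly with the documented `OLLAMA_MODELS` variable, so the build-time and runtime paths agree no matter what the platform sets `HOME` to:

```
FROM ollama/ollama:latest

# Pin the model store so the path used by RUN during 'docker build' matches
# the path the server uses at runtime, regardless of HOME.
ENV OLLAMA_MODELS=/root/.ollama/models

COPY custom_llama.txt /App/custom_llama.txt

WORKDIR /App

RUN ollama serve & sleep 5 && ollama pull llama3:8b && ollama create ai-agent -f custom_llama.txt

EXPOSE 11434
```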

@Oluwafemi-Jegede commented on GitHub (Sep 8, 2024):

Setting HOME fixed the issue. It would be interesting to know whether the same issue is observed with similar products from other cloud platforms (Azure, AWS), or if it's just GCP.

Thanks @rick-github

Reference: github-starred/ollama#4218