[GH-ISSUE #3369] Pull a model on start or without requiring serve #48584

Open
opened 2026-04-28 08:54:07 -05:00 by GiteaMirror · 14 comments

Originally created by @0x77dev on GitHub (Mar 27, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3369

What are you trying to do?

To enhance the user experience when deploying Ollama with models in a containerized environment, it would be beneficial to enable pre-embedding a model into the image through a custom Dockerfile, or pulling a model upon starting Ollama by specifying an argument or environment variable. This would eliminate the need for an API request after the container starts.

How should we solve this?

  • ollama serve --pull [models]
  • OLLAMA_PULL=model1,model2 ollama serve (sketched below)
  • ollama pull without ollama serve (a somewhat harder option to implement, but one that improves the ability to create custom images with custom models beyond just pulling)
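
For example, if the OLLAMA_PULL form were adopted, a Compose deployment could declare its models up front. This is purely illustrative; neither the variable nor the flag exists in any released Ollama:

services:
  ollama:
    image: ollama/ollama
    environment:
      # Hypothetical variable from this proposal; not implemented today
      OLLAMA_PULL: "mistral,llama2"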

What is the impact of not solving this?

This would be a significant improvement for hosting Ollama. Without it, deploying Ollama, especially in a production environment, is more challenging than it needs to be.

Anything else?

Related:

  • https://github.com/ollama/ollama/issues/1322
  • https://github.com/ollama/ollama/issues/358#issuecomment-2022394098

GiteaMirror added the docker and feature request labels 2026-04-28 08:54:08 -05:00

@ip2cloud commented on GitHub (Apr 2, 2024):

This would be very useful in a docker-compose.yaml, including automatically pulling more than one model after the container comes up.

I saw that the Kubernetes Helm chart has these entries in its values.yaml:

ollama:
  gpu:
    # -- Enable GPU integration
    enabled: true

    # -- GPU type: 'nvidia' or 'amd'
    type: 'nvidia'

    # -- Number of GPUs to use
    number: 2

  # -- List of models to pull at container startup
  models:
    - mistral
    - llama2

How can this be done in docker-compose.yaml?
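
A workable pattern today, pending a built-in option, is to override the container command so the server pulls models once it is reachable. A minimal sketch; the model names and the ollama list readiness check mirror other examples in this thread:

services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    entrypoint: ["/bin/sh", "-c"]
    command:
      - |
        # Start the server in the background, wait until it answers, then pull
        ollama serve &
        until ollama list >/dev/null 2>&1; do sleep 1; done
        ollama pull mistral
        ollama pull llama2
        wait
volumes:
  ollama: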

@shivaraj-bh commented on GitHub (Apr 17, 2024):

Decoupling pull from serve would also be very helpful for setting up requirements before starting the server.

I would love to see this implemented.

@Chernegi commented on GitHub (Apr 23, 2024):

You may add to your Dockerfile:

RUN ollama serve & sleep 5 ; ollama pull $model_name ; \
    echo "kill 'ollama serve' process" ; \
    ps -ef | grep 'ollama serve' | grep -v grep | awk '{print $2}' | xargs -r kill -9
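
A sketch of a less timing-sensitive variant of the same trick, polling for readiness instead of sleeping a fixed 5 seconds (the ollama list readiness check is the same one used in the Kubernetes workaround later in this thread):

RUN ollama serve & SERVE_PID=$! ; \
    until ollama list >/dev/null 2>&1; do sleep 1; done ; \
    ollama pull $model_name ; \
    kill $SERVE_PID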

@xyproto commented on GitHub (Sep 4, 2024):

Not being able to use ollama pull without ollama serve is problematic when trying to package models as Arch Linux packages.

Being able to package models as packages is useful, because then other packages or applications can depend on both Ollama and a specific model being available.

An example of why the current situation is a bit awkward:

pkgname=ollama-tinyllama
_tag=latest
pkgver=1.0.0
pkgrel=1
pkgdesc='The tinyllama (1B) large language model (LLM), for Ollama'
arch=(any)
url='https://github.com/jzhang38/TinyLlama'
license=(Apache-2.0)
depends=(ollama)
makedepends=(python)

prepare() {
  # Find a free port
  export OLLAMA_HOST=":$(python -c 'import socket; s=socket.socket(socket.AF_INET, socket.SOCK_STREAM); s.bind(("", 0)); print(s.getsockname()[1]); s.close()')"

  # Create a place to keep the models
  mkdir -p models
  export OLLAMA_MODELS="$srcdir/models"

  # Start Ollama
  ollama serve &
  serve_pid=$!

  # Try downloading the model with ollama, wait 1 second if ollama is not ready yet, try 10 times
  for i in {1..10}; do
    ollama pull "${pkgname#ollama-}:$_tag" && break || sleep 1
  done

  # Stop Ollama
  kill $serve_pid
}

package() {
  install -d "$pkgdir/var/lib/ollama"
  cp -r models/. "$pkgdir/var/lib/ollama/"
}

Being able to use ollama pull without having to start Ollama would be useful.

If the models were placed in separate directories, it would also be easier to manage permissions, in the context of Linux packages.

@a-h commented on GitHub (Sep 27, 2024):

Raised PR https://github.com/ollama/ollama/pull/7001
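
With that change applied, pulling into a models directory without a running server looks roughly like the following sketch; the --local flag comes from the PR (it also appears in the Nix snippet in the next comment) and is not part of any released Ollama:

# Hypothetical until the PR lands: pull straight to disk, no server required
OLLAMA_MODELS=/var/lib/ollama/models ollama pull tinyllama:latest --local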

@a-h commented on GitHub (Sep 27, 2024):

With the changes, I'm able to define the models I want to have present in a Nix Flake. In my case, I'm using version 3.11 as a base, and I've applied my PR as a patch to test it.

      # Wrap ollama so that we can set environment variables to provide models.
      wrappedOllama = system: pkgs:
        let
          #TODO: When https://github.com/ollama/ollama/pull/7001 is merged and the unstable 
          # nixpkgs uses the version with it, we can remove the src and vendorHash overrides,
          # keeping the acceleration override.
          ollama = (pkgs.ollama.overrideAttrs {
            version = "3.11-patch";
            src = pkgs.fetchFromGitHub {
              owner = "a-h";
              repo = "ollama";
              rev = "42e790d02524f5f461eb241d88de12cf6d9afdb2";
              fetchSubmodules = true;
              hash = "sha256-R7KT1Vg4VRtoI1lXBiIKbQJQfxn6sAYXBwAisl1MN5c=";
            };
            vendorHash = "sha256-hSxcREAujhvzHVNwnRTfhi0MKI3s8HNavER2VLz6SYk=";
          }).override
            (oldAttrs: {
              acceleration =
                if system == "aarch64-darwin" || system == "x86_64-darwin" # If darwin, use metal.
                then null
                else "cuda"; # If linux, use cuda. (change manually to "rocm" for AMD GPUs)
            });
          models = pkgs.runCommand "pull-models" { } ''
            export HOME="$out"
            ${ollama}/bin/ollama pull mistral-nemo --local
            ${ollama}/bin/ollama pull nomic-embedded-text --local
          '';
          wrapped = pkgs.writeShellScriptBin "ollama" ''
            export HOME="${models}"
            export OLLAMA_MODELS="${models}/.ollama/models"
            exec ${ollama}/bin/ollama "$@"
          '';
        in
        pkgs.symlinkJoin {
          name = "ollama";
          paths = [
            models
            wrapped
            ollama
          ];
        };

Then, I can override the ollama that's in nixpkgs with my expression that preloads models:

      forAllSystems = f: nixpkgs.lib.genAttrs allSystems (system: f {
        system = system;
        pkgs = import nixpkgs {
          inherit system;
          overlays = [
            # Use ollama from unstableNixPkgs, because it's a bit more
            # bleeding edge.
            (final: prev: {
              ollama = (wrappedOllama system (unstableNixPkgs system));
            })
          ];
        };
      });

@a-h commented on GitHub (Oct 4, 2024):

@dhiltgen - You recently closed #7046 saying that it was tracked here. Not sure if you saw that I raised a PR for this? Thanks!

@ghost commented on GitHub (Jan 22, 2025):

Is there a reason I still can't ollama pull before ollama serve in the CLI? I must have downloaded the models manually the first time, because if you don't have models in place before ollama serve, you are forced to download a 1.6 GB model. Over networks with high security this takes a really long time, and there is no progress bar even indicating that a download is ongoing; I only found out it was happening after accidentally leaving it running while searching for a solution.

I can't pull a model using ollama pull, and I don't mind downloading it manually and changing paths. The problem is that it forces a silent download of 16 parts of roughly 100 MB each and doesn't log any of it until it's done. In my case it's the same time investment either way, but arriving at the conclusion that this was the problem at all took an entire day of red herrings: sometimes the cause is an inappropriate port or a firewall issue, and sometimes the download actually finishes, so diagnosing it is pretty time-consuming. The output in the working default-port case and in the case of a port you don't have permission to listen on look identical right up to the point of the silent download. ollama -v shows the default port working, while on the custom port it reports that Ollama is not running, with the usual message asking you to run it.

@xyproto commented on GitHub (Jan 22, 2025):

This utility may, maybe, possibly, be helpful to you:

https://github.com/xyproto/ollamaurl

@ghost commented on GitHub (Jan 22, 2025):

Thanks. Do you still think it would be useful, given that I don't even run ollama pull? Really, my main issue is signaling to the user a process that can cost a large time investment per iteration. With no download bar during serve, and people still saying to run ollama serve and ollama pull separately, it's hard to know, even when I know which model is being downloaded, whether ollama serve is actually invoking a pull when no model is present. It's hard to understand why I'm still seeing this implicit dependence given the expected logic flow; everything seems to imply a different schema.

I would just argue that either the behavior should be more deliberate, or the decoupling should work in a way that lets you serve first and then pull the model afterwards. I can't pull without serving first, so my two options are to scp from a faster network or to wait. And waiting implies far more is wrong than actually is, because there is no simple tqdm-style progress tracker on an already-split download; it could just print chunks as they finish. I didn't even know this was happening at all.

@ghost commented on GitHub (Jan 22, 2025):

I could also be misdiagnosing this. I just notice that after printing the GPU specs it hangs for 15 minutes, then shows API calls being attempted, and eventually it makes one more API call and reports that 16 chunks of roughly 100 MB each were downloaded, or something similar.

time=2025-01-22T10:59:34.528-06:00 level=INFO source=gpu.go:226 msg="looking for compatible GPUs"
time=2025-01-22T10:59:34.767-06:00 level=INFO source=types.go:131 msg="inference compute" id=GPU-473729a7-a78c-bd5a-eea8-9888394b121a library=cuda variant=v12 compute=8.9 driver=12.7 name="NVIDIA GeForce RTX 4090" total="23.5 GiB" available="23.1 GiB"
[GIN] 2025/01/22 - 11:22:10 | 200 | 99.533µs | 10.1.120.100 | GET "/"
[GIN] 2025/01/22 - 11:22:10 | 404 | 7.824µs | 10.1.120.100 | GET "/favicon.ico"


@khteh commented on GitHub (Apr 4, 2025):

https://github.com/ollama/ollama/issues/10122

@otobonh commented on GitHub (Sep 10, 2025):

> You may add to your Dockerfile:
>
> RUN ollama serve & sleep 5 ; ollama pull $model_name ; echo "kill 'ollama serve' process" ; ps -ef | grep 'ollama serve' | grep -v grep | awk '{print $2}' | xargs -r kill -9

This worked for me. Thanks

@BOPOHA commented on GitHub (Feb 18, 2026):

I know this doesn't solve the core issue of needing the server running to pull models, but here is a working workaround for Kubernetes users.

spec:
  containers:
    - name: ollama
      image: ollama/ollama:0.3.12
      command: ["/bin/sh"]
      args:
        - "-c"
        - |
          set -eu
          ollama serve &
          until ollama list >/dev/null 2>&1; do sleep 1; done
          ollama pull hf.co/Qwen/Qwen3-Embedding-0.6B-GGUF:Q8_0
          ollama pull nomic-embed-text
          wait


> You may add to your Dockerfile:
>
> RUN ollama serve & sleep 5 ; ollama pull $model_name ; echo "kill 'ollama serve' process" ; ps -ef | grep 'ollama serve' | grep -v grep | awk '{print $2}' | xargs -r kill -9

lol

RUN ollama serve & \
    PID=$! && \
    sleep 5 && \
    ollama pull $model_name && \
    kill -9 $PID

This approach is more concise: it relies on the shell's built-in $! variable to terminate the background process explicitly.

Reference: github-starred/ollama#48584