[GH-ISSUE #1309] [WSL2] Cuda error 222 : the provided PTX was compiled with an unsupported toolchain. #62714

Closed
opened 2026-05-03 10:03:46 -05:00 by GiteaMirror · 6 comments

Originally created by @fxrobin on GitHub (Nov 29, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1309

Originally assigned to: @dhiltgen on GitHub.

On Windows WSL2, with the CUDA Toolkit and the NVIDIA Container Toolkit installed, I'm facing this issue when running the official Docker image:

ollama-ollama-1    | 2023/11/29 00:36:04 llama.go:292: 3676 MB VRAM available, loading up to 21 GPU layers
ollama-ollama-1    | 2023/11/29 00:36:04 llama.go:421: starting llama runner
ollama-ollama-1    | 2023/11/29 00:36:04 llama.go:479: waiting for llama runner to start responding
ollama-ollama-1    | ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ollama-ollama-1    | ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ollama-ollama-1    | ggml_init_cublas: found 1 CUDA devices:
ollama-ollama-1    |   Device 0: NVIDIA RTX A1000 Laptop GPU, compute capability 8.6
ollama-ollama-1    |
ollama-ollama-1    | CUDA error 222 at /go/src/github.com/jmorganca/ollama/llm/llama.cpp/gguf/ggml-cuda.cu:5965: the provided PTX was compiled with an unsupported toolchain.
ollama-ollama-1    | current device: 0
ollama-ollama-1    | 2023/11/29 00:36:04 llama.go:436: 222 at /go/src/github.com/jmorganca/ollama/llm/llama.cpp/gguf/ggml-cuda.cu:5965: the provided PTX was compiled with an unsupported toolchain.
ollama-ollama-1    | current device: 0
ollama-ollama-1    | 2023/11/29 00:36:04 llama.go:444: error starting llama runner: llama runner process has terminated
ollama-ollama-1    | 2023/11/29 00:36:04 llama.go:510: llama runner stopped successfully
ollama-ollama-1    | 2023/11/29 00:36:04 llama.go:421: starting llama runner
ollama-ollama-1    | 2023/11/29 00:36:04 llama.go:479: waiting for llama runner to start responding
ollama-ollama-1    | {"timestamp":1701218164,"level":"WARNING","function":"server_params_parse","line":2035,"message":"Not compiled with GPU offload support, --n-gpu-layers option will be ignored. See main README.md for information on enabling GPU BLAS support","n_gpu_layers":-1}
ollama-ollama-1    | {"timestamp":1701218164,"level":"INFO","function":"main","line":2534,"message":"build info","build":375,"commit":"9656026"}
ollama-ollama-1    | {"timestamp":1701218164,"level":"INFO","function":"main","line":2537,"message":"system info","n_threads":12,"n_threads_batch":-1,"total_threads":24,"system_info":"AVX = 1 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | "}
ollama-ollama-1    | llama_model_loader: loaded meta data with 18 key-value pairs and 196 tensors from /root/.ollama/models/blobs/sha256:305c4103a989d3f8ac457f912af30f32693f20dcffe1495e18c2ed7b5596b2d1 (version GGUF V2)

So Ollama falls back to the CPU runner and does not use my GPU.

When I check whether Docker can use my GPU, it seems OK:

$ docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
Tue Nov 28 23:56:24 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.91       Driver Version: 517.89       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A100...  On   | 00000000:01:00.0  On |                  N/A |
| N/A   38C    P8     3W /  N/A |    323MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A        22      G   /Xwayland                       N/A      |
+-----------------------------------------------------------------------------+
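
A successful nvidia-smi from a plain Ubuntu container only proves that the driver is passed through to Docker in general. To check what the Ollama container itself sees, one can run nvidia-smi through the ollama/ollama image instead (a sketch, assuming the NVIDIA runtime injects nvidia-smi into the container, which it does when the utility capability is enabled):

$ docker run --rm --gpus all --entrypoint nvidia-smi ollama/ollama

If this fails while the Ubuntu test above succeeds, the problem is specific to how the Ollama container is set up, not to the driver passthrough.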

On Ollama startup, there is no warning about the GPU being inaccessible:

ollama-ollama-1    | 2023/11/29 00:07:32 images.go:784: total blobs: 15
ollama-ollama-1    | 2023/11/29 00:07:32 images.go:791: total unused blobs removed: 0
ollama-ollama-1    | 2023/11/29 00:07:32 routes.go:777: Listening on [::]:11434 (version 0.1.12)

Here is my distribution:

$ uname -a
Linux FRLFK0635009890 5.15.90.1-microsoft-standard-WSL2 #1 SMP Fri Jan 27 02:56:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.2 LTS
Release:        22.04
Codename:       jammy

Models:

root@de433da63a97:/# ollama list
NAME                    ID              SIZE    MODIFIED
codellama:latest        8fdf8f752f6e    3.8 GB  51 minutes ago
codeup:latest           54289661f7a9    7.4 GB  39 minutes ago
falcon:latest           4280f7257e73    4.2 GB  34 minutes ago

When I look at the source code of ggml-cuda.cu:

for (int id = 0; id < g_device_count; ++id) {
    CUDA_CHECK(ggml_cuda_set_device(id));

    // create cuda streams
    for (int is = 0; is < MAX_STREAMS; ++is) {
        CUDA_CHECK(cudaStreamCreateWithFlags(&g_cudaStreams[id][is], cudaStreamNonBlocking));
    }

    // create cublas handle
    CUBLAS_CHECK(cublasCreate(&g_cublas_handles[id]));
    CUBLAS_CHECK(cublasSetMathMode(g_cublas_handles[id], CUBLAS_TF32_TENSOR_OP_MATH));
}

The error is raised by the CUDA_CHECK around cudaStreamCreateWithFlags(&g_cudaStreams[id][is], cudaStreamNonBlocking) inside the inner loop.
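
CUDA error 222 is cudaErrorUnsupportedPtxVersion: the PTX embedded in the binary was produced by a CUDA toolkit newer than what the installed driver's JIT compiler supports. The stream creation is simply the first runtime call that forces context creation and module loading. Here is a minimal standalone sketch that exercises the same path (not Ollama's code; the trivial kernel and the compute_86 build target, matching the A1000's compute capability 8.6, are illustrative assumptions):

// repro.cu -- carries a trivial kernel so the binary embeds PTX.
// Build with PTX only, forcing the driver to JIT it at load time:
//   nvcc -gencode arch=compute_86,code=compute_86 -o repro repro.cu
#include <cstdio>
#include <cuda_runtime.h>

__global__ void noop() {}

int main() {
    cudaStream_t stream;
    // First CUDA runtime call: triggers primary context creation and
    // module loading -- the same point where ggml-cuda.cu fails.
    cudaError_t err = cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking);
    if (err != cudaSuccess) {
        // On the broken setup this should print error 222:
        // "the provided PTX was compiled with an unsupported toolchain"
        fprintf(stderr, "CUDA error %d: %s\n", (int)err, cudaGetErrorString(err));
        return 1;
    }
    noop<<<1, 1, 0, stream>>>();
    err = cudaDeviceSynchronize();
    printf("kernel result: %s\n", cudaGetErrorString(err));
    cudaStreamDestroy(stream);
    return 0;
}

Building this inside the container with the same toolkit that built the Ollama binary should reproduce the failure, while building it with a toolkit matching the driver (CUDA 11.7 here) should succeed.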

GiteaMirror added the nvidia label 2026-05-03 10:03:46 -05:00

@fxrobin commented on GitHub (Nov 29, 2023):

Just to be sure, I installed another Ollama running natively without Docker on the same computer, and everything is fine. My GPU is used and there are no errors in the log file.

Nov 29 09:22:01 FRLFK0635009890 ollama[5932]: 2023/11/29 09:22:01 llama.go:292: 3758 MB VRAM available, loading up to 24 GPU layers
Nov 29 09:22:01 FRLFK0635009890 ollama[5932]: 2023/11/29 09:22:01 llama.go:421: starting llama runner
Nov 29 09:22:01 FRLFK0635009890 ollama[5932]: 2023/11/29 09:22:01 llama.go:479: waiting for llama runner to start responding
Nov 29 09:22:01 FRLFK0635009890 ollama[5932]: ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
Nov 29 09:22:01 FRLFK0635009890 ollama[5932]: ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
Nov 29 09:22:01 FRLFK0635009890 ollama[5932]: ggml_init_cublas: found 1 CUDA devices:
Nov 29 09:22:01 FRLFK0635009890 ollama[5932]:   Device 0: NVIDIA RTX A1000 Laptop GPU, compute capability 8.6
Nov 29 09:22:03 FRLFK0635009890 ollama[6284]: {"timestamp":1701246123,"level":"INFO","function":"main","line":2534,"message":"build info","build":375,"commit":"9656026"}
Nov 29 09:22:03 FRLFK0635009890 ollama[6284]: {"timestamp":1701246123,"level":"INFO","function":"main","line":2537,"message":"system info","n_threads":12,"n_threads_batch":-1,"total_threads":24,"system_info":"AVX = 1 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | "}

So my issue is specific to the official Docker image.

Here is how I use it:

  ollama:
    image: ollama/ollama
    environment:
      - OLLAMA_ORIGINS=*
      - OLLAMA_HOST=0.0.0.0:11434
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    ports:
      - "11434:11434"
    volumes:
      - ./ollama:/root/.ollama
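
For anyone not using Compose, an equivalent docker run invocation would be roughly as follows (a sketch mirroring the compose file above; note that -v wants an absolute host path):

$ docker run -d --gpus all \
    -e OLLAMA_ORIGINS='*' \
    -e OLLAMA_HOST=0.0.0.0:11434 \
    -p 11434:11434 \
    -v "$PWD/ollama:/root/.ollama" \
    ollama/ollama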

@fxrobin commented on GitHub (Nov 29, 2023):

OK, I found a workaround: creating my own Dockerfile (and image) with this:

FROM nvcr.io/nvidia/cuda:11.6.1-devel-ubuntu20.04

RUN apt-get update && apt-get install -y ca-certificates curl

RUN curl https://ollama.ai/install.sh | sh

EXPOSE 11434
ENV OLLAMA_HOST 0.0.0.0
ENTRYPOINT ["/usr/local/bin/ollama"]
CMD ["serve"]
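
To use it, build the image and run it with the GPU flag (a sketch; ollama-cuda is just an example tag):

$ docker build -t ollama-cuda .
$ docker run -d --gpus all -p 11434:11434 -v "$PWD/ollama:/root/.ollama" ollama-cuda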

Now it's working like a charm in Docker. No errors. GPU is used.

2023/11/29 11:10:54 llama.go:292: 3641 MB VRAM available, loading up to 21 GPU layers
2023/11/29 11:10:54 llama.go:421: starting llama runner
2023/11/29 11:10:54 llama.go:479: waiting for llama runner to start responding
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA RTX A1000 Laptop GPU, compute capability 8.6
{"timestamp":1701256256,"level":"INFO","function":"main","line":2534,"message":"build info","build":375,"commit":"9656026"}
{"timestamp":1701256256,"level":"INFO","function":"main","line":2537,"message":"system info","n_threads":12,"n_threads_batch":-1,"total_threads":24,"system_info":"AVX = 1 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | "}
llama_model_loader: loaded meta data with 20 key-value pairs and 291 tensors from /root/.ollama/models/blobs/sha256:6ae28029995007a3ee8d0b8556d50f3b59b831074cf19c84de87acf51fb54054 (version GGUF V2)

@djmaze commented on GitHub (Dec 14, 2023):

Same here. The problem is that the final docker image does not contain any CUDA libraries.

Changing line 17 in the Dockerfile (https://github.com/jmorganca/ollama/blob/6e16098a60ae3834cd5f547d7e26f9e800c589c7/Dockerfile#L17C6-L17C18) to FROM nvidia/cuda:11.8.0-runtime-ubuntu22.04 should fix this. (Tried this with a derived image successfully.)


@djmaze commented on GitHub (Dec 20, 2023):

I created a PR for this in #1644. See if it helps you as well.


@pdevine commented on GitHub (Jan 25, 2024):

@fxrobin are you still seeing this issue in 0.1.20?


@dhiltgen commented on GitHub (Mar 12, 2024):

We had a bug a while back where we were not setting the correct environment variables on our container image, which sometimes resulted in the NVIDIA container runtime not mounting the libraries and not passing the GPU through into the container as it is supposed to. This should be fixed now. If you're still facing any problems with the latest release, let us know.
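
For reference, the variables the NVIDIA container runtime keys off are NVIDIA_VISIBLE_DEVICES and NVIDIA_DRIVER_CAPABILITIES (an assumption that these are the ones meant here; they are the standard ones). On an affected image they can also be forced per container as a workaround:

$ docker run -d --gpus all \
    -e NVIDIA_VISIBLE_DEVICES=all \
    -e NVIDIA_DRIVER_CAPABILITIES=compute,utility \
    -p 11434:11434 ollama/ollama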

Reference: github-starred/ollama#62714