[GH-ISSUE #10986] Ollama 0.9.0, macOS, gemma3:latest, and vision: Metal acceleration internal error produces inconsistent results #53756

Open
opened 2026-04-29 04:40:59 -05:00 by GiteaMirror · 10 comments

Originally created by @stannenb on GitHub (Jun 5, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10986

What is the issue?

When using Ollama 0.9.0 on a Mac Studio (M2 Max) to run gemma3:latest to describe an image, the server logs show an internal error, but Ollama continues processing and produces bogus results.

[GIN] 2025/06/05 - 13:03:33 | 200 |  2.125130792s |       127.0.0.1 | POST     "/api/generate"
ggml_metal_graph_compute: command buffer 1 failed with status 5
error: Internal Error (0000000e:Internal Error)

If one immediately issues a "/set parameter num_gpu 0" command, Ollama processes the image and produces a valid result (the same workaround can be applied per request through the REST API; see the sketch after the transcript below).

❯ ollama run gemma3:latest
>>> describe this image. /Users/xxxx/Downloads/IMG_2001@0.5x.png
Added image '/Users/xxxx/Downloads/IMG_2001@0.5x.png'
This is a screenshot of a text message that is saying "This is a screenshot of a text message that is saying "This is a screenshot of
a text message that is saying ".

The message is a self-referential joke!  It's a way of saying something similar.

>>> /set parameter num_gpu 0
Set parameter 'num_gpu' to '0'
>>> describe this image. /Users/xxxx/Downloads/IMG_2001@0.5x.png
Added image '/Users/xxxx/Downloads/IMG_2001@0.5x.png'
Okay, here’s a description of the image:

The image is a close-up portrait of a middle-aged man. He has a pale, somewhat weathered complexion. His most striking features are
his thick, full, and white, slightly unkempt beard and mustache. He is wearing dark, rectangular, aviator-style glasses. He’s looking
directly at the camera with a serious, perhaps slightly skeptical, expression. The background is a blurry, out-of-focus wall,
suggesting the photo was taken indoors. He is wearing a dark, likely gray or black, shirt. The lighting is fairly neutral.
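
The same workaround can be applied per request over the REST API, since num_gpu is accepted in the request options just as it is via /set. Below is a minimal sketch in Go against the default local server; the image path and prompt are placeholders:

```go
// A minimal sketch: force CPU-only inference for a single request by
// passing num_gpu 0 in the request options (placeholder path and prompt).
package main

import (
	"bytes"
	"encoding/base64"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
)

func main() {
	img, err := os.ReadFile("/Users/xxxx/Downloads/example.png") // placeholder
	if err != nil {
		panic(err)
	}
	body, _ := json.Marshal(map[string]any{
		"model":  "gemma3:latest",
		"prompt": "describe this image.",
		"images": []string{base64.StdEncoding.EncodeToString(img)},
		"stream": false,
		"options": map[string]any{"num_gpu": 0}, // mirrors "/set parameter num_gpu 0"
	})
	resp, err := http.Post("http://127.0.0.1:11434/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}
```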

Relevant log output

ggml_metal_init: allocating
ggml_metal_init: picking default device: Apple M2 Max
ggml_metal_load_library: using embedded metal library
ggml_metal_init: GPU name:   Apple M2 Max
ggml_metal_init: GPU family: MTLGPUFamilyApple8  (1008)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3  (5001)
ggml_metal_init: simdgroup reduction   = true
ggml_metal_init: simdgroup matrix mul. = true
ggml_metal_init: has residency sets    = true
ggml_metal_init: has bfloat            = true
ggml_metal_init: use bfloat            = false
ggml_metal_init: hasUnifiedMemory      = true
ggml_metal_init: recommendedMaxWorkingSetSize  = 51539.61 MB
ggml_metal_init: skipping kernel_get_rows_bf16                     (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32_1row              (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32_l4                (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_bf16                  (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_bf16_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mm_bf16_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_bf16_f16                (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h64           (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h80           (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h96           (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h112          (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h128          (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h192          (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_hk192_hv128   (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h256          (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_hk576_hv512   (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_h96       (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_h128      (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_h192      (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_hk192_hv128 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_h256      (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_hk576_hv512 (not supported)
ggml_metal_init: skipping kernel_cpy_f32_bf16                      (not supported)
ggml_metal_init: skipping kernel_cpy_bf16_f32                      (not supported)
ggml_metal_init: skipping kernel_cpy_bf16_bf16                     (not supported)
time=2025-06-05T13:03:32.861-04:00 level=INFO source=ggml.go:638 msg="compute graph" backend=Metal buffer_type=Metal size="1.1 GiB"
time=2025-06-05T13:03:32.861-04:00 level=INFO source=ggml.go:638 msg="compute graph" backend=BLAS buffer_type=CPU size="0 B"
time=2025-06-05T13:03:32.861-04:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CPU buffer_type=CPU size="0 B"
time=2025-06-05T13:03:32.893-04:00 level=INFO source=ggml.go:638 msg="compute graph" backend=Metal buffer_type=Metal size="1.1 GiB"
time=2025-06-05T13:03:32.893-04:00 level=INFO source=ggml.go:638 msg="compute graph" backend=BLAS buffer_type=CPU size="5.0 MiB"
time=2025-06-05T13:03:32.893-04:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CPU buffer_type=CPU size="0 B"
time=2025-06-05T13:03:33.844-04:00 level=INFO source=server.go:630 msg="llama runner started in 2.01 seconds"
[GIN] 2025/06/05 - 13:03:33 | 200 |  2.125130792s |       127.0.0.1 | POST     "/api/generate"
ggml_metal_graph_compute: command buffer 1 failed with status 5
error: Internal Error (0000000e:Internal Error)
[GIN] 2025/06/05 - 13:03:40 | 200 |  2.147018541s |       127.0.0.1 | POST     "/api/chat"
time=2025-06-05T13:06:10.840-04:00 level=ERROR source=server.go:457 msg="llama runner terminated" error="signal: killed"
time=2025-06-05T13:06:10.927-04:00 level=INFO source=server.go:135 msg="system memory" total="64.0 GiB" free="29.8 GiB" free_swap="0 B"
time=2025-06-05T13:06:10.928-04:00 level=INFO source=server.go:168 msg=offload library=cpu layers.requested=0 layers.model=35 layers.offload=0 layers.split="" memory.available="[29.8 GiB]" memory.gpu_overhead="0 B" memory.required.full="4.4 GiB" memory.required.partial="0 B" memory.required.kv="225.0 MiB" memory.required.allocations="[62.8 MiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
time=2025-06-05T13:06:10.928-04:00 level=WARN source=server.go:199 msg="flash attention enabled but not supported by gpu"
time=2025-06-05T13:06:10.928-04:00 level=WARN source=server.go:222 msg="quantized kv cache requested but flash attention disabled" type=q8_0
time=2025-06-05T13:06:10.970-04:00 level=INFO source=server.go:431 msg="starting llama server" cmd="/opt/homebrew/Cellar/ollama/0.9.0/bin/ollama runner --ollama-engine --model /Users/saul/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --n-gpu-layers 0 --threads 8 --no-mmap --parallel 2 --port 50865"
time=2025-06-05T13:06:10.972-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
time=2025-06-05T13:06:10.972-04:00 level=INFO source=server.go:591 msg="waiting for llama runner to start responding"
time=2025-06-05T13:06:10.972-04:00 level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server not responding"
time=2025-06-05T13:06:10.995-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
time=2025-06-05T13:06:10.995-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:50865"
time=2025-06-05T13:06:11.034-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
time=2025-06-05T13:06:11.034-04:00 level=INFO source=ggml.go:104 msg=system Metal.0.EMBED_LIBRARY=1 CPU.0.ARM_FMA=1 CPU.0.FP16_VA=1 CPU.0.DOTPROD=1 CPU.0.LLAMAFILE=1 CPU.0.ACCELERATE=1 compiler=cgo(clang)
time=2025-06-05T13:06:11.051-04:00 level=INFO source=ggml.go:351 msg="model weights" buffer=CPU size="3.6 GiB"
ggml_metal_init: allocating
ggml_metal_init: picking default device: Apple M2 Max
ggml_metal_load_library: using embedded metal library
ggml_metal_init: GPU name:   Apple M2 Max
ggml_metal_init: GPU family: MTLGPUFamilyApple8  (1008)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3  (5001)
ggml_metal_init: simdgroup reduction   = true
ggml_metal_init: simdgroup matrix mul. = true
ggml_metal_init: has residency sets    = true
ggml_metal_init: has bfloat            = true
ggml_metal_init: use bfloat            = false
ggml_metal_init: hasUnifiedMemory      = true
ggml_metal_init: recommendedMaxWorkingSetSize  = 51539.61 MB
ggml_metal_init: skipping kernel_get_rows_bf16                     (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32_1row              (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32_l4                (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_bf16                  (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_bf16_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mm_bf16_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_bf16_f16                (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h64           (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h80           (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h96           (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h112          (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h128          (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h192          (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_hk192_hv128   (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h256          (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_hk576_hv512   (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_h96       (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_h128      (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_h192      (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_hk192_hv128 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_h256      (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_hk576_hv512 (not supported)
ggml_metal_init: skipping kernel_cpy_f32_bf16                      (not supported)
ggml_metal_init: skipping kernel_cpy_bf16_f32                      (not supported)
ggml_metal_init: skipping kernel_cpy_bf16_bf16                     (not supported)
time=2025-06-05T13:06:11.181-04:00 level=INFO source=ggml.go:638 msg="compute graph" backend=Metal buffer_type=Metal size="0 B"
time=2025-06-05T13:06:11.181-04:00 level=INFO source=ggml.go:638 msg="compute graph" backend=BLAS buffer_type=CPU size="1.1 GiB"
time=2025-06-05T13:06:11.181-04:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CPU buffer_type=CPU size="0 B"
time=2025-06-05T13:06:11.215-04:00 level=INFO source=ggml.go:638 msg="compute graph" backend=Metal buffer_type=Metal size="0 B"
time=2025-06-05T13:06:11.215-04:00 level=INFO source=ggml.go:638 msg="compute graph" backend=BLAS buffer_type=CPU size="1.1 GiB"
time=2025-06-05T13:06:11.215-04:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CPU buffer_type=CPU size="0 B"
time=2025-06-05T13:06:11.224-04:00 level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server loading model"
time=2025-06-05T13:06:11.977-04:00 level=INFO source=server.go:630 msg="llama runner started in 1.01 seconds"
[GIN] 2025/06/05 - 13:06:44 | 200 | 34.082673125s |       127.0.0.1 | POST     "/api/chat"

OS

macOS

GPU

M2 Max

CPU

M2 Max

Ollama version

0.9.0

GiteaMirror added the bug label 2026-04-29 04:40:59 -05:00

@cwallen commented on GitHub (Jun 6, 2025):

I'm seeing similar issues:

ggml_metal_graph_compute: command buffer 1 failed with status 5
error: Internal Error (0000000e:Internal Error)
panic: failed to sample token: sample: logits sum to NaN, check model output

goroutine 11 [running]:
github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0x1400056b560, {0x10152ed50, 0x14000530640})
	/Users/runner/work/ollama/ollama/runner/ollamarunner/runner.go:364 +0x70
created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
	/Users/runner/work/ollama/ollama/runner/ollamarunner/runner.go:960 +0x898
time=2025-06-06T12:20:21.615-04:00 level=ERROR source=server.go:457 msg="llama runner terminated" error="exit status 2"
[GIN] 2025/06/06 - 12:20:21 | 500 | 45.959320292s |       127.0.0.1 | POST     "/api/generate"

Also get the first 2 lines on their own, sometimes with buffer 0 instead of 1.
Seeing the same errors on qwen2.5vl as well as gemma3
I'm also on Apple M2 Max


@yarmoliq commented on GitHub (Jun 8, 2025):

I also can't get anything from any vision model. It hallucinates random stuff. I'm also on M2 Max, ollama v0.9.0


@cwallen commented on GitHub (Jun 8, 2025):

A bit of non-scientific experimentation this weekend:
The errors in the log that I was seeing for qwen2.5vl:7b seemed to go away entirely with qwen2.5vl:7b-fp16 (I thought they had, but I'm actually still seeing them).
gemma3:4b-it-fp16 might have a lower error rate than gemma3:4b, but it's hard to tell; it's definitely not zero.

@yarmoliq Are you using the CLI, the API, or something else?
If you are getting garbage for every image on every model, that sounds like a higher-level problem than what I'm seeing. Even the error-prone models work for me most of the time.
When I was first playing around with scripting against the API, I was getting garbage on everything; the problem was that my base64-encoded JPEGs were being read as PNGs. Garbage in, garbage out.
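
A minimal Go sketch of the kind of check that would have caught my mistake: sniff the real image type from the file bytes (not the extension) before base64-encoding it for the API. The file name is a placeholder:

```go
package main

import (
	"encoding/base64"
	"fmt"
	"net/http"
	"os"
)

// encodeImage reads an image file, sniffs the actual content type from the
// leading bytes, and returns the base64 payload for Ollama's "images" field.
func encodeImage(path string) (payload, kind string, err error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return "", "", err
	}
	// http.DetectContentType looks at the bytes, so a JPEG misnamed as
	// .png is still reported as image/jpeg.
	kind = http.DetectContentType(data)
	return base64.StdEncoding.EncodeToString(data), kind, nil
}

func main() {
	payload, kind, err := encodeImage("receipt.png") // placeholder path
	if err != nil {
		panic(err)
	}
	fmt.Printf("detected %s, %d base64 chars\n", kind, len(payload))
}
```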


@yarmoliq commented on GitHub (Jun 8, 2025):

I first stumbled upon this issue using the local API. At first I thought that small vision models were just garbage (I fed an image of a receipt to gemma3:27b, asked what it sees, and got responses like "a cat" or "a lion head"). But then I realized they are not garbage and that something weird was going on, so I tried chatting in the CLI (ollama run), but still got garbage results. That is how I got here.

Using /set parameter num_gpu 0 seems to be helping.

Out of 30+ tries I randomly received 1 or 2 actually good results (1 from gemma, 1 from llama), but all the other tries were straight-up 100% hallucination (all tries used the same image).


@yarmoliq commented on GitHub (Jun 8, 2025):

I was also wondering whether the models do some weird cropping that messes everything up. I tried resizing the images myself, but that didn't change anything.


@stannenb commented on GitHub (Jun 10, 2025):

I think there are two issues here:

  1. Metal acceleration for (some) vision models on M2 Max chips is broken. You can work around this by disabling GPU processing with "/set parameter num_gpu 0" (a way to make that setting persistent is sketched below).
  2. Ollama doesn't notice that Metal-accelerated computation is failing and pulls a response out of, well, somewhere. The response has nothing to do with the image or the prompt.

How to debug this any further is beyond my current skill set.
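
As a hedged aside on workaround 1: assuming the Modelfile PARAMETER directive accepts num_gpu the same way /set does, the CPU-only setting can be baked into a named variant so it persists across sessions (gemma3-cpu is just an example name):

```
# Hypothetical persistent workaround; num_gpu as a Modelfile PARAMETER is an assumption.
cat > Modelfile <<'EOF'
FROM gemma3:latest
PARAMETER num_gpu 0
EOF
ollama create gemma3-cpu -f Modelfile
ollama run gemma3-cpu
```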


@smileyboy2019 commented on GitHub (Jun 10, 2025):

The image understanding is just completely wrong.


@cwallen commented on GitHub (Jun 10, 2025):

I've seen one other symptom that I'm not sure is related, but I notice I'm only seeing it on the same models. Most prompts return in a reasonable amount of time, but occasionally one just hangs until the fetch request times out at 5 minutes.

Some extra logging from qwen2.5vl:7b-fp16 with debug:

time=2025-06-10T00:54:13.764-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[0]
time=2025-06-10T00:54:13.857-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[0]
time=2025-06-10T00:54:13.857-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=0 prompt=1352 used=0 remaining=1352
time=2025-06-10T00:54:13.873-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[0]
ggml_metal_graph_compute: command buffer 0 failed with status 5
error: Internal Error (0000000e:Internal Error)
time=2025-06-10T00:54:30.838-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=1 cache=0 prompt=1342 used=0 remaining=1342
ggml_metal_graph_compute: command buffer 1 failed with status 5
error: Internal Error (0000000e:Internal Error)
panic: failed to sample token: sample: logits sum to NaN, check model output

goroutine 39 [running]:
github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0x14000493560, {0x10605ed50, 0x140004cd7c0})
	/Users/runner/work/ollama/ollama/runner/ollamarunner/runner.go:364 +0x70
created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
	/Users/runner/work/ollama/ollama/runner/ollamarunner/runner.go:960 +0x898
time=2025-06-10T00:54:37.735-04:00 level=DEBUG source=server.go:1023 msg="stopping llama server" pid=93162

One that didn't panic:

time=2025-06-10T01:13:08.774-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[0]
time=2025-06-10T01:13:08.774-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1401 prompt=1349 used=80 remaining=1269
time=2025-06-10T01:13:08.778-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[0]
time=2025-06-10T01:13:30.400-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=1 cache=1394 prompt=1341 used=80 remaining=1261
ggml_metal_graph_compute: command buffer 0 failed with status 5
error: Internal Error (0000000e:Internal Error)
[GIN] 2025/06/10 - 01:13:45 | 200 | 37.164364417s |       127.0.0.1 | POST     "/api/chat"

Timeout:

time=2025-06-10T01:18:01.186-04:00 level=DEBUG source=cache.go:272 msg="context limit hit - shifting" id=1 limit=4096 input=4096 keep=4 discard=2046
[GIN] 2025/06/10 - 01:18:09 | 500 |          5m0s |       127.0.0.1 | POST     "/api/chat"

Happy to pull more logs if it helps, or figure out how to run a dev build to test.


@yarmoliq commented on GitHub (Aug 25, 2025):

any news?


@cwallen commented on GitHub (Aug 26, 2025):

@yarmoliq At least for me, the PR from #11070 means it no longer gives garbage responses; instead it just throws an error and crashes, so in my scripts I catch the error and retry, and that usually works (a sketch of that loop is below). It would be nice to have a full fix.
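
For anyone wanting the same stopgap, here is a minimal sketch of that catch-and-retry loop in Go, assuming the default local server; the attempt count and backoff are arbitrary:

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"net/http"
	"time"
)

// generateWithRetry posts a prepared /api/generate body and retries when the
// runner crashes, which (after #11070) surfaces as an error response rather
// than garbage output.
func generateWithRetry(body []byte, attempts int) ([]byte, error) {
	var lastErr error
	for i := 0; i < attempts; i++ {
		resp, err := http.Post("http://127.0.0.1:11434/api/generate",
			"application/json", bytes.NewReader(body))
		if err != nil {
			lastErr = err
		} else {
			out, readErr := io.ReadAll(resp.Body)
			resp.Body.Close()
			if readErr == nil && resp.StatusCode == http.StatusOK {
				return out, nil
			}
			lastErr = fmt.Errorf("status %d: %s", resp.StatusCode, out)
		}
		time.Sleep(time.Duration(i+1) * time.Second) // crude linear backoff
	}
	return nil, errors.Join(errors.New("all attempts failed"), lastErr)
}

func main() {
	body := []byte(`{"model":"gemma3:latest","prompt":"describe this image.","stream":false}`)
	out, err := generateWithRetry(body, 3)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```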
