[GH-ISSUE #12886] ollama 0.12.[78] not using AVX in non-GPU situations #70599

Closed
opened 2026-05-04 22:12:37 -05:00 by GiteaMirror · 2 comments

Originally created by @rick-github on GitHub (Oct 31, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12886

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

```console
$ docker run -p 11111:11434 -v ~/.ollama:/root/.ollama ollama/ollama:0.12.6 2>&1 | grep cgo & sleep 2; OLLAMA_HOST=:11111 ollama run qwen2.5:0.5b
⠸ time=2025-10-31T15:24:39.908Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
```

```console
$ docker run -p 11111:11434 -v ~/.ollama:/root/.ollama ollama/ollama:0.12.7 2>&1 | grep cgo & sleep 2; OLLAMA_HOST=:11111 ollama run qwen2.5:0.5b ''
⠹ time=2025-10-31T15:26:24.078Z level=INFO source=ggml.go:104 msg=system CPU.0.LLAMAFILE=1 compiler=cgo(gcc)
```

```console
$ docker run -p 11111:11434 -v ~/.ollama:/root/.ollama ollama/ollama:0.12.8 2>&1 | grep cgo & sleep 2; OLLAMA_HOST=:11111 ollama run qwen2.5:0.5b ''
⠹ time=2025-10-31T15:26:57.214Z level=INFO source=ggml.go:104 msg=system CPU.0.LLAMAFILE=1 compiler=cgo(gcc)
```

```console
$ docker run -p 11111:11434 --gpus all -v ~/.ollama:/root/.ollama ollama/ollama:0.12.8 2>&1 | grep cgo & sleep 2; OLLAMA_HOST=:11111 ollama run qwen2.5:0.5b ''
⠼ time=2025-10-31T15:44:30.705Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,520,600,610,700,750,800,860,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
```
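For quick triage of logs like the ones above, the `CPU.0.*` flags in the `msg=system` line can be extracted with standard tools. This is a hypothetical helper (not part of ollama); the sample `line` below is the 0.12.7 non-GPU log line from this report:

```shell
# Pull the CPU.0.* feature flags out of an ollama "msg=system" log line
# to see at a glance whether AVX support was detected.
line='time=2025-10-31T15:26:24.078Z level=INFO source=ggml.go:104 msg=system CPU.0.LLAMAFILE=1 compiler=cgo(gcc)'

# Isolate "CPU.0.<FLAG>=1" tokens, then strip the prefix and "=1" suffix.
feats=$(printf '%s\n' "$line" | grep -o 'CPU\.0\.[A-Z0-9_]*=1' | sed 's/^CPU\.0\.//; s/=1$//')
printf '%s\n' "$feats"

if printf '%s\n' "$feats" | grep -qx 'AVX'; then
  echo "AVX enabled"
else
  echo "AVX missing"   # what the 0.12.7/0.12.8 non-GPU logs show
fi
```

Running the same extraction against the 0.12.6 line yields the full `SSE3 … AVX … AVX2 …` list, which makes the regression easy to spot in a diff.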

`git bisect` shows the behaviour changed with https://github.com/ollama/ollama/commit/3258a89b6e4c2030ca47dfe51483e768cbd38b33:

```console
$ git bisect good
3258a89b6e4c2030ca47dfe51483e768cbd38b33 is the first bad commit
commit 3258a89b6e4c2030ca47dfe51483e768cbd38b33
Author: Daniel Hiltgen <dhiltgen@users.noreply.github.com>
Date:   Thu Oct 23 11:20:02 2025 -0700

    DRY out the runner lifecycle code (#12540)
```
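Bisections like this can be automated with `git bisect run`. The sketch below demonstrates the technique on a throwaway repo (the repo, commit messages, and the `-lt 4` "bug" predicate are stand-ins; the real session tested ollama builds, which isn't reproduced here):

```shell
# Build a scratch repo where the "bug" (file content >= 4) lands in commit 4.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git -c user.email=a@example.com -c user.name=tester commit --allow-empty -qm base
for i in 1 2 3 4 5; do
  echo "$i" > f
  git add f
  git -c user.email=a@example.com -c user.name=tester commit -qm "commit $i"
done

# Mark HEAD bad and the base commit good, then let bisect drive the test:
# the command exits 0 (good) while the file's value is below 4.
git bisect start HEAD HEAD~5 >/dev/null
result=$(git bisect run sh -c 'test "$(cat f 2>/dev/null || echo 0)" -lt 4' | grep 'first bad commit')
echo "$result"
```

`git bisect run` halves the search space on each step, so a regression window of N commits needs only about log2(N) test runs.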

Relevant log output


OS

No response

GPU

No response

CPU

No response

Ollama version

No response

GiteaMirror added the bug label 2026-05-04 22:12:37 -05:00

@maternion commented on GitHub (Oct 31, 2025):

So it's an issue from 0.12.7 onwards. Good to know; I've reverted back for now, so no qwen3vl.


@dhiltgen commented on GitHub (Oct 31, 2025):

Fix will be in 0.12.9


Reference: github-starred/ollama#70599