[GH-ISSUE #899] Big performance hit from v0.1.4 #62471

Closed
opened 2026-05-03 09:05:00 -05:00 by GiteaMirror · 4 comments

Originally created by @imikod on GitHub (Oct 24, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/899

v0.1.4 is around 3 times slower than v0.1.3.
I tested 2 models, CPU only: [dolphin-2.1-mistral-7b.Q3_K_M](https://huggingface.co/TheBloke/dolphin-2.1-mistral-7B-GGUF/blob/main/dolphin-2.1-mistral-7b.Q3_K_M.gguf) and [openhermes-2-mistral-7b.Q5_K_M](https://huggingface.co/TheBloke/OpenHermes-2-Mistral-7B-GGUF/blob/main/openhermes-2-mistral-7b.Q5_K_M.gguf).
I'm on Debian 12 with an AMD Ryzen 5 5600H.

GiteaMirror added the bug label 2026-05-03 09:05:00 -05:00

@jmorganca commented on GitHub (Oct 24, 2023):

@imikod thanks for creating an issue. I've tested this on a recent Intel CPU and there's no performance difference.

To help us debug this, are you able to share the results of `ollama run --verbose` (verbose mode)? Even better would be the logs from `journalctl -u ollama`, so we can see what might be happening in the background.


@imikod commented on GitHub (Oct 24, 2023):

Yes, I can share the results of the verbose mode now.

For v0.1.3:

total duration: 18.536665894s
load duration: 414.797µs
prompt eval count: 36 token(s)
prompt eval duration: 2.421661s
prompt eval rate: 14.87 tokens/s
eval count: 93 token(s)
eval duration: 16.044084s
eval rate: 5.80 tokens/s

And for v0.1.4:

total duration: 1m22.372065006s
load duration: 1.045899ms
prompt eval count: 36 token(s)
prompt eval duration: 26.860807s
prompt eval rate: 1.34 tokens/s
eval count: 73 token(s)
eval duration: 55.477673s
eval rate: 1.32 tokens/s
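For context, the rates reported above work out to roughly a 4.4x slowdown in token generation and about 11x in prompt evaluation. A quick check using the numbers from the verbose output:

```python
# Slowdown implied by the verbose output above (tokens/s, higher is better).
v013 = {"prompt_eval_rate": 14.87, "eval_rate": 5.80}  # v0.1.3
v014 = {"prompt_eval_rate": 1.34, "eval_rate": 1.32}   # v0.1.4

for key in v013:
    slowdown = v013[key] / v014[key]
    print(f"{key}: {slowdown:.1f}x slower")
# prompt_eval_rate: 11.1x slower
# eval_rate: 4.4x slower
```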

I'll try the journal logs tomorrow, since I don't run ollama as a service.


@imikod commented on GitHub (Oct 24, 2023):

OK, now I noticed that v0.1.3 has this:
{"timestamp":1698189486,"level":"INFO","function":"main","line":1296,"message":"system info","n_threads":6,"total_threads":12,"system_info":"AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | "}

and v0.1.4 has this:
{"timestamp":1698190231,"level":"INFO","function":"main","line":1325,"message":"system info","n_threads":6,"n_threads_batch":-1,"total_threads":12,"system_info":"AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | "}
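The second log reports AVX, AVX2, FMA, and F16C all off, even though the Ryzen 5 5600H supports them. One way to confirm the CPU itself advertises these instructions (so the `AVX = 0` line is a build problem, not a hardware one) is to compare the flags in `/proc/cpuinfo` against the server's "system info" line. A minimal sketch, assuming Linux; the flag set checked here is chosen to mirror the fields in the log:

```python
import os

# SIMD features to compare against the "system info" line in the server log.
WANTED = {"avx", "avx2", "fma", "f16c", "avx512f"}

def supported_flags(cpuinfo_text: str) -> set:
    """Return the WANTED features the CPU advertises in a /proc/cpuinfo dump."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return WANTED & set(line.split(":", 1)[1].split())
    return set()

if os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        print(sorted(supported_flags(f.read())))
```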


@jmorganca commented on GitHub (Oct 26, 2023):

Great, thanks for sharing! Yes, it looks like there's an issue where the AVX flags aren't enabled in 0.1.4 and 0.1.5 – a fix is on the way in #900

Reference: github-starred/ollama#62471