[GH-ISSUE #13860] Serious performance regression on ARM64 systems with 0.14.x ollama version #55586

Closed
opened 2026-04-29 09:27:13 -05:00 by GiteaMirror · 6 comments
Owner

Originally created by @rjtokenring on GitHub (Jan 23, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/13860

What is the issue?

Hi, I'm using ollama on ARM64, so in CPU mode.
I noticed a serious performance drop after upgrading to version 0.14.3:
throughput dropped by roughly 10x.

Using ollama 0.12.1 and 0.14.3 — ARM64 container versions.

Host: Ubuntu 24.04

Relevant log output

With
docker run -it -v /home/ubuntu/models/:/root/.ollama -p 11434:11434 ollama/ollama:0.14.3

---------------------------------
>>> Hi!
Hi there! How’s your day going? 😊

What can I help you with today?

total duration:       32.759942325s
load duration:        541.75404ms
prompt eval count:    11 token(s)
prompt eval duration: 8.373959573s
prompt eval rate:     1.31 tokens/s
eval count:           22 token(s)
eval duration:        23.722800925s
eval rate:            0.93 tokens/s



With
docker run -it -v /home/ubuntu/models/:/root/.ollama -p 11434:11434 ollama/ollama:0.12.1

---------------------------------

>>> Hi!
Hi there! How's your day going so far? 😊

Is there anything you'd like to chat about or any help I can offer?

total duration:       4.066736884s
load duration:        431.213999ms
prompt eval count:    11 token(s)
prompt eval duration: 320.176938ms
prompt eval rate:     34.36 tokens/s
eval count:           33 token(s)
eval duration:        3.165976348s
eval rate:            10.42 tokens/s
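As a sanity check, the rates printed above can be recomputed from the raw counts and durations. In ollama's `/api/generate` JSON response the corresponding fields (`eval_count`, `eval_duration`, `prompt_eval_count`, `prompt_eval_duration`) report durations as integer nanoseconds. A minimal sketch; the helper name `tokens_per_sec` is our own, not part of ollama:

```python
# Recompute the tokens/s rates shown by `ollama run --verbose` from the raw
# token counts and durations. ollama's API reports durations in nanoseconds.

def tokens_per_sec(count: int, duration_ns: int) -> float:
    """Throughput in tokens per second: `count` tokens over `duration_ns` ns."""
    return count / (duration_ns / 1e9)

if __name__ == "__main__":
    # 0.14.3 log above: eval count 22, eval duration 23.722800925s.
    print(f"0.14.3 eval rate: {tokens_per_sec(22, 23_722_800_925):.2f} tokens/s")
    # 0.12.1 log above: eval count 33, eval duration 3.165976348s.
    print(f"0.12.1 eval rate: {tokens_per_sec(33, 3_165_976_348):.2f} tokens/s")
```

Running this reproduces the 0.93 vs. 10.42 tokens/s figures from the two logs, confirming the roughly 10x eval-rate regression.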

OS

Docker

GPU

No response

CPU

Other

Ollama version

0.14.3

GiteaMirror added the bug label 2026-04-29 09:27:13 -05:00
Author
Owner

@rick-github commented on GitHub (Jan 23, 2026):

Confirmed the performance regression from 0.14.2 to 0.14.3; it continues in 0.15.0-rc1.

0.14.2     prompt: 369.71 eval: 20.94 prompt: 389.94 eval: 22.09 prompt: 639.82 eval: 22.72 prompt: 704.21 eval: 24.85 prompt: 526.89 eval: 20.96
0.14.3     prompt: 19.56 eval: 7.12 prompt: 209.67 eval: 7.20 prompt: 330.19 eval: 7.30 prompt: 235.31 eval: 7.31 prompt: 330.94 eval: 7.22
0.15.0-rc1 prompt: 19.92 eval: 7.48 prompt: 238.43 eval: 7.32 prompt: 330.77 eval: 7.26 prompt: 330.56 eval: 7.37 prompt: 229.11 eval: 7.20
Author
Owner

@mchiang0610 commented on GitHub (Jan 24, 2026):

Thanks @rjtokenring for filing this and @rick-github for reproducing. May I ask what hardware you are using? We should add something similar to our hardware testing cluster.

Author
Owner

@rjtokenring commented on GitHub (Jan 24, 2026):

Reproduced on Qualcomm 6490, 8275, and 2210.
Recompiling main on Debian trixie seems to fix the problem.

Author
Owner

@crumble0815 commented on GitHub (Jan 24, 2026):

Reproduced on a Raspberry Pi 5 16GB.
Trixie with the stable and the experimental kernel 6.18.6-v8-16k+.
100% CPU with model phi4.

Windows on Arm (SQ2) feels only a little slower than 13.x.

I assume that the Linux version does not use NEON.

Added stderr from `ollama serve`:
[performanceIssuePi5.log](https://github.com/user-attachments/files/24837781/performanceIssuePi5.log)
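That assumption can be checked from the kernel's reported CPU features: on ARM64 Linux, NEON/ASIMD support appears as `asimd` in the `Features` line of `/proc/cpuinfo` (on 32-bit ARM the flag is `neon`; x86 uses a `flags` line instead). A Linux-only sketch — it only tells you what the CPU advertises, not whether a given ollama build actually uses it:

```shell
# Check whether this Linux machine's CPU advertises NEON/ASIMD SIMD support.
# ARM64 kernels list "asimd" under "Features"; 32-bit ARM lists "neon".
if grep -qE '(Features|flags)[[:space:]]*:.*\b(asimd|neon)\b' /proc/cpuinfo 2>/dev/null; then
    echo "NEON/ASIMD: yes"
else
    echo "NEON/ASIMD: no"
fi
```

If this prints "yes" while ollama still runs slowly, the regression is more likely in how the binary was built (e.g. which SIMD code paths were compiled in) than in the hardware, which would be consistent with a source rebuild fixing it.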

Author
Owner

@rick-github commented on GitHub (Jan 25, 2026):

0.15.0-rc1 prompt: 19.79 eval: 7.02 prompt: 333.01 eval: 7.16 prompt: 292.77 eval: 7.17 prompt: 333.07 eval: 7.10 prompt: 330.62 eval: 7.49
0.15.0     prompt: 19.80 eval: 7.30 prompt: 332.39 eval: 7.24 prompt: 233.94 eval: 7.72 prompt: 286.12 eval: 7.40 prompt: 331.43 eval: 6.87
0.15.1     prompt: 561.51 eval: 23.02 prompt: 5239.68 eval: 20.08 prompt: 596.83 eval: 22.91 prompt: 432.20 eval: 25.98 prompt: 5186.53 eval: 27.94
Author
Owner

@rjtokenring commented on GitHub (Jan 26, 2026):

@mchiang0610 @rick-github tested 0.15.1 and I can confirm that the regression has been fixed!

Thanks!

Reference: github-starred/ollama#55586