[GH-ISSUE #4705] arm64 llama runner takes a long time to start compared to amd64 arch #2965

Closed
opened 2026-04-12 13:20:33 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @glenamac on GitHub (May 29, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4705

What is the issue?

I'm comparing two different machines/GPU cards/architectures, so I realize this is not an apples-to-apples comparison.

On a Grace Hopper NVIDIA GH200 arm64 system, llama runner startup (cold start, but with the model pre-downloaded) takes over 230 seconds:
time=2024-05-29T10:26:54.860-04:00 level=INFO source=server.go:569 msg="llama runner started in 232.18 seconds"

What Happened

Running the same model (llama3:8b) on various amd64-based systems with older NVIDIA GPUs (A100, V100, and P100), cold startup is much faster (typically between 3 and 10 seconds, again with the model pre-downloaded):
time=2024-05-29T10:06:26.124-04:00 level=INFO source=server.go:545 msg="llama runner started in 3.41 seconds"

Both the arm64 and amd64 systems run version 0.1.39, but I noticed this with version 0.1.38 as well. Once runner startup completes, the model runs very fast.

For what it's worth, the llama3 blob is saved to a SAMSUNG MZ1L2960HCJR-00A07 NVMe drive. I don't think that is the bottleneck.
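One rough way to back up that hunch (a sketch, not part of the original report) is to time a plain sequential read and compare the throughput against what the drive should sustain. A scratch file stands in for a model blob here; on a real system you would point `dd` at one of the digest-named blobs under the ollama models directory instead.

```shell
# Create a 64 MiB scratch file to stand in for a model blob.
dd if=/dev/urandom of=/tmp/blob-scratch bs=1M count=64 2>/dev/null

# Time a sequential read of it; dd reports throughput on stderr.
# An enterprise NVMe drive should read multiple GB/s, so the drive
# would only be suspect if this is dramatically slower.
dd if=/tmp/blob-scratch of=/dev/null bs=1M

rm /tmp/blob-scratch
```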

Are there other reports of arm64 startup taking a long time?

OS

Linux

GPU

Nvidia

CPU

Other

Ollama version

0.1.39

GiteaMirror added the bug label 2026-04-12 13:20:33 -05:00
Author
Owner

@glenamac commented on GitHub (Jun 13, 2024):

For anyone who might stumble upon this: the issue is not an Ollama problem. Using the 64k page size kernel for arm64 resolved it for me. See [this](https://ubuntu.com/server/docs/choosing-between-the-arm64-and-arm64-largemem-installer-options) article for some background.
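For readers hitting the same symptom, a quick way to check which page-size kernel is running (a sketch; the `-64k` kernel flavour naming is Ubuntu-specific and is an assumption here):

```shell
# Report the kernel's page size: 4096 on a typical 4k-page kernel,
# 65536 on the arm64 64k-page (largemem) kernel.
getconf PAGESIZE

# On Ubuntu arm64 the installed kernel flavour can also hint at this,
# e.g. a "-64k" suffix on the 64k-page variant.
uname -r
```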

Reference: github-starred/ollama#2965