[GH-ISSUE #10630] Update llama.cpp to include new optimizations for MoE models on Apple Silicon #6993

Closed
opened 2026-04-12 18:53:14 -05:00 by GiteaMirror · 1 comment

Originally created by @filipwiech on GitHub (May 9, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10630

Very recently, the llama.cpp project merged the following PR:
https://github.com/ggml-org/llama.cpp/pull/13388
It appears to significantly increase prompt-processing speed on Apple Silicon (Metal backend) for Mixture-of-Experts models, such as Qwen 3 30B-A3B. It would be great to have it integrated into the next Ollama version if possible. 👍

CC @jmorganca. 😉

GiteaMirror added the feature request label 2026-04-12 18:53:14 -05:00

@filipwiech commented on GitHub (May 13, 2025):

This should now be taken care of by PR #10655, thanks @jmorganca! 🙂


Reference: github-starred/ollama#6993