[GH-ISSUE #10190] llama vs ollama #6685

Closed
opened 2026-04-12 18:24:50 -05:00 by GiteaMirror · 2 comments

Originally created by @nigelzzz on GitHub (Apr 9, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10190

What is the issue?

Hi all,
I recently tested llama.cpp (llama_cli) and Ollama with the llama3.2 q8 model, and I have a question about the prefill phase:

llama.cpp -> prefill: 2.5 tokens/second
ollama -> prefill: 27 tokens/second

Ollama is much faster than the llama.cpp example. Does Ollama include any improvements to the prefill phase, or is there a pull request I can use as a reference?
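
For context, prefill throughput can be measured directly from Ollama's non-streaming /api/generate response, which reports prompt_eval_count and prompt_eval_duration (the latter in nanoseconds). A minimal sketch, assuming Ollama is serving on its default port 11434 and llama3.2 has already been pulled:

```python
import json
import urllib.request

# Non-streaming request so the final response carries the timing stats.
payload = json.dumps({
    "model": "llama3.2",  # assumes this model has been pulled locally
    "prompt": "Summarize the history of the Linux kernel.",
    "stream": False,
}).encode()

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    stats = json.load(resp)

# Durations are reported in nanoseconds.
prefill_tps = stats["prompt_eval_count"] / (stats["prompt_eval_duration"] / 1e9)
decode_tps = stats["eval_count"] / (stats["eval_duration"] / 1e9)
print(f"prefill: {prefill_tps:.1f} tok/s  decode: {decode_tps:.1f} tok/s")
```

On the llama.cpp side, llama-bench reports the equivalent prompt-processing (pp) rate, which makes for a fairer comparison than timing llama_cli output by hand.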

Relevant log output


OS

Linux

GPU

No response

CPU

Other

Ollama version

No response

GiteaMirror added the bug label 2026-04-12 18:24:50 -05:00

@igorschlum commented on GitHub (Apr 9, 2025):

Hi @nigelzzz

Thanks for sharing your findings — it’s always interesting to see performance comparisons across different tools like llama.cpp and Ollama.

That said, this doesn’t seem to describe a bug or issue specific to Ollama itself. From what I understand, it’s more of a general question about performance optimizations, which might be better suited for a conversation on Discord, where there’s more room for discussion and exchange with the community.

Just a suggestion: since the GitHub issue tracker is typically used for actionable bug reports or feature requests (especially with so many open issues here), you might consider closing this one if it doesn’t point to a specific problem.


@igorschlum commented on GitHub (Apr 9, 2025):

@nigelzzz thank you!


Reference: github-starred/ollama#6685