[GH-ISSUE #1087] System Performance Benchmarking #26298

Open
opened 2026-04-22 02:28:17 -05:00 by GiteaMirror · 3 comments

Originally created by @K1ngjulien on GitHub (Nov 11, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1087

Hi!

In threads like #738, I see a lot of people trying different hardware and software setups, followed by checking the logs for the `llama_print_timings` output to see performance results.
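
For anyone scraping those logs by hand, here is a minimal sketch of pulling the generation throughput out of `llama_print_timings` lines. The exact layout of these lines has changed across llama.cpp versions, so treat the regex below as an assumption to adapt rather than a stable format.

```python
# Sketch: extract tokens/s from llama.cpp's "eval time" timing lines.
# The line format is version-dependent; adjust the regex to your logs.
import re
import sys

# Matches e.g.:
# llama_print_timings: eval time = 1234.56 ms / 100 runs (12.35 ms per token, 81.00 tokens per second)
# "\s+eval time" deliberately skips the separate "prompt eval time" line.
EVAL_RE = re.compile(r"llama_print_timings:\s+eval time.*?([\d.]+)\s+tokens per second")

def eval_rates(log_path: str) -> list[float]:
    """Return the tokens/s figure from every generation-eval line in a log."""
    with open(log_path) as f:
        return [float(m.group(1)) for line in f if (m := EVAL_RE.search(line))]

if __name__ == "__main__":
    rates = eval_rates(sys.argv[1])
    if rates:
        print(f"{len(rates)} runs, mean {sum(rates) / len(rates):.2f} tokens/s")
```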

From my (admittedly short) time playing around with my own hardware, I've noticed a lot of inconsistency between runs, making it difficult to evaluate changes.

I would suggest an enhancement like an `ollama bench <model>` command that runs a suite of example prompts, sending them to the LLM sequentially or randomly and recording the timing data; a rough sketch of this idea follows below.

This way, we can all have a consistent way of comparing benchmark runs, which would also be excellent for development.
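
To make the proposal concrete, here is a minimal sketch of what such a command could automate. It uses the documented fields of Ollama's `/api/generate` response (`eval_count` and `eval_duration`, the latter in nanoseconds); the prompt suite, model name, and run loop are placeholder choices, not an existing `ollama bench` implementation.

```python
# Sketch of an "ollama bench"-style loop against the local Ollama API.
# The prompts and model below are arbitrary examples.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint
PROMPTS = [
    "Explain the difference between a process and a thread.",
    "Write a haiku about benchmarking.",
    "Summarize the plot of Hamlet in two sentences.",
]

def bench_once(model: str, prompt: str) -> float:
    """Send one non-streaming request and return generation tokens/s."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    # eval_duration is reported in nanoseconds.
    return data["eval_count"] / (data["eval_duration"] / 1e9)

if __name__ == "__main__":
    model = "llama2:7b"
    rates = [bench_once(model, p) for p in PROMPTS]
    for prompt, rate in zip(PROMPTS, rates):
        print(f"{rate:7.2f} tokens/s  {prompt[:40]}")
    print(f"mean over {len(rates)} prompts: {sum(rates) / len(rates):.2f} tokens/s")
```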

Introspecting a running session and keeping a performance log, separate from stdout, would also be very useful.

Is there a way to do this already, maybe through `llama.cpp`?
I would be happy to try and implement this with some help 👍

Cheers,
Julian

GiteaMirror added the documentation and feature request labels 2026-04-22 02:28:17 -05:00

@Necmttn commented on GitHub (Dec 24, 2023):

I've been looking for something like this for the last couple of days to decide which MacBook I'm going to buy next.
We need some sort of public repo to push the data to.


@chuangtc commented on GitHub (Jan 20, 2024):

### ollama-benchmark was created on GitHub

https://github.com/aidatatools/ollama-benchmark

### Benchmark of running local LLMs on a Raspberry Pi 5 with 8 GB RAM

https://www.youtube.com/watch?v=F3avMe8NvJk
08:59 Throughput rate of mistral:7b model (1.43 tokens/s)
09:12 Throughput rate of llama2:7b model (0.93 tokens/s)
09:22 Throughput rate of llava:7b model (0.336 tokens/s)


@chuangtc commented on GitHub (Apr 1, 2024):

It's easy to install now.
https://llm.aidatatools.com/

```bash
pip install llm-benchmark
llm_benchmark run
```