[GH-ISSUE #4843] Ollama running locally with very high latency #65100

Closed
opened 2026-05-03 19:44:57 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @vsatyakiran on GitHub (Jun 5, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4843

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

I have installed Ollama and tried to run llama2 and llama3:8b, but it generates only 5 to 8 tokens per second. My system config: Windows OS, 16 GB RAM.
I also tried it on an EC2 instance in AWS with the g5.xlarge instance type, but I am facing the same latency. Why is this happening?

OS

Windows

GPU

Intel

CPU

Intel

Ollama version

0.1.39

GiteaMirror added the `bug`, `needs more info` labels 2026-05-03 19:44:58 -05:00
Author
Owner

@pdevine commented on GitHub (Jun 6, 2024):

@vsatyakiran you're almost certainly running on the CPU instead of the GPU. What is the output of `ollama ps`? Also, what type of GPU are you trying to use? With the g5.xlarge I believe it's an A10G, and there was an issue with NVIDIA driver version 555 which should be fixed in Ollama 0.1.40.

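The check suggested above can be run from any terminal. A sketch of what to look for (the sample output below is illustrative, not from this issue; the exact columns vary by Ollama version):

```shell
# List currently loaded models; the PROCESSOR column shows whether the
# model is running on the GPU or has fallen back to the CPU.
ollama ps

# Illustrative output when the model is CPU-bound:
# NAME         ID              SIZE      PROCESSOR    UNTIL
# llama3:8b    365c0bd3c000    6.7 GB    100% CPU     4 minutes from now
```

A `100% CPU` entry under PROCESSOR would be consistent with the 5–8 tokens/s reported here; a healthy GPU run would show `100% GPU` (or a CPU/GPU split if the model does not fit in VRAM).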
Author
Owner

@dhiltgen commented on GitHub (Jun 18, 2024):

@vsatyakiran if you're still having trouble, please upgrade to the latest version, and if that doesn't get you running on your GPU, please share your server log and I'll reopen the issue.

https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md
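For finding the server log mentioned above: per the linked troubleshooting guide, on Windows the Ollama logs typically live under `%LOCALAPPDATA%\Ollama` (an assumption based on that doc; verify the path on your install):

```shell
# Open the Ollama log directory on Windows; server.log is the file
# maintainers usually ask for (path may vary by version):
explorer %LOCALAPPDATA%\Ollama
```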

Reference: github-starred/ollama#65100