[GH-ISSUE #8295] Considerably slower response via API than terminal #5310

Closed
opened 2026-04-12 16:30:06 -05:00 by GiteaMirror · 4 comments

Originally created by @leo-petrucci on GitHub (Jan 3, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8295

What is the issue?

I've been trying to run a small model (`gemma2:2b`) on my Raspberry Pi 5. I had heard the performance was quite impressive, and it's true! When running Ollama in the terminal, responses are very fast considering the hardware.

https://github.com/user-attachments/assets/eb7b3add-a35a-4123-8e5b-01ca109ae395

That said, I wanted to try to use it via a frontend, so I attempted to hit the API endpoints Ollama exposes, and the response time was **much** slower than in the terminal.

In the video below the first response is almost instant, and then it slows down to a crawl before almost crashing. Most times the first response crashes the system before finishing.

https://github.com/user-attachments/assets/53691d7a-c263-4482-905e-9e8dc9893946

Is there a reason for this? How can I debug the issue?
I'm aware Ollama isn't really made for a GPU-less Raspberry Pi, but hopefully someone can let me know if I'm doing something wrong or if it's a known issue!
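A raw API call takes any frontend out of the equation entirely. A minimal sketch for comparison, assuming the default port 11434 and the model from this issue (the response's timing fields are reported in nanoseconds):

```shell
# Time one non-streaming generation directly against the Ollama API.
curl -s http://localhost:11434/api/generate \
  -d '{"model": "gemma2:2b", "prompt": "Why is the sky blue?", "stream": false}' \
  | jq '{total_duration, load_duration, eval_count, eval_duration}'
```

If this stays fast while the frontend is slow, the extra latency is coming from whatever additional requests the frontend makes.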

OS

Linux

GPU

Other

CPU

Other

Ollama version

0.5.4-0-g2ddc32d-dirty

GiteaMirror added the needs more info and bug labels 2026-04-12 16:30:06 -05:00

@dpk-it commented on GitHub (Jan 3, 2025):

Maybe this will help: https://github.com/open-webui/open-webui/discussions/8088

In the background, Open WebUI generates tags and a title for the chat after the first message, and every time you write something in the input field, the autocomplete function is executed.


@rick-github commented on GitHub (Jan 3, 2025):

[Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) will aid in debugging.
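On a systemd-based Linux install, the server log can usually be pulled with journalctl. A sketch, assuming the standard `ollama` service name (a Docker deployment would use `docker logs` on the container instead):

```shell
# Last 200 lines of the Ollama server log on a systemd install.
journalctl -u ollama --no-pager -n 200

# Equivalent for a Docker deployment, assuming the container is named "ollama".
docker logs --tail 200 ollama
```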


@pdevine commented on GitHub (Jan 8, 2025):

Can you show the output of `ollama ps`? I'm guessing you got lucky on the first request and then started swapping. It seems like Ollama is working correctly, though.
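If `ollama ps` shows the model loaded but responses still crawl, the timing fields on a non-streaming `/api/generate` response make the slowdown measurable. A sketch with example numbers (durations are in nanoseconds):

```shell
# tokens/sec = eval_count / (eval_duration in seconds).
# Example response fragment: 120 tokens generated over 24 s of eval time.
echo '{"eval_count": 120, "eval_duration": 24000000000}' \
  | jq '.eval_count / (.eval_duration / 1e9)'
```

A rate that collapses between the first and later requests, together with swap usage showing up in `free -h`, would support the swapping theory.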


@leo-petrucci commented on GitHub (Jan 9, 2025):

Sorry for the slow response. I'm unsure of what the initial issue was, but I waited a couple of days and tore down the Docker container to look into it and it now seems to work just fine.

Hitting the API directly was just as slow, so I'm not sure what the initial issue was.

Reference: github-starred/ollama#5310