[GH-ISSUE #10281] Time to first token about 50s? #53263

Open
opened 2026-04-29 02:26:13 -05:00 by GiteaMirror · 1 comment

Originally created by @khteh on GitHub (Apr 15, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10281

I run the ollama server on a local k8s cluster and use the `llama3.3` model in my LangChain app. Why does the time to first token take so long? Where is the bottleneck, and how can I reduce this latency?

![Image](https://github.com/user-attachments/assets/b5153a2d-78dd-4553-ae0b-bbeb56d5ba0d)
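A 50 s delay like this is easier to diagnose once you know whether the time is spent *before* the first streamed token (model load, prompt prefill) or *during* generation. Below is a minimal sketch for timing the first chunk of a streamed response. The `ollama_stream` helper, the `llama3.3` model name, and the default `http://localhost:11434` endpoint are assumptions based on Ollama's documented streaming `/api/generate` API, not details from the original report.

```python
import json
import time
import urllib.request
from typing import Iterable, Iterator, Tuple


def measure_ttft(chunks: Iterable[str]) -> Tuple[float, str]:
    """Consume a stream of text chunks; return (time to first chunk, full text)."""
    start = time.monotonic()
    ttft = None
    parts = []
    for chunk in chunks:
        if ttft is None:
            # First chunk arrived: this is the time-to-first-token.
            ttft = time.monotonic() - start
        parts.append(chunk)
    return (ttft if ttft is not None else float("nan")), "".join(parts)


def ollama_stream(prompt: str,
                  model: str = "llama3.3",
                  host: str = "http://localhost:11434") -> Iterator[str]:
    """Yield response chunks from Ollama's streaming /api/generate endpoint.

    Assumes a reachable Ollama server; host and model are placeholders.
    """
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps({"model": model, "prompt": prompt,
                         "stream": True}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The streaming endpoint returns one JSON object per line (NDJSON).
        for line in resp:
            obj = json.loads(line)
            if "response" in obj:
                yield obj["response"]
            if obj.get("done"):
                break
```

For example, `measure_ttft(ollama_stream("Hello"))` separates the prefill/load delay from generation time. If TTFT is dominated by cold model load, Ollama's `keep_alive` request field (or the `OLLAMA_KEEP_ALIVE` server environment variable) keeps the model resident between requests, which is worth checking before looking elsewhere in the cluster.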

@khteh commented on GitHub (Apr 17, 2025):

Because of this https://github.com/ollama/ollama/issues/10137?


Reference: github-starred/ollama#53263