[GH-ISSUE #5481] Time waste in API remote call #49938

Closed
opened 2026-04-28 13:29:47 -05:00 by GiteaMirror · 1 comment

Originally created by @hesetone on GitHub (Jul 4, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5481

  1. Background
    I have a llama3-8B instance running in Docker. Nothing strange happens when submitting requests from Open WebUI, but handling requests from `curl` or Postman is a painful experience: it takes a long time to get the full answer. How can I fix this?

  2. Problem
    Image 1, captured from Postman, shows it takes almost 20 s to get a response from Open WebUI. Image 2 is captured from the logs of the container running Open WebUI; the delay appears to fall between `Cannonball results 572 -> 287 tokens` and `[TELEMETRY SENT]`.
    ![image](https://github.com/ollama/ollama/assets/18364031/498b6e15-8848-4015-8516-cd7b3259589a)
    ![image](https://github.com/ollama/ollama/assets/18364031/7a052203-a75f-482b-acde-7afc15086583)
    Is there any way to fix this entirely? Reducing the wasted time would also be appreciated.

GiteaMirror added the feature request label 2026-04-28 13:29:47 -05:00

@jmorganca commented on GitHub (Jul 4, 2024):

Hi @hesetone. Are you streaming the response? It seems it may be one of:

  1. Prompt evaluation - this currently takes some time if the prompt is long (we're working on speeding that up)
  2. Non-streaming requests - the server waits until the entire answer is generated before responding

Check out https://github.com/open-webui as well in case that repo may be able to help. If Ollama is slow for you with `ollama run` or `curl` directly, let me know and we can look into that!
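The second point is likely what the reporter is seeing: with `"stream": true` (the default), Ollama's `/api/generate` endpoint returns one JSON object per line (NDJSON) as tokens are produced, while `"stream": false` blocks until the whole answer is ready, which can look like a ~20 s hang in Postman. A minimal sketch of collecting a streamed response; the `response`/`done` fields match Ollama's documented chunk format, but the sample lines below are illustrative, not a real server capture:

```python
import json

# Illustrative NDJSON chunks, shaped like /api/generate streaming output:
# each line is a JSON object carrying a piece of the answer in "response",
# with "done": true on the final chunk.
sample_stream = [
    '{"model":"llama3:8b","response":"Hello","done":false}',
    '{"model":"llama3:8b","response":", world","done":false}',
    '{"model":"llama3:8b","response":"!","done":true}',
]

def collect_stream(lines):
    """Concatenate the "response" fields until a chunk reports done=true."""
    parts = []
    for line in lines:
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

print(collect_stream(sample_stream))  # -> Hello, world!
```

A client that consumes chunks this way shows output as it is generated; with `"stream": false` nothing arrives until the final token, so total latency is the same but perceived latency is much worse.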


Reference: github-starred/ollama#49938