UI shows result much slower than it is generated #13

GiteaMirror commented

2025-11-11 14:02:00 -06:00

Originally created by @lbenedetto on GitHub (Oct 27, 2023).

Describe the bug
The UI appears to load tokens from the server one at a time, but it renders them much more slowly than the model generates them. Sometimes it speeds up a bit and loads entire paragraphs at once, but mostly it runs painfully slowly, even after the server has finished responding.

In the console logs I see it took 19.5 seconds to generate the response:

ollama        | llama_print_timings:        load time =    1102.04 ms
ollama        | llama_print_timings:      sample time =     284.30 ms /  1027 runs   (    0.28 ms per token,  3612.33 tokens per second)
ollama        | llama_print_timings: prompt eval time =     273.78 ms /   146 tokens (    1.88 ms per token,   533.28 tokens per second)
ollama        | llama_print_timings:        eval time =   18724.47 ms /  1026 runs   (   18.25 ms per token,    54.79 tokens per second)
ollama        | llama_print_timings:       total time =   19506.92 ms
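
As a sanity check on these numbers, here is a minimal sketch that parses the `llama_print_timings` lines and recomputes the generation rate. The helper names `parse_timings` and `tokens_per_second` are made up for illustration and are not part of Ollama or the WebUI; the log text is copied from the report above.

```python
import re

# Matches one llama_print_timings line: a name, a duration in ms,
# and (for some lines) a run/token count after the slash.
TIMING_RE = re.compile(
    r"llama_print_timings:\s*(?P<name>[a-z ]+?) time =\s*(?P<ms>[\d.]+) ms"
    r"(?: /\s*(?P<count>\d+) (?:runs|tokens))?"
)


def parse_timings(log_text):
    """Return {name: (milliseconds, count or None)} for each timing line."""
    timings = {}
    for match in TIMING_RE.finditer(log_text):
        count = match.group("count")
        timings[match.group("name").strip()] = (
            float(match.group("ms")),
            int(count) if count else None,
        )
    return timings


def tokens_per_second(ms, count):
    """Convert a duration in milliseconds and a token count to tokens/s."""
    return count / (ms / 1000.0)


LOG = """\
ollama        | llama_print_timings:        load time =    1102.04 ms
ollama        | llama_print_timings:      sample time =     284.30 ms /  1027 runs   (    0.28 ms per token,  3612.33 tokens per second)
ollama        | llama_print_timings: prompt eval time =     273.78 ms /   146 tokens (    1.88 ms per token,   533.28 tokens per second)
ollama        | llama_print_timings:        eval time =   18724.47 ms /  1026 runs   (   18.25 ms per token,    54.79 tokens per second)
ollama        | llama_print_timings:       total time =   19506.92 ms
"""

timings = parse_timings(LOG)
eval_ms, eval_tokens = timings["eval"]
print(f"generation rate: {tokens_per_second(eval_ms, eval_tokens):.2f} tokens/s")
print(f"total time: {timings['total'][0] / 1000:.1f} s")
```

The recomputed rate agrees with the 54.79 tokens per second that the log reports, so the server side finished in roughly 19.5 seconds.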

And in the network console in the browser, I see that the chunked response streamed in over the course of 21 seconds.
However, the UI took several minutes to display the full response. During that time there was no further network traffic until the automatic prompt for the chat title.
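
To put the gap in perspective, here is a rough back-of-the-envelope sketch. It assumes "several minutes" means about 180 seconds and that the UI renders at a constant rate; both are assumptions made only for illustration, not measurements from the report. The function names are hypothetical.

```python
TOKENS = 1026            # eval runs reported by llama_print_timings
STREAM_SECONDS = 21.0    # time the chunked response took in the network console
DISPLAY_SECONDS = 180.0  # ASSUMPTION: "several minutes" taken as 3 minutes


def implied_render_rate(token_count, display_seconds):
    """Average tokens/s the UI must have rendered to take display_seconds."""
    return token_count / display_seconds


def backlog_at_stream_end(token_count, stream_seconds, render_rate):
    """Tokens already received but not yet shown when the stream finishes,
    assuming a constant render rate (a simplification)."""
    rendered = min(token_count, render_rate * stream_seconds)
    return token_count - rendered


rate = implied_render_rate(TOKENS, DISPLAY_SECONDS)
backlog = backlog_at_stream_end(TOKENS, STREAM_SECONDS, rate)
print(f"implied UI render rate: {rate:.1f} tokens/s")
print(f"tokens still waiting when the stream ended: {backlog:.0f}")
```

Under these assumptions the UI would be rendering at only a few tokens per second, about a tenth of the rate at which the tokens arrived, with most of the response already received but not yet displayed.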

To Reproduce
Steps to reproduce the behavior:

Run a prompt

Expected behavior
When the server has finished streaming the response to the client, the full response should be displayed.

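Screenshots

![image](https://github.com/ollama-webui/ollama-webui/assets/4466272/aaba66e9-5588-4d99-9afa-9908ab0db98f)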

Desktop (please complete the following information):

OS: Linux Mint 21.1
Browser: Vivaldi (Chromium based)
Version: Not sure, but here's my docker compose file:

version: '3.3'
services:
   ollama-webui:
       ports:
           - '3000:8080'
       container_name: ollama-webui
       image: ollamawebui/ollama-webui
   ollama:
       volumes:
           - './ollama:/root/.ollama'
       ports:
           - '11434:11434'
       environment:
           - 'OLLAMA_ORIGINS=*'
       container_name: ollama
       image: ollama/ollama
       deploy:
         resources:
           reservations:
             devices:
               - driver: nvidia
                 count: 1
                 capabilities: [gpu]
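
Since this compose file publishes Ollama on port 11434, one way to separate server speed from UI speed is to time the stream directly against Ollama's `/api/generate` endpoint, bypassing the WebUI. The sketch below is only an illustration: it assumes Ollama is reachable at `http://localhost:11434`, uses the model name mentioned later in this thread, and the function names `summarize_stream` and `stream_from_ollama` are made up for this example.

```python
import json
import time
import urllib.request


def summarize_stream(lines, clock=time.monotonic):
    """Consume newline-delimited JSON chunks from /api/generate.

    Returns (full_text, chunk_count, elapsed_seconds). Each chunk is a JSON
    object with a "response" fragment; the last one has "done": true.
    """
    start = clock()
    parts = []
    chunks = 0
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        obj = json.loads(raw)
        parts.append(obj.get("response", ""))
        chunks += 1
        if obj.get("done"):
            break
    return "".join(parts), chunks, clock() - start


def stream_from_ollama(prompt,
                       model="dolphin2.1-mistral:latest",
                       base_url="http://localhost:11434"):  # ASSUMPTION: local host
    """POST a prompt to Ollama and time the streamed reply."""
    body = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    request = urllib.request.Request(
        base_url + "/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    # The HTTP response object can be iterated line by line as bytes.
    with urllib.request.urlopen(request) as response:
        return summarize_stream(response)
```

Calling `stream_from_ollama("Why is the sky blue?")` with the containers running returns the text, the number of chunks, and the elapsed time. If that time is close to the `total time` in the Ollama log while the UI takes far longer, the delay is on the client side.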


GiteaMirror closed this issue

2025-11-11 14:02:01 -06:00

GiteaMirror commented

2025-11-11 14:02:02 -06:00

@tjbck commented on GitHub (Oct 27, 2023):

Hi, could you also tell us the model you're using? Or maybe even share a video demonstrating the issue you're experiencing? I cannot reproduce the issue, and the WebUI seems to work just fine for most other people. Keep us updated. Thanks.

GiteaMirror commented

2025-11-11 14:02:02 -06:00

@lbenedetto commented on GitHub (Oct 27, 2023):

dolphin2.1-mistral:latest
The model shouldn't matter since I can see the complete response sitting in the Network tab while the UI is still working on rendering it one token at a time.

I'll put together a demo video.

GiteaMirror commented

2025-11-11 14:02:02 -06:00

@tjbck commented on GitHub (Oct 27, 2023):

Also if you have other computers available please try the Web UI on other devices and see if it experiences the same issues. Thanks!

GiteaMirror commented

2025-11-11 14:02:02 -06:00

@lbenedetto commented on GitHub (Oct 27, 2023):

I tried it in Firefox and it works fine.

I've tried disabling all my Chrome extensions in Vivaldi but it was still broken. So I guess it's confirmed to be a Vivaldi issue.

Did you try to reproduce the issue in Vivaldi?

GiteaMirror commented

2025-11-11 14:02:02 -06:00

@tjbck commented on GitHub (Oct 27, 2023):

I haven't tried with Vivaldi, but I'm guessing it might have something to do with the VPN feature. Were you able to rule that out?

GiteaMirror commented

2025-11-11 14:02:03 -06:00

@lbenedetto commented on GitHub (Oct 27, 2023):

I don't know what you're referring to. I'm not aware of any VPNs on my system.

GiteaMirror commented

2025-11-11 14:02:03 -06:00

@tjbck commented on GitHub (Oct 28, 2023):

I tried Vivaldi on a Mac M1 and other Macs; no issues for me. Keep us updated if you manage to find the issue. Thanks.

GiteaMirror commented

2025-11-11 14:02:03 -06:00

@JaminJiang commented on GitHub (Aug 19, 2024):

I have encountered the same problem in Chrome on Mac. Have you figured it out? Do you know why it happens and how to fix it? @lbenedetto

GiteaMirror commented

2025-11-11 14:02:04 -06:00

@loteque commented on GitHub (Dec 29, 2024):

Here is an example video that shows the issue:

https://github.com/user-attachments/assets/d32d3db4-b80b-4e97-9994-6018c767f8f1

GiteaMirror commented

2025-11-11 14:02:04 -06:00

@maurerle commented on GitHub (Apr 8, 2025):

I can confirm that this is not an issue.
@loteque, what you see is the previous prompt.

You have an open WebSocket connection (see status code 101 in the network tab) with the path wss://myhost/ws/socket.io/?EIO=4&transport=websocket, to which the current output is streamed.
You need to open the network tab and then press F5 to see all requests from the start of the page load.

The speed on the network tab is the same as shown in the UI.
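
To make the WebSocket path above easier to read, here is a small sketch that breaks the URL into its parts and decodes a simple Socket.IO event frame of the kind shown in the browser's WebSocket message view. In the URL, `EIO=4` is the Engine.IO protocol version and `transport=websocket` is the transport in use. The function names and the event name `example-event` are hypothetical; the actual event names and payloads the WebUI sends are not given in this thread. The decoder only handles the simplest case (default namespace, no acknowledgement id).

```python
import json
from urllib.parse import urlsplit, parse_qs


def describe_socket_url(url):
    """Split a Socket.IO WebSocket URL into scheme, path, and query settings."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    return {
        "scheme": parts.scheme,
        "path": parts.path,
        "engine_io_version": int(query["EIO"][0]),
        "transport": query["transport"][0],
    }


def decode_event_frame(frame):
    """Decode a text frame of the form 42[...]: Engine.IO "message" (4)
    wrapping a Socket.IO "event" (2). Returns (event_name, args), or None
    for any other frame, such as pings or frames with a namespace or ack id.
    """
    if not frame.startswith("42["):
        return None
    payload = json.loads(frame[2:])
    return payload[0], payload[1:]


info = describe_socket_url("wss://myhost/ws/socket.io/?EIO=4&transport=websocket")
print(info)

# HYPOTHETICAL frame: the event name and payload are invented for illustration.
print(decode_event_frame('42["example-event",{"content":"Hi"}]'))
```

Comparing the timestamps of successive frames on that connection with what appears on screen is one way to check whether the network rate and the display rate really match.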
