[GH-ISSUE #2699] Slow Response Time on Windows Prompt Compared to WSL #27373

Closed
opened 2026-04-22 04:40:49 -05:00 by GiteaMirror · 9 comments
Owner

Originally created by @samer-alhalabi on GitHub (Feb 23, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2699

Originally assigned to: @dhiltgen on GitHub.

When executing prompts with the Windows version of Ollama, I experience considerable delays and slow response times. However, when running the exact same model and prompt via WSL, the response is notably faster. Given that the Windows version of Ollama is currently in preview, I understand there may be optimizations underway. Could you provide insight into whether there's a timeline for the next release that addresses performance?
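
For anyone trying to quantify the gap, here is a minimal sketch (not part of the original report) that times the same prompt against a local Ollama server. It assumes the default API endpoint on `localhost:11434` and uses a placeholder model name; the `eval_count`/`eval_duration` fields in the `/api/generate` response give a tokens-per-second figure independent of network overhead, so running it once on native Windows and once inside WSL makes the comparison concrete.

```python
# Minimal sketch: time one prompt against a local Ollama server.
# Assumptions: default endpoint http://localhost:11434 and a model
# name you substitute for MODEL; neither comes from this thread.
import json
import time
import urllib.request

MODEL = "llama2"                     # hypothetical placeholder
PROMPT = "Why is the sky blue?"

payload = json.dumps({
    "model": MODEL,
    "prompt": PROMPT,
    "stream": False,                 # one complete response, simpler to time
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)

start = time.perf_counter()
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)
elapsed = time.perf_counter() - start

# eval_count / eval_duration (nanoseconds) come back in the response
# and give a tokens-per-second rate independent of network overhead.
rate = body.get("eval_count", 0) / (body.get("eval_duration", 1) / 1e9)
print(f"wall time: {elapsed:.1f}s, eval rate: {rate:.1f} tok/s")
```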

@seanmavley commented on GitHub (Feb 23, 2024):

Is Ollama on Windows using the GPU? Are you able to confirm that?

The slower response time on Windows may be because Ollama somehow isn't using the GPU there, whereas it is under WSL.
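
One way to confirm (a sketch, not from the thread): sample GPU utilization and power draw with `nvidia-smi` while a prompt is running. This assumes an NVIDIA GPU with `nvidia-smi` on the PATH; near-zero utilization during generation suggests Ollama has fallen back to the CPU.

```python
# Sketch: sample GPU utilization and power draw once per second while
# a prompt runs in another terminal. Assumes an NVIDIA GPU and that
# nvidia-smi is on the PATH (not confirmed in this thread).
import subprocess
import time

for _ in range(10):
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=utilization.gpu,power.draw",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    print(out)                       # e.g. "97 %, 248.31 W"
    time.sleep(1)
```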

@Huarch commented on GitHub (Feb 23, 2024):

Did you run it with the Windows app? If you run `ollama serve` through the terminal, you may find the speeds are close. Maybe this is a bug or something.

@samer-alhalabi commented on GitHub (Feb 23, 2024):

> Is Ollama on Windows using the GPU? Are you able to confirm that?
>
> The slower response time on Windows may be because Ollama somehow isn't using the GPU there, whereas it is under WSL.

CPU

@samer-alhalabi commented on GitHub (Feb 23, 2024):

> Did you run it with the Windows app? If you run `ollama serve` through the terminal, you may find the speeds are close. Maybe this is a bug or something.

Running both through the terminal.

@dhiltgen commented on GitHub (Feb 26, 2024):

We fixed a bug in 0.1.27 where CPU performance on Windows was ~1/2 what it should be. If you're running the prior version, please upgrade and that should fix the performance. If you are running this version and still see slower performance on native Windows vs. WSL2, please attach server logs from both so we can see why it's behaving differently.
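
For anyone gathering those logs: on native Windows the server log is typically written to `%LOCALAPPDATA%\Ollama\server.log` (an assumption based on the documented default location, not stated in this thread). The sketch below prints its tail so it can be attached here.

```python
# Sketch: print the last lines of the Windows server log. The path is
# an assumption based on Ollama's documented default location
# (%LOCALAPPDATA%\Ollama\server.log); adjust if your install differs.
import os

log_path = os.path.expandvars(r"%LOCALAPPDATA%\Ollama\server.log")
with open(log_path, encoding="utf-8", errors="replace") as f:
    for line in f.readlines()[-50:]:     # last 50 lines
        print(line, end="")
```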

@samer-alhalabi commented on GitHub (Feb 26, 2024):

> We fixed a bug in 0.1.27 where CPU performance on Windows was ~1/2 what it should be. If you're running the prior version, please upgrade and that should fix the performance. If you are running this version and still see slower performance on native Windows vs. WSL2, please attach server logs from both so we can see why it's behaving differently.

Yes, I noticed a considerable improvement in performance with the newer version (0.1.27). Thanks for the update.

@Skyggedans commented on GitHub (Apr 14, 2024):

Using deepseek-coder:33b with a simple prompt like "Write a linked list implementation in Rust", my 4090 shows 100% load but draws only about 50 watts from the PSU, and the request completes in 1-1.5 minutes, which seems far too slow. Inside WSL2 it's 90% load at 260 watts, and the response takes 10-13 seconds.

@dhiltgen commented on GitHub (Apr 15, 2024):

@Skyggedans take a look at #3511 and see if that captures your scenario. If not, go ahead and file a new issue with your server log so we can see what might be going wrong.

@shreyasdeodhare commented on GitHub (Jul 8, 2024):

Hello @dhiltgen, I am running the tinyllama model on a CPU-only Windows 10 machine with Ollama 0.2.1, but the response time is about 10 seconds, which is slow. Can you please suggest a solution? I am running the model through code.
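
On CPU, a 10-second response often includes the one-time model load. One thing worth trying (a sketch, not an official recommendation from this thread): stream the response so tokens appear as they are generated, and set `keep_alive` so the model stays resident between calls; both are standard `/api/generate` parameters.

```python
# Sketch: stream tokens as they arrive and keep the model loaded
# between requests, so only the first call pays the load cost.
# Assumes the default local endpoint; the prompt is a placeholder.
import json
import urllib.request

payload = json.dumps({
    "model": "tinyllama",
    "prompt": "Say hello in one sentence.",
    "stream": True,                  # tokens arrive incrementally
    "keep_alive": "10m",             # keep the model in memory for 10 minutes
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    for raw in resp:                 # streaming: one JSON object per line
        chunk = json.loads(raw)
        print(chunk.get("response", ""), end="", flush=True)
        if chunk.get("done"):
            print()
```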

Reference: github-starred/ollama#27373