[GH-ISSUE #2766] Some issues on Windows #48179

Closed
opened 2026-04-28 07:03:20 -05:00 by GiteaMirror · 3 comments
Owner

Originally created by @vrubzov1957 on GitHub (Feb 26, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2766

Originally assigned to: @dhiltgen on GitHub.

Guys, I have some issues with Ollama on Windows (11 + WSL2).
Ollama version: downloaded 2024-02-24 from the official site, the Windows build.

  1. Ollama models run on the CPU, not on the GPU (Nvidia 1080 11G). Occasionally it has somehow run on the video card, but I could not figure out a pattern of how and when; it looks like it selects GPU/CPU randomly. On the CPU, good/big models run very slowly.
  2. After a Windows restart, the Ollama server does not come back up. I have to manually open a cmd terminal, run "ollama serve", and keep the terminal window open.

@dhiltgen commented on GitHub (Feb 26, 2024):

Could you share the server log?

Have you considered the new native Windows version instead of WSL2?

<!-- gh-comment-id:1964722860 -->

@vrubzov1957 commented on GitHub (Feb 26, 2024):

> Could you share the server log?

Sure.
I found this in the `<User>\AppData\Local\Ollama` folder, but I see that the latest logs there are from 2 days ago:
[server.log](https://github.com/ollama/ollama/files/14412548/server.log)

And from the terminal window (run manually):
[manual_logs.txt](https://github.com/ollama/ollama/files/14412684/manual_logs.txt)
Here is what I did:

  1. I start Ollama.
  2. Then I run generation with a small model (4 GB): it works fast, with GPU load around 30% and CPU load around 60%.
  3. After that I run generation with a big model (18 GB): it works slowly, with GPU load around 0-6% and CPU load around 60-70%. I don't know why.
    Screenshot of GPU load during generation with the big model:
    <img width="701" alt="gpu" src="https://github.com/ollama/ollama/assets/54937209/8326255d-31c0-43eb-b2d7-5d4919671f9e">
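For reference, the generation runs above can also be driven through Ollama's REST API on the default `localhost:11434` endpoint, which makes timing the small vs. big model easier to script. A minimal sketch (the model name passed in is a placeholder, not one from this report):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_request(model: str, prompt: str) -> urllib.request.Request:
    # Build the JSON body for Ollama's /api/generate endpoint.
    # stream=False asks for a single JSON object instead of a token stream.
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def generate(model: str, prompt: str) -> str:
    # Sends the request; requires a running `ollama serve` to actually work.
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]
```

Comparing wall-clock time of `generate()` across the two models would show the same slowdown as the GPU-load screenshots.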

And

> Have you considered the new native Windows version instead of WSL2?

I am using this version:
<img width="522" alt="ollama" src="https://github.com/ollama/ollama/assets/54937209/61e6c78e-e39c-439a-a989-cc33fc4c6bea">

PS. My Windows Defender periodically deletes some Ollama files:
<img width="866" alt="q_en" src="https://github.com/ollama/ollama/assets/54937209/a56af82c-f38b-471f-bfea-f247b5f018fb">

<!-- gh-comment-id:1965540230 -->

@dhiltgen commented on GitHub (Feb 27, 2024):

Thanks for the log files.

It looks like things are behaving as expected, and the models are being loaded into the GPU. When you load the larger model, it's only able to load `26/81 layers to GPU` given your limited VRAM. As a result, the GPU spends most of its time idle, waiting for the CPU to process the layers it has loaded into system memory. This is expected behavior.
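That partial-offload arithmetic can be sketched as follows. This is a simplification that assumes roughly equal layer sizes; the overhead figure is a guess for illustration, not Ollama's actual VRAM accounting:

```python
def layers_that_fit(n_layers: int, model_size_gb: float, vram_gb: float,
                    overhead_gb: float = 1.0) -> int:
    # Approximate each layer as model_size / n_layers and reserve some
    # VRAM (overhead_gb, assumed here) for the KV cache and scratch buffers.
    per_layer_gb = model_size_gb / n_layers
    usable_gb = max(vram_gb - overhead_gb, 0.0)
    return min(n_layers, int(usable_gb / per_layer_gb))
```

With an 18 GB, 81-layer model on an 11 GB card, only part of the stack fits on the GPU; the remaining layers run on the CPU, which matches the low GPU utilization the reporter observed.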

In the opening comment you mention `After restart of Windows Ollama server not up`, which sounds like it is likely the result of the AV scan. That was tracked in issue #2519 and has been resolved with Microsoft. Please make sure you've updated your AV definitions, and Ollama should no longer be tagged with this false positive.

<!-- gh-comment-id:1967040594 --> @dhiltgen commented on GitHub (Feb 27, 2024): Thanks for the log files. It looks like things are behaving as expected, and the models are being loaded into the GPU. When you load the larger model, it's only able to load `26/81 layers to GPU` given your limited VRAM. As a result, the GPU spends most of its time idle waiting for the CPU to process the layers it has loaded into system memory. This is expected behavior. In the opening comment you mention `After restart of Windows Ollama server not up` which sounds like it is likely the result of the AV scan. That was tracked in issue #2519 and has been resolved with Microsoft. Please make sure you've updated your AV definitions, and Ollama should no longer be tagged with this false positive.

Reference: github-starred/ollama#48179