[GH-ISSUE #10588] Ollama doesn't use GPU anymore #69026

Closed
opened 2026-05-04 16:54:07 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @TheEisbaer on GitHub (May 6, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10588

What is the issue?

I am trying to use qwen3; after updating to 0.6.8, Ollama stopped using the GPU.

Relevant log output

source=memory.go:194 msg="gpu has too little memory to allocate any layers" id=GPU-152ad784-f966-afea-c7eb-3ef56b5c7522 library=cuda variant=v12 compute=8.9 driver=12.8 name="NVIDIA RTX 2000 Ada Generation Laptop GPU" total="8.0 GiB" available="6.9 GiB" minimum_memory=479199232 layer_size="375.6 MiB" gpu_zer_overhead="0 B" partial_offload="6.0 GiB" full_offload="6.0 GiB"
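
For reference, the `minimum_memory` value is in bytes: 479,199,232 B ÷ 1,048,576 B/MiB = 457 MiB. The log shows 6.9 GiB available against a 6.0 GiB `full_offload` estimate, so the failure likely comes from the KV cache for the requested context inflating the total past what the GPU could hold (the resolution below confirms this).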

OS

WSL2

GPU

AMD, Nvidia

CPU

Intel

Ollama version

0.6.8

GiteaMirror added the bug label 2026-05-04 16:54:07 -05:00
Author
Owner

@TheEisbaer commented on GitHub (May 6, 2025):

My mistake: I set a 65k context size, but the default qwen3 only supports 32k.
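
For anyone hitting the same thing, here is a minimal sketch of capping the context via the Ollama REST API (assuming a local server on the default port; the model name and prompt are placeholders):

```shell
# Request generation with an explicit 32k context window so the
# KV cache fits alongside the model layers on an 8 GiB GPU.
curl http://localhost:11434/api/generate -d '{
  "model": "qwen3",
  "prompt": "Hello",
  "options": { "num_ctx": 32768 }
}'
```

The same cap can be set interactively in `ollama run` with `/set parameter num_ctx 32768`.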

Author
Owner

@AlessandroSpallina commented on GitHub (May 9, 2025):

The context size of qwen3 in Ollama isn't clear to me. My understanding is that qwen3 supports a 32K context, which can be increased up to 120K with YaRN; is this the case in Ollama?

Moreover, has anyone tried the qwen3 suggested parameters with RAG use cases? Alibaba suggests a temperature around 0.7.
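
As a sketch, those parameters can be passed per request through the API's `options` field. The temperature is the one mentioned above; the remaining values (`top_p` 0.8, `top_k` 20) follow Qwen's published non-thinking-mode recommendations and may need tuning for RAG workloads:

```shell
# Chat request passing the suggested sampling parameters; the
# model name and message content are placeholders.
curl http://localhost:11434/api/chat -d '{
  "model": "qwen3",
  "messages": [{ "role": "user", "content": "Summarize the retrieved passages." }],
  "options": {
    "temperature": 0.7,
    "top_p": 0.8,
    "top_k": 20
  }
}'
```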

Reference: github-starred/ollama#69026