[GH-ISSUE #8906] How to disable using the CPU #5774

Closed
opened 2026-04-12 17:06:32 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @gyfprivate on GitHub (Feb 7, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8906

How can I just use the GPU?

**Relevant log output**

```
2025/02/07 09:49:54 routes.go:1187: INFO server config env="map[CUDA_VISIBLE_DEVICES:0 GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:E:\\.ollama\\models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[* http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES:]"
time=2025-02-07T09:49:54.790+08:00 level=INFO source=images.go:432 msg="total blobs: 7"
time=2025-02-07T09:49:54.791+08:00 level=INFO source=images.go:439 msg="total unused blobs removed: 0"
time=2025-02-07T09:49:54.791+08:00 level=INFO source=routes.go:1238 msg="Listening on [::]:11434 (version 0.5.7)"
time=2025-02-07T09:49:54.792+08:00 level=INFO source=routes.go:1267 msg="Dynamic LLM libraries" runners="[cpu cpu_avx cpu_avx2 cuda_v11_avx cuda_v12_avx rocm_avx]"
time=2025-02-07T09:49:54.792+08:00 level=INFO source=gpu.go:226 msg="looking for compatible GPUs"
time=2025-02-07T09:49:54.792+08:00 level=INFO source=gpu_windows.go:167 msg=packages count=1
time=2025-02-07T09:49:54.792+08:00 level=INFO source=gpu_windows.go:214 msg="" package=0 cores=4 efficiency=0 threads=8
time=2025-02-07T09:49:54.954+08:00 level=INFO source=types.go:131 msg="inference compute" id=GPU-34f0da31-c41c-fe70-128e-663782b879ee library=cuda variant=v12 compute=6.1 driver=12.8 name="NVIDIA GeForce GTX 1070 Ti" total="8.0 GiB" available="7.0 GiB"
```
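For context, the config dump above shows `CUDA_VISIBLE_DEVICES:0` already set and `OLLAMA_LLM_LIBRARY` left empty. If the goal is to keep inference off the CPU runners entirely, one hedged option is to pin the runner before starting the server; a minimal PowerShell sketch, with the runner name taken from the "Dynamic LLM libraries" line in this log (treat the exact value as an assumption for your build):

```
# Sketch, not a confirmed fix: pin Ollama to the CUDA v12 runner.
# Both variables appear in the server config dump above; "cuda_v12_avx"
# is copied from this log's "Dynamic LLM libraries" runner list.
$env:CUDA_VISIBLE_DEVICES = "0"            # expose only GPU 0 to the CUDA runtime
$env:OLLAMA_LLM_LIBRARY = "cuda_v12_avx"   # skip autodetection of the cpu/cpu_avx runners
ollama serve
```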

![Image](https://github.com/user-attachments/assets/534c2fca-7436-41bf-a256-28aaf17efd7e)

![Image](https://github.com/user-attachments/assets/612e24b6-66b1-4e0a-a0f5-fc17b82e04bb)

OS
Windows

GPU
Nvidia

CPU
Intel

Ollama version
0.5.7

GiteaMirror added the feature request label 2026-04-12 17:06:32 -05:00
Author
Owner

@rick-github commented on GitHub (Feb 7, 2025):

The screenshot shows the GPU being used. The CPU is still used to give instructions and receive results.

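Even with every layer offloaded, some CPU activity is expected: tokenization, scheduling, and moving prompts and results between host and GPU all run on the CPU. If layers were actually spilling to the CPU, one hedged lever is the `num_gpu` option (the number of layers to send to the GPU); a sketch against the local API, with the model name assumed:

```
# Sketch, assuming a model named "llama3" is already pulled. num_gpu is
# passed through request options and counts layers offloaded to the GPU;
# 99 here just means "more layers than the model has", i.e. offload everything.
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "hello",
  "options": { "num_gpu": 99 }
}'
```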
Author
Owner

@pdevine commented on GitHub (Feb 8, 2025):

As @rick-github mentioned, this is working correctly. `ollama ps` shows the model offloaded to the GPU.

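For readers checking their own setup, an illustrative `ollama ps` listing (the name, ID, and size are made up; the PROCESSOR column is the part that confirms offload, here showing no CPU split):

```
NAME            ID              SIZE      PROCESSOR    UNTIL
llama3:latest   365c0bd3c000    5.4 GB    100% GPU     4 minutes from now
```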
Reference: github-starred/ollama#5774