[GH-ISSUE #12217] macOS 15: Ollama defaults to CPU on AMD RX 5700 despite a fully working Metal benchmark #70188

Open
opened 2026-05-04 20:37:27 -05:00 by GiteaMirror · 0 comments

Originally created by @denesfr on GitHub (Sep 8, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12217

What is the issue?

System Details

  • OS: macOS 15.0.1 (Build 24A910)
  • CPU: Intel Core i5-12600KF
  • GPU: AMD Radeon RX 5700
  • RAM: 32 GB
  • System Type: Hackintosh (SMBIOS MacPro7,1)
  • Ollama Version: 0.11.10

Problem Description

Ollama consistently defaults to CPU-only inference. The server log explicitly shows library=cpu, and it does not even appear to attempt to detect or initialize a Metal-capable GPU: the "Searching for Metal GPU" log message is absent entirely. The result is 100% CPU usage and 0% GPU usage during inference.
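For reference, the CPU-only placement is also visible from the CLI: ollama ps reports a PROCESSOR column for each loaded model. A minimal check (the model name below is illustrative, not the one used here):

```shell
# Load any installed model, then inspect where Ollama placed it.
ollama run llama3.1 "hi" > /dev/null
# On this machine the PROCESSOR column reads "100% CPU" rather than "100% GPU".
ollama ps
```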

Evidence

1. Ollama Server Log

The server log clearly shows the fallback to the CPU library, even with default environment variables.

time=2025-09-08T14:44:13.644-03:00 level=INFO source=routes.go:1331 msg="server config" env="map[HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/Users/denesferreira/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NEW_ESTIMATES:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false http_proxy: https_proxy: no_proxy:]"
time=2025-09-08T14:44:13.644-03:00 level=INFO source=images.go:477 msg="total blobs: 5"
time=2025-09-08T14:44:13.645-03:00 level=INFO source=images.go:484 msg="total unused blobs removed: 0"
time=2025-09-08T14:44:13.645-03:00 level=INFO source=routes.go:1384 msg="Listening on 127.0.0.1:11434 (version 0.11.10)"
time=2025-09-08T14:44:13.645-03:00 level=INFO source=types.go:131 msg="inference compute" id="" library=cpu variant="" compute="" driver=0.0 name="" total="32.0 GiB" available="18.3 GiB"
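For a more verbose picture of what discovery actually runs at startup, the server can be relaunched with debug logging (OLLAMA_DEBUG appears in the config dump above). A sketch, assuming the bundled app instance is stopped first; the exact discovery message text is an assumption:

```shell
# Quit the menu bar app / stop any running server first, then:
OLLAMA_DEBUG=1 ollama serve 2>&1 | tee ollama-debug.log | grep -iE 'metal|gpu|inference compute'
```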

2. Proof of Working Metal Compute

Despite Ollama failing to use the GPU, the system's Metal compute capabilities are fully functional, as proven by a successful Geekbench 6 Metal benchmark.

[Image: Geekbench 6 Metal benchmark result (https://github.com/user-attachments/assets/bd767bfa-aa0d-4fd5-b565-ac00998fdfd3)]
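Independently of Geekbench, macOS itself can confirm whether it treats the RX 5700 as Metal-capable, via the stock system_profiler tool (the exact "Metal"/"Metal Support" field name varies across macOS releases):

```shell
# Lists GPUs as macOS sees them; a Metal-capable card shows a
# "Metal Support" (or "Metal") line under its entry.
system_profiler SPDisplaysDataType
```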

Troubleshooting Steps Taken

  • Verified that the Ollama version is the latest available.
  • Confirmed that the Hackintosh configuration is stable and correctly identifies the GPU for other Metal compute applications (like Geekbench).
  • Tested with both a custom OLLAMA_MODELS environment variable and with the variable unset; the result is the same fallback to CPU.
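One further experiment that may be worth capturing: the config dump above shows OLLAMA_LLM_LIBRARY is empty, and on some builds this variable can force a specific runner instead of the auto-detected one. Whether "metal" is a recognized library name in 0.11.10 is an assumption; the point is to see whether forcing it produces a more specific error than a silent CPU fallback:

```shell
# Hypothetical: "metal" may not be a valid library name in this build.
OLLAMA_LLM_LIBRARY=metal ollama serve
```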

Relevant log output


OS

macOS

GPU

AMD

CPU

Intel

Ollama version

0.11.10

GiteaMirror added the bug label 2026-05-04 20:37:27 -05:00