[GH-ISSUE #13127] Vulkan - macOS AMD GPU #55201

Open
opened 2026-04-29 08:29:58 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @n-connect on GitHub (Nov 18, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/13127

What is the issue?

Thank you for the update! Notified from #2033

On an Intel macOS system with an AMD Radeon Pro 5500M, the GPU(s) are not listed after running ollama v0.12.11 via OLLAMA_VULKAN=1 ollama serve

Log:

OLLAMA_VULKAN=1 ollama serve
time=2025-11-18T10:52:29.451+01:00 level=INFO source=routes.go:1544 msg="server config" env="map[HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/Users/attis/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false http_proxy: https_proxy: no_proxy:]"
time=2025-11-18T10:52:29.455+01:00 level=INFO source=images.go:522 msg="total blobs: 4"
time=2025-11-18T10:52:29.456+01:00 level=INFO source=images.go:529 msg="total unused blobs removed: 0"
time=2025-11-18T10:52:29.457+01:00 level=INFO source=routes.go:1597 msg="Listening on 127.0.0.1:11434 (version 0.12.11)"
time=2025-11-18T10:52:29.458+01:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2025-11-18T10:52:29.459+01:00 level=INFO source=server.go:392 msg="starting runner" cmd="/Applications/Ollama.app/Contents/Resources/ollama runner --ollama-engine --port 57093"
time=2025-11-18T10:52:29.613+01:00 level=INFO source=types.go:60 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="32.0 GiB" available="14.6 GiB"
time=2025-11-18T10:52:29.613+01:00 level=INFO source=routes.go:1638 msg="entering low vram mode" "total vram"="0 B" threshold="20.0 GiB"

On the same system, llama.cpp compiled with Vulkan (MoltenVK) support works just fine (a build sketch follows the log output below):

ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon Pro 5500M (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 1 = Intel(R) UHD Graphics 630 (MoltenVK) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 65536 | int dot: 0 | matrix cores: none
main: setting n_parallel = 4 and kv_unified = true
build: 6955 (9f052478c) with Apple clang version 16.0.0 (clang-1600.0.26.4) for x86_64-apple-darwin23.6.0
system info: n_threads = 8, n_threads_batch = 8, total_threads = 16

system_info: n_threads = 8 (n_threads_batch = 8) / 16 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | ACCELERATE = 1 | REPACK = 1 |

main: binding port with default address family
main: HTTP server is listening, hostname: 127.0.0.1, port: 8080, http threads: 15
main: loading model
srv    load_model: loading model 'models/gemma-3-4b-it-qat-Q4_0.gguf'
llama_model_load_from_file_impl: using device Vulkan0 (AMD Radeon Pro 5500M) (unknown id) - 4080 MiB free
llama_model_loader: loaded meta data with 40 key-value pairs and 444 tensors from models/gemma-3-4b-it-qat-Q4_0.gguf (version GGUF V3 (latest))
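
For context, this is roughly how a Vulkan-enabled llama.cpp can be built on macOS; the Homebrew package names and binary paths below are assumptions for illustration, not taken from the original report:

```shell
# Minimal sketch: build llama.cpp with the Vulkan (MoltenVK) backend on Intel macOS.
# Assumes MoltenVK and the Vulkan loader/headers are already installed,
# e.g. via the LunarG Vulkan SDK or `brew install molten-vk vulkan-loader vulkan-headers`.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j
# On startup the server prints the "ggml_vulkan: Found N Vulkan devices" lines shown above.
./build/bin/llama-server -m models/gemma-3-4b-it-qat-Q4_0.gguf --port 8080
```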

Relevant log output

OLLAMA_VULKAN=1 ollama serve
time=2025-11-18T10:52:29.451+01:00 level=INFO source=routes.go:1544 msg="server config" env="map[HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/Users/attis/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false http_proxy: https_proxy: no_proxy:]"
time=2025-11-18T10:52:29.455+01:00 level=INFO source=images.go:522 msg="total blobs: 4"
time=2025-11-18T10:52:29.456+01:00 level=INFO source=images.go:529 msg="total unused blobs removed: 0"
time=2025-11-18T10:52:29.457+01:00 level=INFO source=routes.go:1597 msg="Listening on 127.0.0.1:11434 (version 0.12.11)"
time=2025-11-18T10:52:29.458+01:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2025-11-18T10:52:29.459+01:00 level=INFO source=server.go:392 msg="starting runner" cmd="/Applications/Ollama.app/Contents/Resources/ollama runner --ollama-engine --port 57093"
time=2025-11-18T10:52:29.613+01:00 level=INFO source=types.go:60 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="32.0 GiB" available="14.6 GiB"
time=2025-11-18T10:52:29.613+01:00 level=INFO source=routes.go:1638 msg="entering low vram mode" "total vram"="0 B" threshold="20.0 GiB"

OS

macOS

GPU

AMD

CPU

Intel

Ollama version

v0.12.11

GiteaMirror added the vulkan, macos, feature request labels 2026-04-29 08:29:59 -05:00
Author
Owner

@dhiltgen commented on GitHub (Nov 21, 2025):

Our Vulkan support is currently focused on Linux and Windows.

Author
Owner

@n-connect commented on GitHub (Nov 22, 2025):

Thanks, fair enough. I'll stay with llama.cpp for now, which already works with Vulkan support regardless of whether it's macOS or Linux (I did not try Windows).

Can you point me to some detailed instructions for the build steps of Ollama on macOS? I tried to incorporate the working llama.cpp, or its build steps, into Ollama, but did not succeed.
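
For reference, the general source-build steps in the repo's docs/development.md look like the sketch below; whether they apply unchanged to v0.12.x on Intel macOS (and with the Vulkan backend, which is not officially supported there) is an assumption:

```shell
# Minimal sketch of building Ollama from source on macOS, following docs/development.md.
# Requires Go, CMake, and a C/C++ toolchain; details may differ between versions.
git clone https://github.com/ollama/ollama
cd ollama
cmake -B build            # configure the native (GGML) components
cmake --build build       # build them
go run . serve            # or: go build . && ./ollama serve
```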
