[GH-ISSUE #2315] Apple GPU support for Linux #1336

Open
opened 2026-04-12 11:10:11 -05:00 by GiteaMirror · 20 comments

Originally created by @maxiwee69 on GitHub (Feb 2, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2315

You may already know about https://asahilinux.org/; if not, it's a Fedora-based Linux distribution for M-series Macs. When I tried to get Ollama running on it, it told me WARNING: No NVIDIA GPU detected. Ollama will run in CPU-only mode. I know fixing this would only help a small number of people, but I would highly appreciate it.

@MichaelFomenko commented on GitHub (Feb 2, 2024):

There are no drivers for the Apple Silicon GPU on Linux yet that support GPGPU. This has nothing to do with Ollama.

@igorschlum commented on GitHub (Feb 2, 2024):

@maxiwee69 as this is not an issue with Ollama, could you please close it?

@maxiwee69 commented on GitHub (Feb 2, 2024):

Just a question: will GPU support be implemented once the drivers support GPGPU?

@igorschlum commented on GitHub (Feb 2, 2024):

@maxiwee69 if the GPU is visible to Ollama, it will be used. On the Mac M1, GPU and CPU memory are shared.
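
On Linux, the server logs show what Ollama actually detected at startup. A minimal way to check, assuming the standard systemd install created by the official install script:

# dump the Ollama server logs and look for accelerator detection lines
journalctl -u ollama --no-pager | grep -i -E 'gpu|cuda|rocm|vram'

If nothing GPU-related appears, the server fell back to CPU-only inference.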

@karatekid430 commented on GitHub (Jul 27, 2024):

It sounds like a problem with Ollama if it only looks for an NVIDIA GPU on an Apple system (and potentially on PCs) when there may be compute-capable GPUs from other vendors.

@igorschlum commented on GitHub (Jul 28, 2024):

@karatekid430 I don't see any problems on macOS, except when there is not enough memory for the LLM to run on the GPU. I don't use Linux or Windows. I read that on Windows you need the correct driver for the NVIDIA card and that it has to be installed manually; some users don't keep their drivers up to date, which can cause issues.

@atmosuwiryo commented on GitHub (Dec 9, 2024):

@igorschlum The GPU is visible in fastfetch, but not to Ollama.

atmosuwiryo@fedora:~$ fastfetch
                   ##  **                     atmosuwiryo@fedora
                *####****.                    ------------------
                  ###,                        OS: Fedora Linux Asahi Remix 40 aarch64
               ...,/#,,,..                    Host: Apple MacBook Air (M1, 2020)
          /*,,,,,,,,*,........,,              Kernel: Linux 6.12.1-404.asahi.fc40.aarch64+16k
        ,((((((//*,,,,,,,,,......             Uptime: 2 hours, 22 mins
       ((((((((((((((%............            Packages: 2009 (rpm)
     ,(((((((((((((((@@(............          Shell: bash 5.2.26
    (((((((((((((((((@@@@/............        Display (eDP-1): 2560x1600 @ 60 Hz (as 1706x1066) in 13" [Built-in]
  ,((((((((((((((((((@@@@@&*...........       DE: KDE Plasma 6.2.4
 ((((((((((((((((((((@@@@@@@&,...........     WM: KWin (Wayland)
(((((((((((((((((((((@@@&%&@@@%,..........    WM Theme: Breeze
 /(((((((((((((((((((@@@&%%&@@@@(........     Theme: Breeze (Dark) [Qt], Breeze [GTK3]
    ,((((((((((((((((@@@&&@@&/&@@@/..         Icons: breeze-dark [Qt], breeze-dark [GTK3/4]
        /((((((((((((@@@@@@/.../&&            Font: Noto Sans (10pt) [Qt], Noto Sans (10pt) [GTK3/4]
           .(((((((((@@@@(....                Cursor: breeze (24px)
               /(((((@@#...                   Terminal: konsole 24.8.3
                  .((&,                       CPU: Apple M1 (8) @ 3.20 GHz
                                              GPU: Apple M1 (7) @ 1.28 GHz [Integrated]
                                              Memory: 7.48 GiB / 15.28 GiB (49%)
                                              Swap: 0 B / 8.00 GiB (0%)
                                              Disk (/): 18.73 GiB / 38.19 GiB (49%) - btrfs
                                              Local IP (wlp1s0f0): 192.168.0.197/24
                                              Battery (bq20z451): 17% (57 mins remaining) [Discharging]
                                              Locale: en_US.UTF-8

but Ollama gave me this warning:

atmosuwiryo@fedora:~$ curl -fsSL https://ollama.com/install.sh | sh
>>> Installing ollama to /usr/local
[sudo] password for atmosuwiryo: 
>>> Downloading Linux arm64 bundle
######################################################################## 100.0%
>>> Creating ollama user...
>>> Adding ollama user to render group...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> Enabling and starting ollama service...
Created symlink /etc/systemd/system/default.target.wants/ollama.service → /etc/systemd/system/ollama.service.
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.
WARNING: No NVIDIA/AMD GPU detected. Ollama will run in CPU-only mode.
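
The installer's warning comes from its GPU probe, which (as far as I can tell) only looks for NVIDIA/AMD hardware on the PCI bus; the Apple Silicon GPU is a platform device on the SoC, so it never matches. A minimal sketch to confirm what the system actually exposes, assuming the Asahi Mesa/Vulkan stack (Honeykrisp) and vulkan-tools are installed:

lspci | grep -i -E 'vga|3d|display'    # usually empty on Apple Silicon: the GPU is not a PCI device
ls /dev/dri                            # the DRM render node (e.g. renderD128) should still exist
vulkaninfo --summary | grep -i apple   # Honeykrisp should report the Apple GPU as a Vulkan device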

@tjex commented on GitHub (Dec 30, 2024):

There is no reason at the hardware level that this shouldn't work. It's the same machine. So it's either an Asahi problem or an Ollama problem. At some point, somewhere, the internal GPU is not being advertised.

Running Ollama on my macOS partition is blazingly fast (32-core GPU), but on the Asahi partition it's an agonizing word ... by ... word affair :(
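
For anyone comparing the two partitions, ollama run with --verbose prints token rates, which makes the difference concrete. A small sketch (the model name is just an example):

ollama run llama3.2 --verbose "Say hello in one sentence."
# the trailing stats report prompt eval rate and eval rate in tokens/s;
# on the Asahi side these currently reflect CPU-only inference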

@tjex commented on GitHub (Dec 30, 2024):

@MichaelFomenko Asahi released open-source GPU drivers two years ago: https://asahilinux.org/2022/12/gpu-drivers-now-in-asahi-linux/

@chrootchad commented on GitHub (Feb 2, 2025):

Just when I thought I was going to be able to do something useful with my old M1 mini, I smashed into this issue... first via a Docker install and then via a native install 😭

@igorschlum commented on GitHub (Feb 2, 2025):

@chrootchad you can use your Mac mini M1 with Ollama; it works well if you are using macOS and not Docker.

@chrootchad commented on GitHub (Feb 2, 2025):

For anyone else in the same boat (likely repurposing an old M1/M2 Mac as a dev/home-lab server), I found a solution: RamaLama and its/Podman's GPU support on Asahi Linux.

See the comments here to get DeepSeek-R1 working with GPU support on Asahi Linux (Ubuntu Asahi 24.04 server in my case):

https://github.com/containers/ramalama/issues/616#issuecomment-2629417564
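
For reference, the day-to-day RamaLama usage is much shorter than the Docker command it expands to. A rough sketch (the model name and port are only examples; see the linked issue for the Asahi-specific details):

ramalama serve -p 8080 deepseek-r1    # pulls an Asahi-enabled container image and serves the model over HTTP
ramalama run deepseek-r1              # or chat with it interactively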

@TheButlah commented on GitHub (Jun 13, 2025):

M2 Pro 16-inch on Asahi: vulkaninfo.txt (https://github.com/user-attachments/files/20733266/vulkaninfo.txt)
This issue should be reopened. Asahi supports hardware-accelerated GPGPU and Vulkan.

@muzzah commented on GitHub (Aug 5, 2025):

I'm able to run Ollama on Asahi Linux (Mac Studio with 64 GB of RAM).

sudo ramalama --debug --engine docker serve -c 0 -n agi -p 1234 --runtime-args="--flash-attn --mlock" -d gemma3:27b

This converts to the following Docker command, which you can also run directly:

docker run --rm --label ai.ramalama.model=hf://ggml-org/gemma-3-27b-it-GGUF --label ai.ramalama.engine=docker --label ai.ramalama.runtime=llama.cpp --label ai.ramalama.port=1234 --label ai.ramalama.command=serve --device /dev/dri -e ASAHI_VISIBLE_DEVICES=1 -p 1234:1234 --security-opt=label=disable --cap-drop=all --security-opt=no-new-privileges -d --label ai.ramalama --name agi --env=HOME=/tmp --init --mount=type=bind,src=/var/lib/ramalama/store/huggingface/ggml-org/gemma-3-27b-it-GGUF/blobs/sha256-edc9aff4d811a285b9157618130b08688b0768d94ee5355b02dc0cb713012e15,destination=/mnt/models/gemma-3-27b-it-Q4_K_M.gguf,ro --mount=type=bind,src=/var/lib/ramalama/store/huggingface/ggml-org/gemma-3-27b-it-GGUF/blobs/sha256-54cb61c842fe49ac3c89bc1a614a2778163eb49f3dec2b90ff688b4c0392cb48,destination=/mnt/models/mmproj-model-f16.gguf,ro quay.io/ramalama/asahi:latest llama-server --port 1234 --model /mnt/models/gemma-3-27b-it-Q4_K_M.gguf --no-warmup --mmproj /mnt/models/mmproj-model-f16.gguf --log-colors --alias ggml-org/gemma-3-27b-it-GGUF --ctx-size 0 --temp 0.8 --cache-reuse 256 --flash-attn --mlock -v -ngl 999 --threads 5 --host 0.0.0.0
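
Once the container is up, a quick sanity check that the llama.cpp server inside it is reachable (using the 1234 port mapping above; these are standard llama-server endpoints):

curl -s http://localhost:1234/health      # returns an OK status once the model has loaded
curl -s http://localhost:1234/v1/models   # OpenAI-compatible model listing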

However, I noticed something weird in the output:

ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Apple M1 Max (G13C C0) (Honeykrisp) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 32768 | int dot: 0 | matrix cores: none
build: 1 (3f4fc97) with cc (GCC) 15.1.1 20250521 (Red Hat 15.1.1-2) for aarch64-redhat-linux
system info: n_threads = 5, n_threads_batch = 5, total_threads = 10

Shared memory here is showing 32 GB, but the Mac Studio I have has 64 GB of memory. On macOS I can use all of it for models with apps like LM Studio.

Is this a Linux config issue? Is there another way to give Ollama access to all of the memory?

I think this is preventing me from loading some models. Without adding --flash-attn, the 27B Gemma model doesn't load.
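
To see how much memory the Vulkan driver actually exposes to applications, as opposed to what the system has overall, the device memory heaps can be inspected directly. A small sketch, assuming vulkan-tools is installed:

free -h                                  # total unified memory visible to Linux
vulkaninfo | grep -A 4 -i 'memoryHeaps'  # heap sizes the Honeykrisp driver reports to Vulkan apps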

@muzzah commented on GitHub (Aug 5, 2025):

It's weird; my kernel parameters are already quite large:

➜  ~ ipcs -lm                                         

------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 18014398509465599
max total shared memory (kbytes) = 18446744073709551600
min seg size (bytes) = 1
@TheButlah commented on GitHub (Aug 5, 2025):

@maxiwee69 can you reopen the issue?

@muzzah commented on GitHub (Aug 5, 2025):

I was able to solve this by setting an environment variable that adjusts the Vulkan memory configuration, allowing me to increase the shared memory size. So my problem is solved.

@feoh commented on GitHub (Dec 9, 2025):

@muzzah Could you please share the name of that variable and what you set it to? I'd love to get Ollama working on Asahi with GPU support.

@muzzah commented on GitHub (Dec 9, 2025):

@feoh See https://github.com/ggml-org/llama.cpp/issues/10982#issuecomment-3155959678

@TheButlah commented on GitHub (Dec 28, 2025):

I can confirm it's working with RamaLama, but it would be great for Ollama to work as well.
