[GH-ISSUE #5954] Detecting macOS GPUs when using Podman with GPU passthrough #3720

Open
opened 2026-04-12 14:31:52 -05:00 by GiteaMirror · 6 comments

Originally created by @ThomasVitale on GitHub (Jul 25, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5954

Podman provides support for making the local GPU on a macOS computer available from within a container. This article describes the setup for it: https://blog.podman.io/2024/07/podman-and-libkrun/.
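For reference, a rough sketch of that setup, assuming the libkrun machine provider described in the article (the exact steps and the `CONTAINERS_MACHINE_PROVIDER` variable may differ between Podman versions):

```shell
# Select libkrun as the machine provider so the VM exposes a virtio-gpu
# (Venus) device, then create and start the machine.
export CONTAINERS_MACHINE_PROVIDER=libkrun
podman machine init
podman machine start
```

The GPU then shows up inside the VM as a DRI render node: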

```shell
% podman machine ssh ls -l /dev/dri
total 0
drwxr-xr-x. 2 root root         80 Jul 25 17:12 by-path
crw-rw----. 1 root video  226,   0 Jul 25 17:12 card0
crw-rw-rw-. 1 root render 226, 128 Jul 25 17:12 renderD128
```

When I run an Ollama container, it doesn't seem to recognise the GPU. Is there an option I can use to make that work, or is new implementation needed within the Ollama project to support it?

```shell
docker run -it --rm -p 11434:11434 --device /dev/dri -e OLLAMA_DEBUG=true ollama/ollama
```

Logs:

```
2024/07/25 15:56:42 routes.go:1100: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:true OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
time=2024-07-25T15:56:42.248Z level=INFO source=images.go:784 msg="total blobs: 0"
time=2024-07-25T15:56:42.248Z level=INFO source=images.go:791 msg="total unused blobs removed: 0"
time=2024-07-25T15:56:42.248Z level=INFO source=routes.go:1147 msg="Listening on [::]:11434 (version 0.2.8)"
time=2024-07-25T15:56:42.249Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama2409035362/runners
time=2024-07-25T15:56:42.249Z level=DEBUG source=payload.go:182 msg=extracting variant=cpu file=build/linux/arm64/cpu/bin/ollama_llama_server.gz
time=2024-07-25T15:56:42.249Z level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/arm64/cuda_v11/bin/libcublas.so.11.gz
time=2024-07-25T15:56:42.249Z level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/arm64/cuda_v11/bin/libcublasLt.so.11.gz
time=2024-07-25T15:56:42.249Z level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/arm64/cuda_v11/bin/libcudart.so.11.0.gz
time=2024-07-25T15:56:42.249Z level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/arm64/cuda_v11/bin/ollama_llama_server.gz
time=2024-07-25T15:56:47.047Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama2409035362/runners/cpu/ollama_llama_server
time=2024-07-25T15:56:47.047Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama2409035362/runners/cuda_v11/ollama_llama_server
time=2024-07-25T15:56:47.047Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cuda_v11]"
time=2024-07-25T15:56:47.047Z level=DEBUG source=payload.go:45 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
time=2024-07-25T15:56:47.047Z level=DEBUG source=sched.go:102 msg="starting llm scheduler"
time=2024-07-25T15:56:47.047Z level=INFO source=gpu.go:205 msg="looking for compatible GPUs"
time=2024-07-25T15:56:47.048Z level=DEBUG source=gpu.go:91 msg="searching for GPU discovery libraries for NVIDIA"
time=2024-07-25T15:56:47.048Z level=DEBUG source=gpu.go:468 msg="Searching for GPU library" name=libcuda.so*
time=2024-07-25T15:56:47.049Z level=DEBUG source=gpu.go:487 msg="gpu library search" globs="[/usr/local/nvidia/lib/libcuda.so** /usr/local/nvidia/lib64/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2024-07-25T15:56:47.049Z level=DEBUG source=gpu.go:521 msg="discovered GPU libraries" paths=[]
time=2024-07-25T15:56:47.049Z level=DEBUG source=gpu.go:468 msg="Searching for GPU library" name=libcudart.so*
time=2024-07-25T15:56:47.049Z level=DEBUG source=gpu.go:487 msg="gpu library search" globs="[/usr/local/nvidia/lib/libcudart.so** /usr/local/nvidia/lib64/libcudart.so** /tmp/ollama2409035362/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
time=2024-07-25T15:56:47.049Z level=DEBUG source=gpu.go:521 msg="discovered GPU libraries" paths=[/tmp/ollama2409035362/runners/cuda_v11/libcudart.so.11.0]
cudaSetDevice err: 35
time=2024-07-25T15:56:47.057Z level=DEBUG source=gpu.go:533 msg="Unable to load cudart" library=/tmp/ollama2409035362/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing.  If you have a CUDA GPU please upgrade to run ollama"
time=2024-07-25T15:56:47.057Z level=DEBUG source=amd_linux.go:356 msg="amdgpu driver not detected /sys/module/amdgpu"
time=2024-07-25T15:56:47.057Z level=INFO source=gpu.go:346 msg="no compatible GPUs were discovered"
time=2024-07-25T15:56:47.057Z level=INFO source=types.go:105 msg="inference compute" id=0 library=cpu compute="" driver=0.0 name="" total="4.4 GiB" available="4.0 GiB"
```
GiteaMirror added the feature request label 2026-04-12 14:31:52 -05:00

@edeandrea commented on GitHub (Dec 10, 2024):

Any progress on this?


@ancoleman commented on GitHub (Feb 14, 2025):

I'm very curious about this too. I tried running Ollama with Podman using libkrun as the provider for podman machine, and it doesn't detect any GPUs.


@step21 commented on GitHub (Feb 25, 2025):

You cannot just run a normal container. It needs to have a patched Mesa, as described here: https://podman-desktop.io/docs/podman/gpu (switch to macOS in the docs). This could change in the future once these changes land upstream, but until then it is unavoidable.
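As a hedged sketch of how one might verify this, assuming an image built with the patched Mesa from the slp/mesa-krunkit COPR as those docs describe (the image name below is a placeholder; `vulkaninfo` comes from the vulkan-tools package):

```shell
# Placeholder image: any image that ships the mesa-krunkit Vulkan driver.
podman run --rm --device /dev/dri <image-with-mesa-krunkit> \
  vulkaninfo --summary
# With the patched Mesa the virtio-gpu/Venus device should be listed;
# with stock Mesa it will not be.
```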


@ancoleman commented on GitHub (Feb 25, 2025):

@step21 the example shows its use with Mesa on Fedora. Any way to build this in a similar manner for Ubuntu or Debian?


@step21 commented on GitHub (Feb 25, 2025):

You could probably either switch the Docker base image to Fedora, or install that Mesa on Ubuntu. I don't really have time to look up the details now. Rather than detecting it, it's probably reasonable to just use a special docker-compose file or Dockerfile, similar to how NVIDIA and AMD GPUs also have their own docker-compose files in open-webui.
You could probably use this Dockerfile as a base: https://github.com/sozercan/aikit/blob/main/Dockerfile.base-applesilicon
Or just install the patched Mesa into a container running Ubuntu or whatever, if you can find the source. It should be linked somewhere here (https://copr.fedorainfracloud.org/coprs/slp/mesa-krunkit/), but I couldn't find it, maybe because part of Fedora infra (cgit) seems to be down after a DDoS in January.

Update: you can clone it as explained here: https://github.com/fedora-copr/copr/issues/3591#issuecomment-2683411845
Take https://copr-dist-git.fedorainfracloud.org/cgit/slp/mesa-krunkit/mesa.git, replace cgit with git, i.e. https://copr-dist-git.fedorainfracloud.org/git/slp/mesa-krunkit/mesa.git, and clone it. Then build or package it for Ubuntu or whatever you like.
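A minimal sketch of that clone step (the URL is simply the cgit URL above with cgit swapped for git; the packaging steps afterwards depend on the target distribution):

```shell
git clone https://copr-dist-git.fedorainfracloud.org/git/slp/mesa-krunkit/mesa.git
cd mesa
# From here, build and package Mesa for Ubuntu/Debian (e.g. as a .deb);
# the exact build steps depend on the Mesa version and the applied patches.
```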

Further update: as Ollama uses a multi-stage build, it might be a bit more complicated. It also copies parts built on Fedora images into the final Ubuntu image. You could do the same with the patched Mesa, but I'm not sure whether these are compatible.


@ThomasVitale commented on GitHub (Jul 30, 2025):

As additional context: since I opened this issue, GPU support in Podman has graduated: https://podman-desktop.io/docs/podman/gpu

It works correctly when running models from [Podman AI Lab](https://podman-desktop.io/docs/ai-lab) or [RamaLama](https://ramalama.ai), as they detect the GPU from within the containers.
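For comparison, a hypothetical RamaLama invocation (the model name is a placeholder; RamaLama resolves it and picks a GPU-capable container image on its own):

```shell
# Placeholder model name; RamaLama selects an image that can use the
# libkrun/Vulkan GPU inside the Podman machine.
ramalama run tinyllama
```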
