[GH-ISSUE #4242] Ollama not using GPU #64682

Closed
opened 2026-05-03 18:29:54 -05:00 by GiteaMirror · 14 comments

Originally created by @ziqizh on GitHub (May 7, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4242

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

I am running llama3 8B Q4, but it does not run on the GPU. Here is my system information:

GPU: 10GB VRAM RTX 3080
OS: Ubuntu 22.04
CUDA version (from nvcc): 11.8
NVIDIA driver version: 545.23.06

I tried the installation script and Docker (sudo docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama) and observed no GPU usage.

OS

Linux

GPU

Nvidia

CPU

AMD

Ollama version

0.1.34

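Before digging into Ollama's own logs, it can help to confirm that the container runtime exposes the GPU at all. A minimal sketch, assuming the NVIDIA Container Toolkit is installed and the container name used in the command above:

```bash
# Check that Docker's NVIDIA runtime can expose the GPU to an arbitrary container.
# The CUDA image tag here is only an example; any recent nvidia/cuda base tag works.
docker run --rm --gpus=all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi

# If the ollama container is already running, the NVIDIA runtime normally injects
# nvidia-smi into it, so the same check can be run in place:
docker exec ollama nvidia-smi
```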
GiteaMirror added the docker, bug, nvidia labels 2026-05-03 18:29:55 -05:00

@manojmanivannan commented on GitHub (May 8, 2024):

Something similar on my side: I have an RTX 4090, and running Ollama in Docker does not recognize my NVIDIA GPU.

~/A/ollama $ docker run -d -e CUDA_VISIBLE_DEVICES=0 --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
5eea5106ec6d75e36a01e0c2ac88f3877f1899e99fb4a23ffe54b0aaba1b6a66
~/A/ollama $ docker logs ollama -f                                                                                       ⌚ 12:49:18
time=2024-05-08T11:49:18.380Z level=INFO source=images.go:828 msg="total blobs: 5"
time=2024-05-08T11:49:18.381Z level=INFO source=images.go:835 msg="total unused blobs removed: 0"
time=2024-05-08T11:49:18.381Z level=INFO source=routes.go:1071 msg="Listening on [::]:11434 (version 0.1.33)"
time=2024-05-08T11:49:18.381Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama2592519153/runners
time=2024-05-08T11:49:20.151Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cuda_v11 rocm_v60002 cpu cpu_avx cpu_avx2]"
time=2024-05-08T11:49:20.151Z level=INFO source=gpu.go:96 msg="Detecting GPUs"
time=2024-05-08T11:49:20.315Z level=INFO source=gpu.go:101 msg="detected GPUs" library=/tmp/ollama2592519153/runners/cuda_v11/libcudart.so.11.0 count=1
time=2024-05-08T11:49:20.315Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-05-08T11:49:20.427Z level=WARN source=amd_linux.go:49 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-05-08T11:49:20.427Z level=WARN source=amd_linux.go:143 msg="amdgpu too old gfx000" gpu=0
time=2024-05-08T11:49:20.427Z level=INFO source=amd_linux.go:286 msg="no compatible amdgpu devices detected"

output of nvidia-smi

~/A/ollama $ nvidia-smi                                                                                                  ⌚ 12:52:23
Wed May  8 12:52:25 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:01:00.0  On |                  Off |
|  0%   41C    P5             28W /  450W |     146MiB /  24564MiB |     70%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      2904      G   /usr/lib/xorg/Xorg                            133MiB |
+-----------------------------------------------------------------------------------------+
~/A/ollama $   

OS: Ubuntu 22.04
CPU: AMD
GPU: NVIDIA 4090


@lujx1024 commented on GitHub (May 8, 2024):

I have the same problem on Ubuntu 20.04, with an NVIDIA RTX 2060 12GB GPU installed.

I've double-checked that I installed the driver, Docker Engine, and NVIDIA container toolkit correctly.

I looked through the Troubleshooting page and found nothing, so if you know what's going on here, please let me know. I would be very appreciative.


@dhiltgen commented on GitHub (May 8, 2024):

We've adjusted the GPU discovery logic in 0.1.34 to use a different NVIDIA library (the Driver API), which should hopefully make it more reliable. Can you all please try pulling the latest ollama/ollama image (or use the explicit tag ollama/ollama:0.1.34) and see if it discovers your GPUs correctly now? If not, please run the container with -e OLLAMA_DEBUG=1 and share the log so we can see what the problem may be.

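For reference, a minimal sketch of the steps being asked for here, assuming the same container name and volume as in the original report:

```bash
# Pull the updated image (or pin the explicit 0.1.34 tag mentioned above).
docker pull ollama/ollama:0.1.34

# Recreate the container with debug logging enabled so GPU discovery details show up.
docker rm -f ollama
docker run -d --gpus=all -e OLLAMA_DEBUG=1 \
  -v ollama:/root/.ollama -p 11434:11434 \
  --name ollama ollama/ollama:0.1.34

# Collect the log to attach to the issue.
docker logs ollama
```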

@ziqizh commented on GitHub (May 8, 2024):

Debug log from version 0.1.34:

time=2024-05-08T21:50:55.140Z level=INFO source=images.go:904 msg="total unused blobs removed: 0"
time=2024-05-08T21:50:55.141Z level=INFO source=routes.go:1034 msg="Listening on [::]:11434 (version 0.1.34)"
time=2024-05-08T21:50:55.141Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama2923388324/runners
time=2024-05-08T21:50:55.141Z level=DEBUG source=payload.go:180 msg=extracting variant=cpu file=build/linux/x86_64/cpu/bin/ollama_llama_server.gz
time=2024-05-08T21:50:55.141Z level=DEBUG source=payload.go:180 msg=extracting variant=cpu_avx file=build/linux/x86_64/cpu_avx/bin/ollama_llama_server.gz
time=2024-05-08T21:50:55.141Z level=DEBUG source=payload.go:180 msg=extracting variant=cpu_avx2 file=build/linux/x86_64/cpu_avx2/bin/ollama_llama_server.gz
time=2024-05-08T21:50:55.141Z level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublas.so.11.gz
time=2024-05-08T21:50:55.141Z level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublasLt.so.11.gz
time=2024-05-08T21:50:55.141Z level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcudart.so.11.0.gz
time=2024-05-08T21:50:55.141Z level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/ollama_llama_server.gz
time=2024-05-08T21:50:55.141Z level=DEBUG source=payload.go:180 msg=extracting variant=rocm_v60002 file=build/linux/x86_64/rocm_v60002/bin/deps.txt.gz
time=2024-05-08T21:50:55.141Z level=DEBUG source=payload.go:180 msg=extracting variant=rocm_v60002 file=build/linux/x86_64/rocm_v60002/bin/ollama_t.gz
time=2024-05-08T21:50:55.141Z level=DEBUG source=payload.go:180 msg=extracting variant=rocm_v60002 file=build/linux/x86_64/rocm_v60002/bin/ollama_llama_server.gz
time=2024-05-08T21:50:57.928Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama2923388324/runners/cpu
time=2024-05-08T21:50:57.928Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama2923388324/runners/cpu_avx
time=2024-05-08T21:50:57.928Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama2923388324/runners/cpu_avx2
time=2024-05-08T21:50:57.928Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama2923388324/runners/cuda_v11
time=2024-05-08T21:50:57.928Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama2923388324/runners/rocm_v60002
time=2024-05-08T21:50:57.928Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11 rocm_v60002]"
time=2024-05-08T21:50:57.928Z level=DEBUG source=payload.go:45 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
time=2024-05-08T21:50:57.928Z level=DEBUG source=sched.go:85 msg="starting llm scheduler"
time=2024-05-08T21:50:57.928Z level=INFO source=gpu.go:122 msg="Detecting GPUs"
time=2024-05-08T21:50:57.928Z level=DEBUG source=gpu.go:255 msg="Searching for GPU library" name=libcuda.so*
time=2024-05-08T21:50:57.928Z level=DEBUG source=gpu.go:274 msg="gpu library search" globs="[/usr/local/nvidia/lib/libcuda.so** /usr/local/nvidia/lib64/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2024-05-08T21:50:57.929Z level=DEBUG source=gpu.go:307 msg="discovered GPU libraries" paths=[/usr/lib/x86_64-linux-gnu/libcuda.so.545.23.06]
cuInit err: 999
time=2024-05-08T21:50:57.929Z level=DEBUG source=gpu.go:307 msg="discovered GPU libraries" paths=[/usr/lib/x86_64-linux-gnu/libcuda.so.545.23.06]
cuInit err: 999
time=2024-05-08T21:50:57.932Z level=DEBUG source=gpu.go:336 msg="Unable to load nvcuda" library=/usr/lib/x86_64-linux-gnu/libcuda.so.545.23.06 error="nvcuda init failure: 999"
time=2024-05-08T21:50:57.932Z level=DEBUG source=gpu.go:255 msg="Searching for GPU library" name=libcudart.so*
time=2024-05-08T21:50:57.932Z level=DEBUG source=gpu.go:274 msg="gpu library search" globs="[/usr/local/nvidia/lib/libcudart.so** /usr/local/nvidia/lib64/libcudart.so** /tmp/ollama2923388324/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
time=2024-05-08T21:50:57.932Z level=DEBUG source=gpu.go:307 msg="discovered GPU libraries" paths=[/tmp/ollama2923388324/runners/cuda_v11/libcudart.so.11.0]
cudaSetDevice err: 999
time=2024-05-08T21:50:57.934Z level=DEBUG source=gpu.go:319 msg="Unable to load cudart" library=/tmp/ollama2923388324/runners/cuda_v11/libcudart.so.11.0 error="cudart init failure: 999"
time=2024-05-08T21:50:57.934Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-05-08T21:50:57.934Z level=DEBUG source=amd_linux.go:297 msg="amdgpu driver not detected /sys/module/amdgpu"

BTW, I am able to run llama.cpp and LM Studio on the GPU, so CUDA should be working.


@dhiltgen commented on GitHub (May 8, 2024):

@ziqizh What I've heard is this typically points to a driver bug, but you mention other tools are working properly inside a container on the same host. Can you try any/all of the following and see if it resolves the problem, or gives us more details on the failure?

  • Does a reboot change the failure mode?
  • Can you upgrade to the latest driver? (if already running the latest driver, re-install the driver)
  • Set CUDA_ERROR_LEVEL=50 and try again to see if we get more diagnostic logs
  • Check dmesg for any errors sudo dmesg | grep -i nvrm and sudo dmesg | grep -i nvidia
  • Is the uvm driver not loaded? sudo nvidia-modprobe -u
  • Try reloading the nvidia_uvm driver - sudo rmmod nvidia_uvm then sudo modprobe nvidia_uvm
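
Taken together, the checks above can be run roughly as follows (a sketch only; run on the host, not inside the container, with the container name and image from the earlier commands):

```bash
# Look for NVIDIA driver errors in the kernel log.
sudo dmesg | grep -i nvrm
sudo dmesg | grep -i nvidia

# Create the unified-memory (uvm) device nodes if they are missing.
sudo nvidia-modprobe -u

# Reload the nvidia_uvm module; a stale module state is a common cause of cuInit error 999.
sudo rmmod nvidia_uvm
sudo modprobe nvidia_uvm

# Re-run the container with extra CUDA diagnostics enabled and capture the log.
docker rm -f ollama
docker run -d --gpus=all -e OLLAMA_DEBUG=1 -e CUDA_ERROR_LEVEL=50 \
  -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
docker logs ollama
```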

@lujx1024 commented on GitHub (May 9, 2024):

@dhiltgen It works.
I removed the old Docker image and pulled the latest image by explicit tag ollama/ollama:0.1.34. It still could not discover the GPU at first, but after I rebooted the machine and restarted the container, the problem was solved. Thanks for your effort, much appreciated.

Here is the monitor data showing the GPU is discovered:
[screenshot: ollama_running_003]


@farheinheigt commented on GitHub (May 9, 2024):

Great, the update works well for me; the GPU is working perfectly now. Thanks!


@dhiltgen commented on GitHub (May 9, 2024):

That's great to hear!

I'll update our docs to include these workaround/troubleshooting steps for others who may hit these problems as well.


@manojmanivannan commented on GitHub (May 15, 2024):

I can confirm this as well; using the latest ollama image solves the issue for me:

~ $ docker run -d --gpus=all -v ollama:/root/.ollama -p 11435:11434 --name ollama ollama/ollama                                                                                                                                                                  ⌚ 6:07:27
61671828322b09f4a48da9635ffec3e5fee52d367385d8a3353cb0880c3a7a7a
~ $ docker logs ollama -f                                                                                                                                                                                                                                        ⌚ 6:07:30
2024/05/15 17:07:30 routes.go:1006: INFO server config env="map[OLLAMA_DEBUG:false OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:*] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR:]"
time=2024-05-15T17:07:30.376Z level=INFO source=images.go:704 msg="total blobs: 0"
time=2024-05-15T17:07:30.376Z level=INFO source=images.go:711 msg="total unused blobs removed: 0"
time=2024-05-15T17:07:30.376Z level=INFO source=routes.go:1052 msg="Listening on [::]:11434 (version 0.1.37)"
time=2024-05-15T17:07:30.376Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama2950159761/runners
time=2024-05-15T17:07:32.115Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11 rocm_v60002]"
time=2024-05-15T17:07:32.303Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-05-15T17:07:32.303Z level=WARN source=amd_linux.go:163 msg="amdgpu too old gfx000" gpu=0
time=2024-05-15T17:07:32.303Z level=INFO source=amd_linux.go:311 msg="no compatible amdgpu devices detected"
time=2024-05-15T17:07:32.303Z level=INFO source=types.go:71 msg="inference compute" id=GPU-83112007-904c-7153-a250-6fce07a5db13 library=cuda compute=8.9 driver=12.4 name="NVIDIA GeForce RTX 4090" total="23.4 GiB" available="22.9 GiB"


@2loki4u commented on GitHub (Jul 7, 2024):

Greetings all. Full disclosure, be gentle: I'm a novice in Linux as well as in most aspects of running a dedicated AI server. I've built a dedicated workstation for self-hosted AI (Ryzen 7900X, 64 GB DDR, 4070 Ti Super, M.2 4x4), OS: Mint 21.3 (Ubuntu 22.04).

I'm struggling to resolve an issue where some llama models fully utilize the GPU and some do not. For example, llama3:latest fully utilizes the GPU, as does llama2:latest, but neither mixtral nor llama3:70b even touches the GPU; they just peg most if not all cores on the 7900X.

I'm running the latest Ollama build, 0.1.48, with NVIDIA 550.90.07 drivers; NVIDIA is set to "on-demand". On installing 0.1.48 the machine reports the NVIDIA GPU as detected (obviously, given that 2 of the 4 models use it extensively).

Ollama is installed directly on Linux (not in a Docker container). I am using a Docker container for Open WebUI, and I see the same behavior depending on which model I choose.

Reading through this thread, I noted that the other users were running Ollama in a Docker container, whereas I am not. I wasn't sure if this would change the method of troubleshooting, but frankly I am at a loss, as I cannot explain why it would behave this way.

Questions:
Could it be that the larger models, being greater than the available VRAM of this GPU, cause Ollama to fall back to the CPU and system memory? If so, is this normal? Is there a workaround to force utilization of the GPU?
Should I delete the large models previously downloaded and re-download them? (This seems highly unlikely, unless they get marked in some way when downloaded as to which resources to use.)
Could something else be going on here?

Thanks in advance. If you need additional information, please provide instructions with your ask (or just reference where to get it).


@Trijeet commented on GitHub (Feb 5, 2025):

Questions: Could it be that the larger models being greater than the available VRAM of this GPU causes ollama to defaults back to the CPU and system memory? If so, is this normal? Is there a work around to force utilization of the GPU? Should I delete the large models previously downloaded and re-download them? (seems highly unlikely unless they get marked in some way when downloaded, as to what resources to use) Could something else be going on here?

Hi, is there any answer to this? This seems to be the case even now. Are there any workarounds for ollama to use both GPU and CPU for models larger than available VRAM?


@2loki4u commented on GitHub (Feb 6, 2025):

Not that I've found. A colleague of mine was trying to do the same thing and hit the same roadblock just a week or so ago.




@cfengseagate commented on GitHub (Aug 14, 2025):

Just a side note: I have CUDA 11.8 and cannot upgrade it. Ollama 0.3.14 can use the GPU, but 0.9+ cannot. I didn't test all versions, but apparently there is a compatibility relationship between the Ollama version and the CUDA version.

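For anyone in a similar position, a hedged sketch of that workaround is simply pinning the release reported above to work with the host's CUDA stack, rather than tracking latest, e.g. with Docker:

```bash
# Pin the specific Ollama release the commenter reports still works with CUDA 11.8.
docker pull ollama/ollama:0.3.14
docker run -d --gpus=all \
  -v ollama:/root/.ollama -p 11434:11434 \
  --name ollama ollama/ollama:0.3.14
```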

@Mikhail42 commented on GitHub (Aug 30, 2025):

See also https://github.com/ollama/ollama/issues/5464
