[GH-ISSUE #6364] docker container can't detect Nvidia GPU - intermittent "cuda driver library failed to get device context 801" #3995

Open
opened 2026-04-12 14:51:52 -05:00 by GiteaMirror · 37 comments

Originally created by @fahadshery on GitHub (Aug 14, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6364

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

OS: Ubuntu 24.04 LTS
GPU: Nvidia Tesla P40 (24G)

I installed ollama without docker and it was able to utilise my gpu without any issues.
I then deployed ollama using the following docker compose file:

  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    environment:
      - PUID=${PUID:-1000}
      - PGID=${PGID:-1000}
      - OLLAMA_KEEP_ALIVE=24h
      - ENABLE_IMAGE_GENERATION=True
      - COMFYUI_BASE_URL=http://stable-diffusion-webui:7860
    networks:
      - traefik
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /etc/timezone:/etc/timezone:ro
      - ./ollama:/root/.ollama
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.ollama.rule=Host(`ollama.local.example.com`)"
      - "traefik.http.routers.ollama.entrypoints=https"
      - "traefik.http.routers.ollama.tls=true"
      - "traefik.http.routers.ollama.tls.certresolver=cloudflare"
      - "traefik.http.routers.ollama.middlewares=default-headers@file"
      - "traefik.http.routers.ollama.middlewares=ollama-auth"
      - "traefik.http.services.ollama.loadbalancer.server.port=11434"
      - "traefik.http.routers.ollama.middlewares=auth"
      - "traefik.http.middlewares.auth.basicauth.users=${OLLAMA_API_CREDENTIALS}"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

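A minimal sketch of how the GPU reservation above can be sanity-checked from the host (assuming the container name ollama from the compose file; these are generic docker CLI commands, not ollama-specific):

```
# Confirm the driver is reachable from inside the running container
docker exec -it ollama nvidia-smi

# Check whether ollama's own GPU discovery succeeded
docker logs ollama 2>&1 | grep -iE "looking for compatible GPUs|no compatible GPUs|cuda"
```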
When I exec into the container and run nvidia-smi, it executes successfully from within the ollama docker container.
But the logs show that it can't detect my GPU:

2024/08/14 22:50:17 routes.go:1123: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:24h0m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
time=2024-08-14T22:50:18.674+01:00 level=INFO source=images.go:782 msg="total blobs: 5"
time=2024-08-14T22:50:18.675+01:00 level=INFO source=images.go:790 msg="total unused blobs removed: 0"
time=2024-08-14T22:50:18.677+01:00 level=INFO source=routes.go:1170 msg="Listening on [::]:11434 (version 0.3.5)"
time=2024-08-14T22:50:18.678+01:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama2940291930/runners
time=2024-08-14T22:50:30.626+01:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11 rocm_v60102]"
time=2024-08-14T22:50:30.626+01:00 level=INFO source=gpu.go:204 msg="looking for compatible GPUs"
time=2024-08-14T22:50:30.640+01:00 level=INFO source=gpu.go:260 msg="error looking up nvidia GPU memory" error="cuda driver library failed to get device context 801"
time=2024-08-14T22:50:30.640+01:00 level=INFO source=gpu.go:350 msg="no compatible GPUs were discovered"
time=2024-08-14T22:50:30.640+01:00 level=INFO source=types.go:105 msg="inference compute" id=0 library=cpu compute="" driver=0.0 name="" total="47.1 GiB" available="43.9 GiB"
2024/08/14 22:54:19 routes.go:1123: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:24h0m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
time=2024-08-14T22:54:19.967+01:00 level=INFO source=images.go:782 msg="total blobs: 5"
time=2024-08-14T22:54:20.012+01:00 level=INFO source=images.go:790 msg="total unused blobs removed: 0"
time=2024-08-14T22:54:20.013+01:00 level=INFO source=routes.go:1170 msg="Listening on [::]:11434 (version 0.3.5)"
time=2024-08-14T22:54:20.032+01:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama1278819119/runners

not sure why??

OS

Linux, Docker

GPU

Nvidia

CPU

Intel

Ollama version

0.3.5

GiteaMirror added the docker, nvidia, needs more info, bug labels 2026-04-12 14:51:53 -05:00

@fahadshery commented on GitHub (Aug 14, 2024):

[image: https://github.com/user-attachments/assets/628dbe8e-4ad5-49b9-89b9-c853cbb6d053]

@rick-github commented on GitHub (Aug 14, 2024):

What's the output of nvidia-smi outside of the container?


@rick-github commented on GitHub (Aug 14, 2024):

If you (temporarily) install ollama as a service (curl -fsSL https://ollama.com/install.sh | sh) can it access the GPU?

I see that you've already done that.


@fahadshery commented on GitHub (Aug 14, 2024):

> What's the output of nvidia-smi outside of the container?

[image: https://github.com/user-attachments/assets/7a61e040-c0bc-4bd9-a9c3-83da2ecf1614]

@fahadshery commented on GitHub (Aug 14, 2024):

I am using a vGPU on a datacenter-grade GPU. I changed and tried different Nvidia vGPU profiles, but to no avail. Here is more info on the GPU:

fahadshery@ai-stack:~$ nvidia-smi -q

==============NVSMI LOG==============

Timestamp                                 : Wed Aug 14 23:19:54 2024
Driver Version                            : 535.161.08
CUDA Version                              : 12.2

Attached GPUs                             : 1
GPU 00000000:01:00.0
    Product Name                          : GRID P40-24Q
    Product Brand                         : NVIDIA RTX Virtual Workstation
    Product Architecture                  : Pascal
    Display Mode                          : Enabled
    Display Active                        : Disabled
    Persistence Mode                      : Enabled
    Addressing Mode                       : N/A
    MIG Mode
        Current                           : Disabled
        Pending                           : Disabled
    Accounting Mode                       : Disabled
    Accounting Mode Buffer Size           : 4000
    Driver Model
        Current                           : N/A
        Pending                           : N/A
    Serial Number                         : N/A
    GPU UUID                              : GPU-a126170b-5a87-11ef-bdda-59b6da7f9cd4
    Minor Number                          : 0
    VBIOS Version                         : 00.00.00.00.00
    MultiGPU Board                        : No
    Board ID                              : 0x100
    Board Part Number                     : N/A
    GPU Part Number                       : 1B38-895-A1
    FRU Part Number                       : N/A
    Module ID                             : N/A
    Inforom Version
        Image Version                     : N/A
        OEM Object                        : N/A
        ECC Object                        : N/A
        Power Management Object           : N/A
    Inforom BBX Object Flush
        Latest Timestamp                  : N/A
        Latest Duration                   : N/A
    GPU Operation Mode
        Current                           : N/A
        Pending                           : N/A
    GSP Firmware Version                  : N/A
    GPU Virtualization Mode
        Virtualization Mode               : VGPU
        Host VGPU Mode                    : N/A
    vGPU Software Licensed Product
        Product Name                      : NVIDIA RTX Virtual Workstation
        License Status                    : Licensed (Expiry: 2024-11-12 21:54:2 GMT)
    GPU Reset Status
        Reset Required                    : N/A
        Drain and Reset Recommended       : N/A
    IBMNPU
        Relaxed Ordering Mode             : N/A
    PCI
        Bus                               : 0x01
        Device                            : 0x00
        Domain                            : 0x0000
        Device Id                         : 0x1B3810DE
        Bus Id                            : 00000000:01:00.0
        Sub System Id                     : 0x11EF10DE
        GPU Link Info
            PCIe Generation
                Max                       : N/A
                Current                   : N/A
                Device Current            : N/A
                Device Max                : N/A
                Host Max                  : N/A
            Link Width
                Max                       : N/A
                Current                   : N/A
        Bridge Chip
            Type                          : N/A
            Firmware                      : N/A
        Replays Since Reset               : N/A
        Replay Number Rollovers           : N/A
        Tx Throughput                     : N/A
        Rx Throughput                     : N/A
        Atomic Caps Inbound               : N/A
        Atomic Caps Outbound              : N/A
    Fan Speed                             : N/A
    Performance State                     : P0
    Clocks Event Reasons                  : N/A
    Sparse Operation Mode                 : N/A
    FB Memory Usage
        Total                             : 24576 MiB
        Reserved                          : 1680 MiB
        Used                              : 4176 MiB
        Free                              : 18719 MiB
    BAR1 Memory Usage
        Total                             : 256 MiB
        Used                              : 0 MiB
        Free                              : 256 MiB
    Conf Compute Protected Memory Usage
        Total                             : 0 MiB
        Used                              : 0 MiB
        Free                              : 0 MiB
    Compute Mode                          : Default
    Utilization
        Gpu                               : 0 %
        Memory                            : 0 %
        Encoder                           : 0 %
        Decoder                           : 0 %
        JPEG                              : N/A
        OFA                               : N/A
    Encoder Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    FBC Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    ECC Mode
        Current                           : N/A
        Pending                           : N/A
    ECC Errors
        Volatile
            Single Bit
                Device Memory             : N/A
                Register File             : N/A
                L1 Cache                  : N/A
                L2 Cache                  : N/A
                Texture Memory            : N/A
                Texture Shared            : N/A
                CBU                       : N/A
                Total                     : N/A
            Double Bit
                Device Memory             : N/A
                Register File             : N/A
                L1 Cache                  : N/A
                L2 Cache                  : N/A
                Texture Memory            : N/A
                Texture Shared            : N/A
                CBU                       : N/A
                Total                     : N/A
        Aggregate
            Single Bit
                Device Memory             : N/A
                Register File             : N/A
                L1 Cache                  : N/A
                L2 Cache                  : N/A
                Texture Memory            : N/A
                Texture Shared            : N/A
                CBU                       : N/A
                Total                     : N/A
            Double Bit
                Device Memory             : N/A
                Register File             : N/A
                L1 Cache                  : N/A
                L2 Cache                  : N/A
                Texture Memory            : N/A
                Texture Shared            : N/A
                CBU                       : N/A
                Total                     : N/A
    Retired Pages
        Single Bit ECC                    : N/A
        Double Bit ECC                    : N/A
        Pending Page Blacklist            : N/A
    Remapped Rows                         : N/A
    Temperature
        GPU Current Temp                  : N/A
        GPU T.Limit Temp                  : N/A
        GPU Shutdown Temp                 : N/A
        GPU Slowdown Temp                 : N/A
        GPU Max Operating Temp            : N/A
        GPU Target Temperature            : N/A
        Memory Current Temp               : N/A
        Memory Max Operating Temp         : N/A
    GPU Power Readings
        Power Draw                        : N/A
        Current Power Limit               : N/A
        Requested Power Limit             : N/A
        Default Power Limit               : N/A
        Min Power Limit                   : N/A
        Max Power Limit                   : N/A
    Module Power Readings
        Power Draw                        : N/A
        Current Power Limit               : N/A
        Requested Power Limit             : N/A
        Default Power Limit               : N/A
        Min Power Limit                   : N/A
        Max Power Limit                   : N/A
    Clocks
        Graphics                          : 1303 MHz
        SM                                : 1303 MHz
        Memory                            : 3615 MHz
        Video                             : 1164 MHz
    Applications Clocks
        Graphics                          : N/A
        Memory                            : N/A
    Default Applications Clocks
        Graphics                          : N/A
        Memory                            : N/A
    Deferred Clocks
        Memory                            : N/A
    Max Clocks
        Graphics                          : N/A
        SM                                : N/A
        Memory                            : N/A
        Video                             : N/A
    Max Customer Boost Clocks
        Graphics                          : N/A
    Clock Policy
        Auto Boost                        : N/A
        Auto Boost Default                : N/A
    Voltage
        Graphics                          : N/A
    Fabric
        State                             : N/A
        Status                            : N/A
    Processes
        GPU instance ID                   : N/A
        Compute instance ID               : N/A
        Process ID                        : 2201
            Type                          : C
            Name                          : python
            Used GPU Memory               : 146 MiB
        GPU instance ID                   : N/A
        Compute instance ID               : N/A
        Process ID                        : 3576
            Type                          : C
            Name                          : /tmp/ollama1278819119/runners/cuda_v11/ollama_llama_server
            Used GPU Memory               : 4030 MiB

@fahadshery commented on GitHub (Aug 14, 2024):

> If you (temporarily) install ollama as a service (curl -fsSL https://ollama.com/install.sh | sh) can it access the GPU?
>
> I see that you've already done that.

Yes, already tried, and it works beautifully. But I need it running in docker so that it's easier to deploy other services alongside it, like stable diffusion, open-webui, whisper, searxng, libretranslate, etc.


@rick-github commented on GitHub (Aug 14, 2024):

nvidia-smi outside of the container shows an ollama runner using the GPU. Is that running inside the container or is the ollama-as-a-service still running?


@fahadshery commented on GitHub (Aug 14, 2024):

> nvidia-smi outside of the container shows an ollama runner using the GPU. Is that running inside the container or is the ollama-as-a-service still running?

There is no ollama service running in this VM. It's a fresh VM and I deploy everything using ansible so that I don't mess things up. So I am assuming that it's got to be from inside the container. But as the logs show, the container fails to recognise that there is a GPU available to it.


@fahadshery commented on GitHub (Aug 14, 2024):

this is what I am trying to build:

https://technotim.live/posts/ai-stack-tutorial/


@rick-github commented on GitHub (Aug 14, 2024):

What do the following show:
pstree -ls 3576
ps wwp3576


@fahadshery commented on GitHub (Aug 15, 2024):

> What do the following show: pstree -ls 3576 ps wwp3576

OK, I don't know what happened (I didn't make any change other than downloading a different model, i.e. llama3.1:8b).

I ran it and the GPU utilisation went up to 85%.

Then I did the process check, and here are the results:

fahadshery@ai-stack:~$ pstree -ls 27163
systemd───containerd-shim───ollama───ollama_llama_se───15*[{ollama_llama_se}]
fahadshery@ai-stack:~$ ps wwp27163
    PID TTY      STAT   TIME COMMAND
  27163 ?        Sl     0:35 /tmp/ollama3783253186/runners/cuda_v11/ollama_llama_server --model /root/.ollama/models/blobs/sha256-8eeb52dfb3bb9aefdf9d1ef24b3bdbcfbe82238798c4b918278320b6fcef18fe --ctx-size 8192 --batch-size 512 --embedding --log-disable --n-gpu-layers 33 --numa distribute --parallel 4 --port 40425

Is that normal? I was expecting ~90% GPU utilisation.


@fahadshery commented on GitHub (Aug 15, 2024):

[image: https://github.com/user-attachments/assets/3bfdc1d7-6f7a-4754-951f-519fcf8e0e58]

@rick-github commented on GitHub (Aug 15, 2024):

GPU utilization will vary with the efficiency of the model and external factors like power usage, etc. I don't know how this applies to vGPUs, as they are a shared resource and presumably there will be some competition for cycles. You can get a view of possible limiting factors by looking at the performance state and throttle reasons from nvidia-smi -q -d POWER,TEMPERATURE,PERFORMANCE.


@fahadshery commented on GitHub (Aug 15, 2024):

> GPU utilization will vary with the efficiency of the model and external factors like power usage, etc. I don't know how this applies to vGPUs, as they are a shared resource and presumably there will be some competition for cycles. You can get a view of possible limiting factors by looking at the performance state and throttle reasons from nvidia-smi -q -d POWER,TEMPERATURE,PERFORMANCE.

How does it come up with --n-gpu-layers 33 in the ps wwp27163 output? How do you determine that? Or is it inherent to the underlying model to decide?


@rick-github commented on GitHub (Aug 15, 2024):

In the server log there will be lines like:

ollama  | time=2024-08-14T22:59:28.178Z level=INFO source=memory.go:309 msg="offload to cuda" layers.requested=-1 layers.model=81 layers.offload=14 layers.split="" memory.available="[11.6 GiB]" memory.required.full="55.9 GiB" memory.required.partial="11.4 GiB" memory.required.kv="640.0 MiB" memory.required.allocations="[11.4 GiB]" memory.weights.total="52.9 GiB" memory.weights.repeating="52.1 GiB" memory.weights.nonrepeating="822.0 MiB" memory.graph.full="324.0 MiB" memory.graph.partial="1.1 GiB"

This is ollama figuring out how much (V)RAM your system has, and calculating how many layers will fit in the available VRAM and how much RAM will be needed for the non-GPU layers. You can control the number of layers that are offloaded to the GPU with the num_gpu option, either in the CLI (/set parameter num_gpu xx) or in the API (curl localhost:11434/api/generate -d '{"model":"yy","options":{"num_gpu":xx}}').


@fahadshery commented on GitHub (Aug 16, 2024):

So GPU use within the docker container is intermittent. It struggles to reload the model into the GPU once it's been offloaded. I installed using the linux shell script and it's working as expected. In docker it sometimes doesn't even see the GPU, even though the nvidia-smi command works fine within the container.


@rick-github commented on GitHub (Aug 16, 2024):

What's in the server logs when it fails?


@TomorrowToday commented on GitHub (Aug 26, 2024):

Are you running with the nvidia container toolkit? It's not supported yet on Ubuntu 24.04 according to their docs.

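A quick way to check whether the NVIDIA Container Toolkit is actually installed and registered with Docker on the host (a sketch assuming Debian/Ubuntu packaging; the package and CLI names may differ on other distros):

```
# Toolkit CLI present?
nvidia-ctk --version

# Debian/Ubuntu package installed?
dpkg -l | grep -i nvidia-container-toolkit

# Is the nvidia runtime registered with the Docker daemon?
docker info --format '{{json .Runtimes}}'
```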

@fahadshery commented on GitHub (Aug 27, 2024):

> Are you running with the nvidia container toolkit? It's not supported yet on Ubuntu 24.04 according to their docs.

It's working fine in other containers like Stable-Diffusion-webui, whisper, etc.


@superwolfboy commented on GitHub (Aug 31, 2024):

Have you enabled "above 4G" in the BIOS already?


@fahadshery commented on GitHub (Aug 31, 2024):

> Have you enabled "above 4G" in the BIOS already?

I am running it on a Dell R720 server with an NVIDIA Tesla P40 24G GPU, so I'm not sure if there is such an option there. But as I said, all the other containers are working fine. Even the gpu-jupyter container is working fine!


@superwolfboy commented on GitHub (Sep 2, 2024):

My problem is the same as yours: vGPU is not working, with almost the same log.
But GPU passthrough does work, and then only one VM can use the GPU.


@dhiltgen commented on GitHub (Sep 4, 2024):

@fahadshery in your initial logs I see the following error

time=2024-08-14T22:50:30.640+01:00 level=INFO source=gpu.go:260 msg="error looking up nvidia GPU memory" error="cuda driver library failed to get device context 801"

That error code maps to:

    /**
     * This error indicates that the attempted operation is not supported
     * on the current system or device.
     */
    CUDA_ERROR_NOT_SUPPORTED                  = 801,

I would recommend working through our troubleshooting guide for NVIDIA GPUs - https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#nvidia-gpu-discovery

In particular, the uvm driver may be unloading which may explain intermittent behavior (works sometimes, fails other times). In our install script we mitigate this with the following code https://github.com/ollama/ollama/blob/main/scripts/install.sh#L358-L367 which may be applicable for your host system if it turns out this is the root cause.

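A rough sketch of that kind of mitigation, assuming the root cause really is the nvidia_uvm module being unloaded (this mirrors the idea in the install script rather than copying it verbatim):

```
# Check whether the UVM module is currently loaded
lsmod | grep nvidia_uvm

# Load it manually for the current boot
sudo modprobe nvidia_uvm

# Optionally make it load on every boot
echo nvidia_uvm | sudo tee /etc/modules-load.d/nvidia-uvm.conf
```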

@JavierCCC commented on GitHub (Sep 12, 2024):

Check /etc/docker/daemon.json

You want to have a runtime definition related to nvidia inside it, something like this

(...)

    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
(...)

Then, you need to use

runtime: nvidia

Inside your docker compose yaml.

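If the runtime entry is missing, the NVIDIA Container Toolkit can generate it; a hedged sketch, assuming nvidia-ctk is already installed on the host:

```
# Write the nvidia runtime entry into /etc/docker/daemon.json and restart Docker
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Verify the runtime is now registered
docker info | grep -iA2 runtimes
```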

@mrk3786 commented on GitHub (Sep 16, 2024):

> > nvidia-smi outside of the container shows an ollama runner using the GPU. Is that running inside the container or is the ollama-as-a-service still running?
>
> There is no ollama service running in this VM. It's a fresh VM and I deploy everything using ansible so that I don't mess things up. So I am assuming that it's got to be from inside the container. But as the logs show, the container fails to recognise that there is a GPU available to it.

I ran into the same problem. It turned out to be a CPU type configuration in my proxmox VM. I configured x86 and when I changed that to 'host', the issue was solved.

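For anyone who prefers the Proxmox CLI over the GUI, a hedged sketch of the same change (100 is a placeholder VM ID):

```
# Set the VM's CPU type to "host" (replace 100 with your VM ID), then restart the VM
qm set 100 --cpu host
qm shutdown 100 && qm start 100
```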

@vaclcer commented on GitHub (Sep 17, 2024):

Hello, reporting the same problem with error "cuda driver library failed to get device context 801":

time=2024-09-17T05:56:32.395Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu_avx2 cuda_v11 cuda_v12 rocm_v60102 cpu cpu_avx]"
time=2024-09-17T05:56:32.395Z level=DEBUG source=payload.go:45 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
time=2024-09-17T05:56:32.395Z level=DEBUG source=sched.go:105 msg="starting llm scheduler"
time=2024-09-17T05:56:32.395Z level=INFO source=gpu.go:200 msg="looking for compatible GPUs"
time=2024-09-17T05:56:32.395Z level=DEBUG source=gpu.go:86 msg="searching for GPU discovery libraries for NVIDIA"
time=2024-09-17T05:56:32.395Z level=DEBUG source=gpu.go:468 msg="Searching for GPU library" name=libcuda.so*
time=2024-09-17T05:56:32.395Z level=DEBUG source=gpu.go:491 msg="gpu library search" globs="[/usr/lib/ollama/libcuda.so* /usr/local/nvidia/lib/libcuda.so* /usr/local/nvidia/lib64/libcuda.so* /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2024-09-17T05:56:32.396Z level=DEBUG source=gpu.go:525 msg="discovered GPU libraries" paths=[/usr/lib/x86_64-linux-gnu/libcuda.so.525.105.17]
CUDA driver version: 12.0
time=2024-09-17T05:56:32.404Z level=DEBUG source=gpu.go:119 msg="detected GPUs" count=1 library=/usr/lib/x86_64-linux-gnu/libcuda.so.525.105.17
time=2024-09-17T05:56:32.404Z level=INFO source=gpu.go:252 msg="error looking up nvidia GPU memory" error="cuda driver library failed to get device context 801"
time=2024-09-17T05:56:32.404Z level=DEBUG source=amd_linux.go:371 msg="amdgpu driver not detected /sys/module/amdgpu"
time=2024-09-17T05:56:32.404Z level=INFO source=gpu.go:347 msg="no compatible GPUs were discovered"
releasing cuda driver library
time=2024-09-17T05:56:32.404Z level=INFO source=types.go:107 msg="inference compute" id=0 library=cpu variant=avx2 compute="" driver=0.0 name="" total="94.3 GiB" available="92.7 GiB"

nvidia-smi in the container works OK:

Tue Sep 17 06:03:42 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17 Driver Version: 525.105.17 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A40-48Q On | 00000000:00:05.0 Off | 0 |
| N/A N/A P8 N/A / N/A | 0MiB / 49152MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+

I did go through the proposed troubleshooting, but no luck, and no other errors found. What do I do now? Thanks for any help.


@phonkd commented on GitHub (Sep 19, 2024):

After changing the CPU type of my qemu VM to host, it worked.


@fahadshery commented on GitHub (Sep 20, 2024):

> nvidia-smi outside of the container shows an ollama runner using the GPU. Is that running inside the container or is the ollama-as-a-service still running?

Ollama works fine, but it's intermittent. I have no issues with other containers using the GPU. Therefore, I am running ollama as a service and have no issues.

> I ran into the same problem. It turned out to be a CPU type configuration in my proxmox VM. I configured x86 and when I changed that to 'host', the issue was solved.

I will check the CPU type and change it to host, but this might not help since we're using vGPU.


@fahadshery commented on GitHub (Sep 20, 2024):

> After changing the CPU type of my qemu VM to host, it worked.

Are there no model reloading issues?


@dhiltgen commented on GitHub (Sep 24, 2024):

@vaclcer your driver version 525.105.17 is well over a year old (https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-525-105-17/index.html). Perhaps try upgrading the driver and see if maybe this is a bug nvidia has already fixed?

@fahadshery from your logs, you're running a newer driver, but given the intermittent nature of this, it might also be worth trying to upgrade to the latest driver to see if that clears it up.


@fahadshery commented on GitHub (Sep 26, 2024):

> @fahadshery from your logs, you're running a newer driver, but given the intermittent nature of this, it might also be worth trying to upgrade to the latest driver to see if that clears it up.

OK, I am going to upgrade the drivers to the latest 550.90.05, try again, and report back.


@dstaicova commented on GitHub (Oct 1, 2024):

Just to say, I'm getting the same error without docker:

OLLAMA_DEBUG=1 ollama serve
time=2024-10-01T18:59:04.108+03:00 level=INFO source=gpu.go:199 msg="looking for compatible GPUs"
time=2024-10-01T18:59:04.109+03:00 level=DEBUG source=gpu.go:86 msg="searching for GPU discovery libraries for NVIDIA"
time=2024-10-01T18:59:04.109+03:00 level=DEBUG source=gpu.go:468 msg="Searching for GPU library" name=libcuda.so*
time=2024-10-01T18:59:04.109+03:00 level=DEBUG source=gpu.go:491 msg="gpu library search" globs="[/usr/local/lib/ollama/libcuda.so* /home/denijane/scripts/libcuda.so* /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2024-10-01T18:59:04.166+03:00 level=DEBUG source=gpu.go:525 msg="discovered GPU libraries" paths="[/usr/lib/libcuda.so.550.107.02 /usr/lib/libcuda.so.550.78 /usr/lib.bak/libcuda.so.550.90.07 /usr/lib32/libcuda.so.550.107.02 /usr/lib64/libcuda.so.550.107.02 /usr/lib64/libcuda.so.550.78]"
cuInit err: 999
time=2024-10-01T18:59:04.173+03:00 level=WARN source=gpu.go:562 msg="unknown error initializing cuda driver library" library=/usr/lib/libcuda.so.550.107.02 error="cuda driver library init failure: 999"
time=2024-10-01T18:59:04.173+03:00 level=WARN source=gpu.go:563 msg="see https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md for more information"
cuInit err: 803
time=2024-10-01T18:59:04.181+03:00 level=WARN source=gpu.go:558 msg="version mismatch between driver and cuda driver library - reboot or upgrade may be required" library=/usr/lib/libcuda.so.550.78 error="cuda driver library init failure: 803"
cuInit err: 803
time=2024-10-01T18:59:04.193+03:00 level=WARN source=gpu.go:558 msg="version mismatch between driver and cuda driver library - reboot or upgrade may be required" library=/usr/lib.bak/libcuda.so.550.90.07 error="cuda driver library init failure: 803"
library /usr/lib32/libcuda.so.550.107.02 load err: /usr/lib32/libcuda.so.550.107.02: wrong ELF class: ELFCLASS32
time=2024-10-01T18:59:04.193+03:00 level=DEBUG source=gpu.go:566 msg="skipping 32bit library" library=/usr/lib32/libcuda.so.550.107.02
cuInit err: 999
time=2024-10-01T18:59:04.202+03:00 level=WARN source=gpu.go:562 msg="unknown error initializing cuda driver library" library=/usr/lib64/libcuda.so.550.107.02 error="cuda driver library init failure: 999"
time=2024-10-01T18:59:04.202+03:00 level=WARN source=gpu.go:563 msg="see https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md for more information"
cuInit err: 803
time=2024-10-01T18:59:04.205+03:00 level=WARN source=gpu.go:558 msg="version mismatch between driver and cuda driver library - reboot or upgrade may be required" library=/usr/lib64/libcuda.so.550.78 error="cuda driver library init failure: 803"
I'm on Manjaro, with nvidia 550.107.02 and cuda 12.4. I just installed the newest ollama and noticed it starts the models really slowly, slower than before. The nvidia driver is working (I can see games in nvidia-smi), but I'm not sure when I last saw ollama use the GPU, as things have been pretty busy for the past 2-3 weeks.


@dhiltgen commented on GitHub (Oct 17, 2024):

@denijane you're getting 2 different errors from 2 different libraries we try. The 999 error is a generic "unknown error" code, which isn't super helpful; however, the other code, 803, is enlightening.

    /**
     * This error indicates that there is a mismatch between the versions of
     * the display driver and the CUDA driver. Refer to the compatibility documentation
     * for supported versions.
     */
    CUDA_ERROR_SYSTEM_DRIVER_MISMATCH         = 803,

If you have already rebooted, somehow your system has gotten into an inconsistent state where the driver you're booting doesn't match the libraries installed.
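
One way to check for such a mismatch (a sketch assuming a typical Linux driver install; library paths vary by distro) is to compare the version of the loaded kernel module against the CUDA driver libraries on disk:

    # Version of the NVIDIA kernel module that is currently loaded
    cat /proc/driver/nvidia/version
    # Driver version as reported through the driver interface
    nvidia-smi --query-gpu=driver_version --format=csv,noheader
    # User-space CUDA driver libraries on disk; a stale or leftover
    # libcuda.so.* that doesn't match the loaded module can produce error 803
    ls -l /usr/lib*/libcuda.so.* /usr/lib/*-linux-gnu/libcuda.so.* 2>/dev/null

If the versions disagree, removing the stale library copies (or reinstalling the driver) and rebooting should bring them back in sync.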

@MrHongping commented on GitHub (Oct 19, 2024):

I encountered the same problem. My GPU is a Tesla P4, presented to an Ubuntu 22 virtual machine as a PVE (Proxmox VE) virtualized GPU. nvidia-smi shows the virtualized GPU is working properly, but I am unable to get GPU acceleration whether I start ollama in Docker or directly in the virtual machine.

root@gpu-server:~# nvidia-smi
Sat Oct 19 12:17:19 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.07             Driver Version: 535.161.07   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  GRID P4-8A                     On  | 00000000:00:10.0 Off |                  N/A |
| N/A   N/A    P8              N/A /  N/A |      0MiB /  8192MiB |      0%   Prohibited |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+


root@gpu-server:~# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0

root@gpu-server:~# ollama serve
2024/10/19 11:59:03 routes.go:1158: INFO server config env="map[CUDA_VISIBLE_DEVICES:0 GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:true OLLAMA_FLASH_ATTENTION:true OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2024-10-19T11:59:03.005Z level=INFO source=images.go:754 msg="total blobs: 5"
time=2024-10-19T11:59:03.005Z level=INFO source=images.go:761 msg="total unused blobs removed: 0"
time=2024-10-19T11:59:03.005Z level=INFO source=routes.go:1205 msg="Listening on 127.0.0.1:11434 (version 0.3.13)"
time=2024-10-19T11:59:03.006Z level=INFO source=common.go:135 msg="extracting embedded files" dir=/tmp/ollama2843459739/runners
time=2024-10-19T11:59:03.006Z level=DEBUG source=common.go:168 msg=extracting runner=cpu payload=linux/amd64/cpu/libggml.so.gz
time=2024-10-19T11:59:03.006Z level=DEBUG source=common.go:168 msg=extracting runner=cpu payload=linux/amd64/cpu/libllama.so.gz
time=2024-10-19T11:59:03.006Z level=DEBUG source=common.go:168 msg=extracting runner=cpu payload=linux/amd64/cpu/ollama_llama_server.gz
time=2024-10-19T11:59:03.006Z level=DEBUG source=common.go:168 msg=extracting runner=cpu_avx payload=linux/amd64/cpu_avx/libggml.so.gz
time=2024-10-19T11:59:03.006Z level=DEBUG source=common.go:168 msg=extracting runner=cpu_avx payload=linux/amd64/cpu_avx/libllama.so.gz
time=2024-10-19T11:59:03.006Z level=DEBUG source=common.go:168 msg=extracting runner=cpu_avx payload=linux/amd64/cpu_avx/ollama_llama_server.gz
time=2024-10-19T11:59:03.006Z level=DEBUG source=common.go:168 msg=extracting runner=cpu_avx2 payload=linux/amd64/cpu_avx2/libggml.so.gz
time=2024-10-19T11:59:03.006Z level=DEBUG source=common.go:168 msg=extracting runner=cpu_avx2 payload=linux/amd64/cpu_avx2/libllama.so.gz
time=2024-10-19T11:59:03.006Z level=DEBUG source=common.go:168 msg=extracting runner=cpu_avx2 payload=linux/amd64/cpu_avx2/ollama_llama_server.gz
time=2024-10-19T11:59:03.006Z level=DEBUG source=common.go:168 msg=extracting runner=cuda_v11 payload=linux/amd64/cuda_v11/libggml.so.gz
time=2024-10-19T11:59:03.006Z level=DEBUG source=common.go:168 msg=extracting runner=cuda_v11 payload=linux/amd64/cuda_v11/libllama.so.gz
time=2024-10-19T11:59:03.006Z level=DEBUG source=common.go:168 msg=extracting runner=cuda_v11 payload=linux/amd64/cuda_v11/ollama_llama_server.gz
time=2024-10-19T11:59:03.006Z level=DEBUG source=common.go:168 msg=extracting runner=cuda_v12 payload=linux/amd64/cuda_v12/libggml.so.gz
time=2024-10-19T11:59:03.006Z level=DEBUG source=common.go:168 msg=extracting runner=cuda_v12 payload=linux/amd64/cuda_v12/libllama.so.gz
time=2024-10-19T11:59:03.006Z level=DEBUG source=common.go:168 msg=extracting runner=cuda_v12 payload=linux/amd64/cuda_v12/ollama_llama_server.gz
time=2024-10-19T11:59:03.006Z level=DEBUG source=common.go:168 msg=extracting runner=rocm_v60102 payload=linux/amd64/rocm_v60102/libggml.so.gz
time=2024-10-19T11:59:03.006Z level=DEBUG source=common.go:168 msg=extracting runner=rocm_v60102 payload=linux/amd64/rocm_v60102/libllama.so.gz
time=2024-10-19T11:59:03.006Z level=DEBUG source=common.go:168 msg=extracting runner=rocm_v60102 payload=linux/amd64/rocm_v60102/ollama_llama_server.gz
time=2024-10-19T11:59:14.631Z level=DEBUG source=common.go:294 msg="availableServers : found" file=/tmp/ollama2843459739/runners/cpu/ollama_llama_server
time=2024-10-19T11:59:14.631Z level=DEBUG source=common.go:294 msg="availableServers : found" file=/tmp/ollama2843459739/runners/cpu_avx/ollama_llama_server
time=2024-10-19T11:59:14.631Z level=DEBUG source=common.go:294 msg="availableServers : found" file=/tmp/ollama2843459739/runners/cpu_avx2/ollama_llama_server
time=2024-10-19T11:59:14.631Z level=DEBUG source=common.go:294 msg="availableServers : found" file=/tmp/ollama2843459739/runners/cuda_v11/ollama_llama_server
time=2024-10-19T11:59:14.631Z level=DEBUG source=common.go:294 msg="availableServers : found" file=/tmp/ollama2843459739/runners/cuda_v12/ollama_llama_server
time=2024-10-19T11:59:14.631Z level=DEBUG source=common.go:294 msg="availableServers : found" file=/tmp/ollama2843459739/runners/rocm_v60102/ollama_llama_server
time=2024-10-19T11:59:14.631Z level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners="[cuda_v11 cuda_v12 rocm_v60102 cpu cpu_avx cpu_avx2]"
time=2024-10-19T11:59:14.631Z level=DEBUG source=common.go:50 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
time=2024-10-19T11:59:14.631Z level=DEBUG source=sched.go:105 msg="starting llm scheduler"
time=2024-10-19T11:59:14.631Z level=INFO source=gpu.go:199 msg="looking for compatible GPUs"
time=2024-10-19T11:59:14.631Z level=DEBUG source=gpu.go:86 msg="searching for GPU discovery libraries for NVIDIA"
time=2024-10-19T11:59:14.631Z level=DEBUG source=gpu.go:468 msg="Searching for GPU library" name=libcuda.so*
time=2024-10-19T11:59:14.631Z level=DEBUG source=gpu.go:491 msg="gpu library search" globs="[/usr/local/lib/ollama/libcuda.so* /root/libcuda.so* /usr/local/cuda-12.2/lib64/libcuda.so* /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2024-10-19T11:59:14.634Z level=DEBUG source=gpu.go:525 msg="discovered GPU libraries" paths="[/usr/lib/x86_64-linux-gnu/libcuda.so.535.161.07 /usr/lib32/libcuda.so.535.161.07]"
CUDA driver version: 12.2
time=2024-10-19T11:59:14.640Z level=DEBUG source=gpu.go:118 msg="detected GPUs" count=1 library=/usr/lib/x86_64-linux-gnu/libcuda.so.535.161.07
time=2024-10-19T11:59:14.640Z level=INFO source=gpu.go:252 msg="error looking up nvidia GPU memory" error="cuda driver library failed to get device context 801"
time=2024-10-19T11:59:14.640Z level=DEBUG source=amd_linux.go:376 msg="amdgpu driver not detected /sys/module/amdgpu"
time=2024-10-19T11:59:14.640Z level=INFO source=gpu.go:347 msg="no compatible GPUs were discovered"
releasing cuda driver library
time=2024-10-19T11:59:14.640Z level=INFO source=types.go:107 msg="inference compute" id=0 library=cpu variant=avx2 compute="" driver=0.0 name="" total="7.8 GiB" available="7.2 GiB"
time=2024-10-19T12:02:44.961Z level=DEBUG source=sched.go:318 msg="shutting down scheduler completed loop"
time=2024-10-19T12:02:44.961Z level=DEBUG source=common.go:73 msg="cleaning up" dir=/tmp/ollama2843459739
time=2024-10-19T12:02:44.961Z level=DEBUG source=sched.go:119 msg="shutting down scheduler pending loop"

@dhiltgen commented on GitHub (Nov 6, 2024):

I've posted a new PR documenting a workaround that some users have had success with for a slightly different failure mode; it might be helpful in these cases as well. If you are experiencing the sporadic 801, please give it a try and let us know whether it resolves the problem.

#7519
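
One workaround that users have reported for sporadic NVIDIA GPU discovery failures on Linux (for example after suspend/resume) is reloading the NVIDIA UVM kernel module; whether this matches what #7519 documents should be confirmed against the PR itself. A minimal sketch:

    # Reload the NVIDIA UVM kernel module; only do this when
    # nothing is actively using the GPU
    sudo rmmod nvidia_uvm
    sudo modprobe nvidia_uvm
    # Then restart ollama (for the Docker case, restart the container, e.g.)
    docker restart ollama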

@DoctorDream commented on GitHub (Feb 24, 2025):

> I've posted a new PR documenting a workaround that some users have had success with for a slightly different failure mode; it might be helpful in these cases as well. If you are experiencing the sporadic 801, please give it a try and let us know whether it resolves the problem.
>
> #7519

After some research, it appears this problem mostly occurs in virtual machines. Through numerous attempts, I've found that the cause of this problem (cuda driver library failed to get device context 801) may not be the graphics card itself, but rather the CPU instruction set exposed to the VM.
The VM's default virtual CPU type doesn't expose the AVX2 instruction set (which can be checked via lscpu). After switching the VM's CPU type to host, AVX2 appears and ollama runs normally.
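
A minimal sketch of that check and change, assuming the hypervisor is Proxmox VE (the qm CLI) and using a hypothetical VM ID of 100:

    # Inside the VM: see whether AVX2 is exposed to the guest
    lscpu | grep -o avx2 | head -n 1
    # On the Proxmox host: switch the VM's CPU type to "host" so the guest
    # sees the physical CPU's feature flags (including AVX2)
    qm set 100 --cpu host
    # Stop and start the VM, then re-run lscpu inside the guest to confirm

The same change can also be made in the Proxmox web UI under the VM's Hardware > Processors settings.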

@tharun571 commented on GitHub (Jan 20, 2026):

The "cuda driver library failed to get device context 801" error in Docker containers is frustrating - especially when GPU works fine outside Docker!

Common root causes for Docker GPU detection issues:

  1. nvidia-container-runtime not configured: Check if docker info | grep -i runtime shows nvidia
  2. Missing --gpus flag: Ensure you're using --gpus all or the runtime config in docker-compose
  3. Driver/container toolkit version mismatch: Host driver must support container's CUDA version
  4. Device permissions: Container might not have access to /dev/nvidia* devices

Quick validation steps:

# Inside container, check if GPU is visible:
nvidia-smi
# Check CUDA device access:
ls -la /dev/nvidia*
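
A hedged end-to-end version of those checks from the host side (the CUDA image tag below is only an example; pick one your host driver supports):

    # Confirm Docker is configured with the nvidia runtime
    docker info | grep -i runtime
    # Throwaway container test: if this prints the nvidia-smi table,
    # GPU forwarding into containers works in general
    docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi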

For intermittent failures (works sometimes, fails others), it's often a race condition with device initialization.

I built an OSS diagnostic tool for these exact GPU+Docker issues: env-doctor (https://github.com/mitulgarg/env-doctor)

It checks:

  • Docker GPU runtime configuration
  • Host driver vs container CUDA compatibility
  • GPU device forwarding and permissions
  • Detects missing nvidia-container-toolkit setup

It works both inside and outside containers to pinpoint where the GPU detection breaks.

Full disclosure: I'm the author. Sharing because Docker GPU issues are notoriously hard to debug, and this automates the validation chain. Hope it helps troubleshoot!

Reference: github-starred/ollama#3995