[GH-ISSUE #4814] Only Detecting One MIG Instance #3040

Open
opened 2026-04-12 13:27:52 -05:00 by GiteaMirror · 6 comments

Originally created by @Magitoneu on GitHub (Jun 4, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4814

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

I am running Ollama as a service on a server with 2x A100 GPUs, both split into 4 MIG instances. I want Ollama to keep two models loaded, each in a different MIG instance. However, Ollama detects only one of the MIG instances I have assigned, so instead of keeping both models loaded it swaps between them.

Environment vars:

Environment="CUDA_VISIBLE_DEVICES=MIG-d001b894-6dc3-5220-a1bf-fb944eb1b13b,MIG-0665cd24-2540-5c33-a995-636c4fcab1cf"
Environment="OLLAMA_MAX_LOADED_MODELS=2"

Nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.108.03   Driver Version: 510.108.03   CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100 80G...  Off  | 00000000:05:00.0 Off |                   On |
| N/A   44C    P0    66W / 300W |     48MiB / 81920MiB |     N/A      Default |
|                               |                      |              Enabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100 80G...  Off  | 00000000:86:00.0 Off |                   On |
| N/A   34C    P0    67W / 300W |     45MiB / 81920MiB |     N/A      Default |
|                               |                      |              Enabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| MIG devices:                                                                |
+------------------+----------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |         Memory-Usage |        Vol|         Shared        |
|      ID  ID  Dev |           BAR1-Usage | SM     Unc| CE  ENC  DEC  OFA  JPG|
|                  |                      |        ECC|                       |
|==================+======================+===========+=======================|
|  0    2   0   0  |     19MiB / 40192MiB | 42      0 |  3   0    2    0    0 |
|                  |      0MiB / 65535MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
|  0    7   0   1  |      8MiB /  9728MiB | 14      0 |  1   0    0    0    0 |
|                  |      0MiB / 16383MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
|  0    8   0   2  |      6MiB /  9728MiB | 14      0 |  1   0    0    0    0 |
|                  |      0MiB / 16383MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
|  0    9   0   3  |      6MiB /  9728MiB | 14      0 |  1   0    0    0    0 |
|                  |      0MiB / 16383MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
|  0   10   0   4  |      6MiB /  9728MiB | 14      0 |  1   0    0    0    0 |
|                  |      0MiB / 16383MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
|  1    2   0   0  |     19MiB / 40192MiB | 42      0 |  3   0    2    0    0 |
|                  |      0MiB / 65535MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
|  1    7   0   1  |      6MiB /  9728MiB | 14      0 |  1   0    0    0    0 |
|                  |      0MiB / 16383MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
|  1    8   0   2  |      6MiB /  9728MiB | 14      0 |  1   0    0    0    0 |
|                  |      0MiB / 16383MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
|  1    9   0   3  |      6MiB /  9728MiB | 14      0 |  1   0    0    0    0 |
|                  |      0MiB / 16383MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
|  1   10   0   4  |      6MiB /  9728MiB | 14      0 |  1   0    0    0    0 |
|                  |      0MiB / 16383MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

GPU 0: NVIDIA A100 80GB PCIe (UUID: GPU-fce143c8-f3e2-5db9-a955-1541d6ae4ed6)
  MIG 3g.40gb     Device  0: (UUID: MIG-cb4aa05b-5bb3-5f35-8028-590348715f02)
  MIG 1g.10gb     Device  1: (UUID: MIG-d001b894-6dc3-5220-a1bf-fb944eb1b13b)
  MIG 1g.10gb     Device  2: (UUID: MIG-0665cd24-2540-5c33-a995-636c4fcab1cf)
  MIG 1g.10gb     Device  3: (UUID: MIG-3a059506-610c-5274-984f-065c39393bdc)
  MIG 1g.10gb     Device  4: (UUID: MIG-91c89459-4898-5209-90b7-b0ede2e5ec35)
GPU 1: NVIDIA A100 80GB PCIe (UUID: GPU-14122cde-73d9-d15c-5de8-4ce2da4ff2a2)
  MIG 3g.40gb     Device  0: (UUID: MIG-4770598e-9eea-56f1-a483-dd22f5725c58)
  MIG 1g.10gb     Device  1: (UUID: MIG-94bf4e63-eeda-55e0-a038-ca15deeb6a98)
  MIG 1g.10gb     Device  2: (UUID: MIG-e15bc69a-fdaa-5b4a-bbdd-e2c836785d91)
  MIG 1g.10gb     Device  3: (UUID: MIG-dc0d7d2f-15be-5eb7-bfbb-800b1c8f0368)
  MIG 1g.10gb     Device  4: (UUID: MIG-6c1c7cef-7926-5217-a5bd-b2298a3771a9)

Ollama Logs

Jun 04 15:13:42 rack-ai-0 systemd[1]: Started Ollama Service.
Jun 04 15:13:42 rack-ai-0 ollama[3755267]: 2024/06/04 15:13:42 routes.go:1007: INFO server config env="map[OLLAMA_DEBUG:true OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST: OLLAMA_KEEP_ALIVE: OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:2 OLLAMA_MAX_QUEUE:512>
Jun 04 15:13:42 rack-ai-0 ollama[3755267]: time=2024-06-04T15:13:42.076+02:00 level=INFO source=images.go:729 msg="total blobs: 44"
Jun 04 15:13:42 rack-ai-0 ollama[3755267]: time=2024-06-04T15:13:42.076+02:00 level=INFO source=images.go:736 msg="total unused blobs removed: 0"
Jun 04 15:13:42 rack-ai-0 ollama[3755267]: time=2024-06-04T15:13:42.076+02:00 level=INFO source=routes.go:1053 msg="Listening on [::]:11434 (version 0.1.41)"
Jun 04 15:13:42 rack-ai-0 ollama[3755267]: time=2024-06-04T15:13:42.077+02:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama1162914854/runners
Jun 04 15:13:42 rack-ai-0 ollama[3755267]: time=2024-06-04T15:13:42.077+02:00 level=DEBUG source=payload.go:180 msg=extracting variant=cpu file=build/linux/x86_64/cpu/bin/ollama_llama_server.gz
Jun 04 15:13:42 rack-ai-0 ollama[3755267]: time=2024-06-04T15:13:42.077+02:00 level=DEBUG source=payload.go:180 msg=extracting variant=cpu_avx file=build/linux/x86_64/cpu_avx/bin/ollama_llama_server.gz
Jun 04 15:13:42 rack-ai-0 ollama[3755267]: time=2024-06-04T15:13:42.077+02:00 level=DEBUG source=payload.go:180 msg=extracting variant=cpu_avx2 file=build/linux/x86_64/cpu_avx2/bin/ollama_llama_server.gz
Jun 04 15:13:42 rack-ai-0 ollama[3755267]: time=2024-06-04T15:13:42.077+02:00 level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublas.so.11.gz
Jun 04 15:13:42 rack-ai-0 ollama[3755267]: time=2024-06-04T15:13:42.077+02:00 level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublasLt.so.11.gz
Jun 04 15:13:42 rack-ai-0 ollama[3755267]: time=2024-06-04T15:13:42.077+02:00 level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcudart.so.11.0.gz
Jun 04 15:13:42 rack-ai-0 ollama[3755267]: time=2024-06-04T15:13:42.077+02:00 level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/ollama_llama_server.gz
Jun 04 15:13:42 rack-ai-0 ollama[3755267]: time=2024-06-04T15:13:42.077+02:00 level=DEBUG source=payload.go:180 msg=extracting variant=rocm_v60002 file=build/linux/x86_64/rocm_v60002/bin/deps.txt.gz
Jun 04 15:13:42 rack-ai-0 ollama[3755267]: time=2024-06-04T15:13:42.077+02:00 level=DEBUG source=payload.go:180 msg=extracting variant=rocm_v60002 file=build/linux/x86_64/rocm_v60002/bin/ollama_llama_server.gz
Jun 04 15:13:44 rack-ai-0 ollama[3755267]: time=2024-06-04T15:13:44.658+02:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1162914854/runners/cpu
Jun 04 15:13:44 rack-ai-0 ollama[3755267]: time=2024-06-04T15:13:44.658+02:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1162914854/runners/cpu_avx
Jun 04 15:13:44 rack-ai-0 ollama[3755267]: time=2024-06-04T15:13:44.658+02:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1162914854/runners/cpu_avx2
Jun 04 15:13:44 rack-ai-0 ollama[3755267]: time=2024-06-04T15:13:44.658+02:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1162914854/runners/cuda_v11
Jun 04 15:13:44 rack-ai-0 ollama[3755267]: time=2024-06-04T15:13:44.658+02:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1162914854/runners/rocm_v60002
Jun 04 15:13:44 rack-ai-0 ollama[3755267]: time=2024-06-04T15:13:44.658+02:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11 rocm_v60002]"
Jun 04 15:13:44 rack-ai-0 ollama[3755267]: time=2024-06-04T15:13:44.658+02:00 level=DEBUG source=payload.go:45 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
Jun 04 15:13:44 rack-ai-0 ollama[3755267]: time=2024-06-04T15:13:44.658+02:00 level=DEBUG source=sched.go:90 msg="starting llm scheduler"
Jun 04 15:13:44 rack-ai-0 ollama[3755267]: time=2024-06-04T15:13:44.658+02:00 level=DEBUG source=gpu.go:132 msg="Detecting GPUs"
Jun 04 15:13:44 rack-ai-0 ollama[3755267]: time=2024-06-04T15:13:44.658+02:00 level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so*
Jun 04 15:13:44 rack-ai-0 ollama[3755267]: time=2024-06-04T15:13:44.658+02:00 level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/li>
Jun 04 15:13:44 rack-ai-0 ollama[3755267]: time=2024-06-04T15:13:44.660+02:00 level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/usr/lib/x86_64-linux-gnu/libcuda.so.510.108.03]
Jun 04 15:13:44 rack-ai-0 ollama[3755267]: CUDA driver version: 11.6
Jun 04 15:13:44 rack-ai-0 ollama[3755267]: time=2024-06-04T15:13:44.749+02:00 level=DEBUG source=gpu.go:137 msg="detected GPUs" count=1 library=/usr/lib/x86_64-linux-gnu/libcuda.so.510.108.03
Jun 04 15:13:44 rack-ai-0 ollama[3755267]: time=2024-06-04T15:13:44.749+02:00 level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
Jun 04 15:13:44 rack-ai-0 ollama[3755267]: [GPU-fce143c8-f3e2-5db9-a955-1541d6ae4ed6] CUDA totalMem 9728 mb
Jun 04 15:13:44 rack-ai-0 ollama[3755267]: [GPU-fce143c8-f3e2-5db9-a955-1541d6ae4ed6] CUDA freeMem 9645 mb
Jun 04 15:13:44 rack-ai-0 ollama[3755267]: [GPU-fce143c8-f3e2-5db9-a955-1541d6ae4ed6] Compute Capability 8.0
Jun 04 15:13:44 rack-ai-0 ollama[3755267]: time=2024-06-04T15:13:44.888+02:00 level=DEBUG source=amd_linux.go:322 msg="amdgpu driver not detected /sys/module/amdgpu"
Jun 04 15:13:44 rack-ai-0 ollama[3755267]: releasing nvcuda library
Jun 04 15:13:44 rack-ai-0 ollama[3755267]: time=2024-06-04T15:13:44.888+02:00 level=INFO source=types.go:71 msg="inference compute" id=GPU-fce143c8-f3e2-5db9-a955-1541d6ae4ed6 library=cuda compute=8.0 driver=11.6 name="NVIDIA A100 80GB PCIe MIG 1g.10gb"

OS

Linux

GPU

Nvidia

CPU

AMD

Ollama version

0.1.41

GiteaMirror added the nvidia, bug labels 2026-04-12 13:27:52 -05:00

@dhiltgen commented on GitHub (Jun 18, 2024):

Based on the log output it seems like the driver API may not be MIG aware. Maybe we were premature in closing #1500
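
For anyone digging further: NVML, unlike plain CUDA device enumeration, is MIG-aware and can list each GPU's MIG sub-devices. A rough sketch using the nvidia-ml-py (pynvml) bindings, purely illustrative and not how Ollama's gpu.go detection is actually implemented:

```python
# Illustrative only: list MIG sub-devices through NVML (pynvml), which is
# MIG-aware. Requires the nvidia-ml-py package; this is not Ollama code.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        gpu = pynvml.nvmlDeviceGetHandleByIndex(i)
        try:
            mig_mode, _pending = pynvml.nvmlDeviceGetMigMode(gpu)
        except pynvml.NVMLError:
            continue  # GPU does not support MIG
        if mig_mode != pynvml.NVML_DEVICE_MIG_ENABLE:
            continue
        for j in range(pynvml.nvmlDeviceGetMaxMigDeviceCount(gpu)):
            try:
                mig = pynvml.nvmlDeviceGetMigDeviceHandleByIndex(gpu, j)
            except pynvml.NVMLError:
                continue  # this MIG slot is not populated
            mem = pynvml.nvmlDeviceGetMemoryInfo(mig)
            print(f"GPU {i} MIG {j}: {pynvml.nvmlDeviceGetUUID(mig)} "
                  f"({mem.total // (1024 * 1024)} MiB total)")
finally:
    pynvml.nvmlShutdown()
```

Even if detection enumerated every MIG instance this way, my understanding is that each CUDA process would still be limited to a single instance, so one runner could not span two slices; the scheduler would, however, at least see more than one candidate device.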


@mhoehl05 commented on GitHub (Aug 1, 2024):

As already mentioned in #1500

I've been testing the same setup with 1x H100 and 20 GB slices for a proof of concept and am running into the same issue. Ollama only utilizes 1 of the 3 MIG instances passed to it:

![image](https://github.com/user-attachments/assets/89ac70fe-dd29-45f8-9f50-64641fb499d6)

I used llama3.1:70b and noticed a massive performance loss compared to our "no-MIG" setup.


@waTeim commented on GitHub (Aug 1, 2024):

Hmm, ok what's the deal here? -- we are probably also being stung by this.


@Magitoneu commented on GitHub (Nov 8, 2024):

Is there any update on this topic? @dhiltgen


@Eridoc commented on GitHub (Feb 24, 2025):

MIG Support would be very beneficial.


@Eridoc commented on GitHub (Mar 12, 2025):

Alright I think there is a misconception here.

MIG is indeed supported; that's why we are able to run Ollama in a single MIG instance.
However, MIG instances are inherently isolated from one another and do not share SMs or memory, which is why spanning multiple MIG instances is not "supported".
https://forums.developer.nvidia.com/t/a100-mig-inter-instance-communication/190424

If I am not mistaken, the best practice would be to use larger MIG instances when the model does not fit into a smaller one, or to enable MIG on only some of the GPUs and run Ollama on the MIG-disabled GPUs so that it can use their full SMs and memory.
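
As a concrete, purely illustrative example of the first suggestion, using the 3g.40gb instance UUID already listed above for GPU 0, the systemd override would point Ollama at the single larger slice instead of two 1g.10gb slices:

Environment="CUDA_VISIBLE_DEVICES=MIG-cb4aa05b-5bb3-5f35-8028-590348715f02"

Whether that is enough depends on the model size; for anything larger, running Ollama on a MIG-disabled GPU is the simpler route.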

Reference: github-starred/ollama#3040