[GH-ISSUE #5450] Inference fails on AMD when using >1 GPU. #65443

Closed
opened 2026-05-03 21:18:03 -05:00 by GiteaMirror · 3 comments

Originally created by @Speedway1 on GitHub (Jul 3, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5450

Originally assigned to: @dhiltgen on GitHub.

### What is the issue?

This is on AMD. I have 2 x Radeon RX 7900 XTX cards (24 GB each).

For models whose memory use fits on a single GPU, everything works fine.
As soon as both cards are required, inference fails and produces garbage, as seen in this output:

ollama@TH-AI2:~$ ollama list
NAME                                    ID              SIZE    MODIFIED    
deepseek-coder-v2:latest                8577f96d693e    8.9 GB  10 days ago
codestral:latest                        fcc0019dcee9    12 GB   11 days ago
qwen2:latest                            e0d4e1163c58    4.4 GB  11 days ago
command-r:latest                        b8cdfff0263c    20 GB   11 days ago
mxbai-embed-large:latest                468836162de7    669 MB  11 days ago
llama3:70b                              786f3184aec0    39 GB   11 days ago
phi3:14b-medium-128k-instruct-f16       e89861c3ba63    27 GB   11 days ago
ollama@TH-AI2:~$ ollama run command-r:latest
>>> Hello how are you?
???????????????????????????????

>>> /bye

Codestral is only 12 GB and runs on a single GPU; it works fine:

ollama@TH-AI2:~$ ollama run command-r:latest
>>> Hello how are you?
???????????????????????????????

>>> /bye
ollama@TH-AI2:~$ ollama run codestral:latest
>>> Create a ruby script that counts from 1 to 100 and outputs to the console.
 Here's a simple Ruby script that counts from 1 to 100 and outputs to the console:

```ruby
(1..100).each do |number|
  puts number
end
```

The `(1..100)` creates a range of numbers from 1 to 100. The `each` method is then used to iterate over each number in the range. Finally, the `puts` method outputs the current number to the console.


phi3:14b-medium requires 2 GPUs for its 27 GB size, and it too outputs garbage:

ollama@TH-AI2:~$ ollama run phi3:14b-medium-128k-instruct-f16
>>> Hello how are you?
###############################

>>> Send a message (/? for help)





### OS

Linux

### GPU

AMD

### CPU

AMD

### Ollama version

0.1.48
GiteaMirror added the gpu, amd, bug labels 2026-05-03 21:18:04 -05:00

@Speedway1 commented on GitHub (Jul 3, 2024):

Here is the setup:

root@TH-AI2:~# rocm-smi

============================================ ROCm System Management Interface ============================================
====================================================== Concise Info ======================================================
Device  Node  IDs              Temp    Power    Partitions          SCLK   MCLK     Fan  Perf  PwrCap       VRAM%  GPU%  
              (DID,     GUID)  (Edge)  (Avg)    (Mem, Compute, ID)                                                       
==========================================================================================================================
0       1     0x744c,   55924  44.0°C  26.0W    N/A, N/A, 0         62Mhz  96Mhz    0%   auto  327.0W       94%    1%    
1       2     0x744c,   27211  43.0°C  19.0W    N/A, N/A, 0         63Mhz  96Mhz    0%   auto  327.0W       92%    2%    
2       3     0x164e,   33198  30.0°C  21.086W  N/A, N/A, 0         None   1800Mhz  0%   auto  Unsupported  15%    0%    
==========================================================================================================================
================================================== End of ROCm SMI Log ===================================================


@Speedway1 commented on GitHub (Jul 3, 2024):

Just to add extra information, it looks like there's a memory issue when running across 2 AMD GPUs. Here is the log output when it fails:

Jul  4 00:37:27 TH-AI2 ollama[238317]: DEBUG [update_slots] kv cache rm [p0, end) | p0=0 slot_id=0 task_id=4 tid="124200820454208" timestamp=1720049847
Jul  4 00:37:27 TH-AI2 kernel: [82387.667170] amd_iommu_report_page_fault: 102 callbacks suppressed
Jul  4 00:37:27 TH-AI2 kernel: [82387.667174] amdgpu 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0014 address=0xf081000000 flags=0x0020]
Jul  4 00:37:27 TH-AI2 kernel: [82387.667190] amdgpu 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0014 address=0xf081000300 flags=0x0020]
Jul  4 00:37:27 TH-AI2 kernel: [82387.667203] amdgpu 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0014 address=0xf081000b00 flags=0x0020]
Jul  4 00:37:27 TH-AI2 kernel: [82387.667215] amdgpu 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0014 address=0xf081001000 flags=0x0020]
Jul  4 00:37:27 TH-AI2 kernel: [82387.667226] amdgpu 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0014 address=0xf081000c00 flags=0x0020]
Jul  4 00:37:27 TH-AI2 kernel: [82387.667237] amdgpu 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0014 address=0xf081001300 flags=0x0020]
Jul  4 00:37:27 TH-AI2 kernel: [82387.667249] amdgpu 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0014 address=0xf081002900 flags=0x0020]
Jul  4 00:37:27 TH-AI2 kernel: [82387.667261] amdgpu 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0014 address=0xf081001400 flags=0x0020]
Jul  4 00:37:27 TH-AI2 kernel: [82387.667272] amdgpu 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0014 address=0xf081002400 flags=0x0020]
Jul  4 00:37:27 TH-AI2 kernel: [82387.667283] amdgpu 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0014 address=0xf081001f00 flags=0x0020]
Jul  4 00:37:28 TH-AI2 kernel: [82387.670406] AMD-Vi: IOMMU event log overflow

And here is the loading sequence:

Jul  4 00:36:48 TH-AI2 ollama[1488]: ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
Jul  4 00:36:48 TH-AI2 ollama[1488]: ggml_cuda_init: found 2 ROCm devices:
Jul  4 00:36:48 TH-AI2 ollama[1488]:   Device 0: Radeon RX 7900 XTX, compute capability 11.0, VMM: no
Jul  4 00:36:48 TH-AI2 ollama[1488]:   Device 1: Radeon RX 7900 XTX, compute capability 11.0, VMM: no
Jul  4 00:36:48 TH-AI2 ollama[1488]: llm_load_tensors: ggml ctx size =    0.51 MiB
Jul  4 00:36:49 TH-AI2 ollama[1488]: llm_load_tensors: offloading 40 repeating layers to GPU
Jul  4 00:36:49 TH-AI2 ollama[1488]: llm_load_tensors: offloading non-repeating layers to GPU
Jul  4 00:36:49 TH-AI2 ollama[1488]: llm_load_tensors: offloaded 41/41 layers to GPU
Jul  4 00:36:49 TH-AI2 ollama[1488]: llm_load_tensors:      ROCm0 buffer size =  9261.66 MiB
Jul  4 00:36:49 TH-AI2 ollama[1488]: llm_load_tensors:      ROCm1 buffer size = 10020.25 MiB
Jul  4 00:36:49 TH-AI2 ollama[1488]: llm_load_tensors:        CPU buffer size =  1640.62 MiB
Jul  4 00:36:49 TH-AI2 ollama[1488]: time=2024-07-04T00:36:49.931+01:00 level=DEBUG source=server.go:605 msg="model load progress 0.04"
Jul  4 00:36:50 TH-AI2 ollama[1488]: time=2024-07-04T00:36:50.182+01:00 level=DEBUG source=server.go:605 msg="model load progress 0.09"
Jul  4 00:36:50 TH-AI2 ollama[1488]: time=2024-07-04T00:36:50.433+01:00 level=DEBUG source=server.go:605 msg="model load progress 0.14"
Jul  4 00:36:50 TH-AI2 ollama[1488]: time=2024-07-04T00:36:50.684+01:00 level=DEBUG source=server.go:605 msg="model load progress 0.18"
Jul  4 00:36:50 TH-AI2 ollama[1488]: time=2024-07-04T00:36:50.935+01:00 level=DEBUG source=server.go:605 msg="model load progress 0.24"
Jul  4 00:36:51 TH-AI2 ollama[1488]: time=2024-07-04T00:36:51.186+01:00 level=DEBUG source=server.go:605 msg="model load progress 0.28"
Jul  4 00:36:51 TH-AI2 ollama[1488]: time=2024-07-04T00:36:51.437+01:00 level=DEBUG source=server.go:605 msg="model load progress 0.33"
Jul  4 00:36:51 TH-AI2 ollama[1488]: time=2024-07-04T00:36:51.687+01:00 level=DEBUG source=server.go:605 msg="model load progress 0.38"
Jul  4 00:36:51 TH-AI2 ollama[1488]: time=2024-07-04T00:36:51.938+01:00 level=DEBUG source=server.go:605 msg="model load progress 0.43"
Jul  4 00:36:52 TH-AI2 ollama[1488]: time=2024-07-04T00:36:52.189+01:00 level=DEBUG source=server.go:605 msg="model load progress 0.44"
Jul  4 00:36:52 TH-AI2 ollama[1488]: time=2024-07-04T00:36:52.442+01:00 level=DEBUG source=server.go:605 msg="model load progress 0.55"
Jul  4 00:36:52 TH-AI2 ollama[1488]: time=2024-07-04T00:36:52.693+01:00 level=DEBUG source=server.go:605 msg="model load progress 0.63"
Jul  4 00:36:52 TH-AI2 ollama[1488]: time=2024-07-04T00:36:52.943+01:00 level=DEBUG source=server.go:605 msg="model load progress 0.72"
Jul  4 00:36:53 TH-AI2 ollama[1488]: time=2024-07-04T00:36:53.194+01:00 level=DEBUG source=server.go:605 msg="model load progress 0.80"
Jul  4 00:36:53 TH-AI2 ollama[1488]: time=2024-07-04T00:36:53.445+01:00 level=DEBUG source=server.go:605 msg="model load progress 0.88"
Jul  4 00:36:53 TH-AI2 ollama[1488]: time=2024-07-04T00:36:53.896+01:00 level=INFO source=server.go:594 msg="waiting for server to become available" status="llm server not responding"
Jul  4 00:36:54 TH-AI2 ollama[1488]: llama_new_context_with_model: n_ctx      = 8192
Jul  4 00:36:54 TH-AI2 ollama[1488]: llama_new_context_with_model: n_batch    = 512
Jul  4 00:36:54 TH-AI2 ollama[1488]: llama_new_context_with_model: n_ubatch   = 512
Jul  4 00:36:54 TH-AI2 ollama[1488]: llama_new_context_with_model: flash_attn = 0
Jul  4 00:36:54 TH-AI2 ollama[1488]: llama_new_context_with_model: freq_base  = 8000000.0
Jul  4 00:36:54 TH-AI2 ollama[1488]: llama_new_context_with_model: freq_scale = 1
Jul  4 00:36:54 TH-AI2 ollama[1488]: time=2024-07-04T00:36:54.314+01:00 level=INFO source=server.go:594 msg="waiting for server to become available" status="llm server loading model"
Jul  4 00:36:54 TH-AI2 ollama[1488]: time=2024-07-04T00:36:54.315+01:00 level=DEBUG source=server.go:605 msg="model load progress 1.00"
Jul  4 00:36:54 TH-AI2 ollama[1488]: time=2024-07-04T00:36:54.565+01:00 level=DEBUG source=server.go:608 msg="model load completed, waiting for server to become available" status="llm server loading model"
Jul  4 00:36:55 TH-AI2 ollama[1488]: llama_kv_cache_init:      ROCm0 KV buffer size =  5376.00 MiB
Jul  4 00:36:55 TH-AI2 ollama[1488]: llama_kv_cache_init:      ROCm1 KV buffer size =  4864.00 MiB
Jul  4 00:36:55 TH-AI2 ollama[1488]: llama_new_context_with_model: KV self size  = 10240.00 MiB, K (f16): 5120.00 MiB, V (f16): 5120.00 MiB
Jul  4 00:36:55 TH-AI2 ollama[1488]: llama_new_context_with_model:  ROCm_Host  output buffer size =     4.03 MiB
Jul  4 00:36:55 TH-AI2 ollama[1488]: llama_new_context_with_model: pipeline parallelism enabled (n_copies=4)
Jul  4 00:36:55 TH-AI2 ollama[1488]: llama_new_context_with_model:      ROCm0 compute buffer size =  1216.01 MiB
Jul  4 00:36:55 TH-AI2 ollama[1488]: llama_new_context_with_model:      ROCm1 compute buffer size =  1216.02 MiB
Jul  4 00:36:55 TH-AI2 ollama[1488]: llama_new_context_with_model:  ROCm_Host compute buffer size =    80.02 MiB
Jul  4 00:36:55 TH-AI2 ollama[1488]: llama_new_context_with_model: graph nodes  = 1208
Jul  4 00:36:55 TH-AI2 ollama[1488]: llama_new_context_with_model: graph splits = 3
Jul  4 00:36:55 TH-AI2 kernel: [82355.181846] amdgpu 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0014 address=0xf080000100 flags=0x0020]
Jul  4 00:36:55 TH-AI2 kernel: [82355.181864] amdgpu 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0014 address=0xf080000400 flags=0x0020]
Jul  4 00:36:55 TH-AI2 kernel: [82355.181877] amdgpu 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0014 address=0xf080000c00 flags=0x0020]
Jul  4 00:36:55 TH-AI2 kernel: [82355.181889] amdgpu 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0014 address=0xf080001100 flags=0x0020]
Jul  4 00:36:55 TH-AI2 kernel: [82355.181901] amdgpu 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0014 address=0xf080001900 flags=0x0020]
Jul  4 00:36:55 TH-AI2 kernel: [82355.181913] amdgpu 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0014 address=0xf080000600 flags=0x0020]
Jul  4 00:36:55 TH-AI2 kernel: [82355.181924] amdgpu 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0014 address=0xf080001800 flags=0x0020]
Jul  4 00:36:55 TH-AI2 kernel: [82355.181936] amdgpu 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0014 address=0xf080002c00 flags=0x0020]
Jul  4 00:36:55 TH-AI2 kernel: [82355.181948] amdgpu 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0014 address=0xf080001300 flags=0x0020]
Jul  4 00:36:55 TH-AI2 kernel: [82355.181959] amdgpu 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0014 address=0xf080002300 flags=0x0020]
Jul  4 00:36:56 TH-AI2 ollama[238317]: DEBUG [initialize] initializing slots | n_slots=4 tid="124200820454208" timestamp=1720049816
Jul  4 00:36:56 TH-AI2 ollama[238317]: DEBUG [initialize] new slot | n_ctx_slot=2048 slot_id=0 tid="124200820454208" timestamp=1720049816
Jul  4 00:36:56 TH-AI2 ollama[238317]: DEBUG [initialize] new slot | n_ctx_slot=2048 slot_id=1 tid="124200820454208" timestamp=1720049816
Jul  4 00:36:56 TH-AI2 ollama[238317]: DEBUG [initialize] new slot | n_ctx_slot=2048 slot_id=2 tid="124200820454208" timestamp=1720049816
Jul  4 00:36:56 TH-AI2 ollama[238317]: DEBUG [initialize] new slot | n_ctx_slot=2048 slot_id=3 tid="124200820454208" timestamp=1720049816
Jul  4 00:36:56 TH-AI2 ollama[238317]: INFO [main] model loaded | tid="124200820454208" timestamp=1720049816
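
The AMD-Vi IO_PAGE_FAULT events above are the most telling part of the log: they typically mean a device DMA access could not be translated by the IOMMU. The following is not part of the original report, only a hedged first diagnostic step assuming the IOMMU configuration is involved; booting with `iommu=pt` is a commonly suggested experiment for faults like these, not a confirmed fix for this issue:

```bash
# Show how the IOMMU is configured on the current kernel command line
cat /proc/cmdline

# Look for AMD-Vi / IOMMU initialization messages and any further page faults
sudo dmesg | grep -iE 'iommu|amd-vi'

# A common experiment for faults like these: put the IOMMU into passthrough
# mode by adding iommu=pt to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub,
# then regenerate the bootloader config and reboot:
#   sudo update-grub && sudo reboot
```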


@eliranwong commented on GitHub (Jul 4, 2024):

> Here is the setup:
>
> root@TH-AI2:~# rocm-smi
>
> ============================================ ROCm System Management Interface ============================================
> ====================================================== Concise Info ======================================================
> Device  Node  IDs              Temp    Power    Partitions          SCLK   MCLK     Fan  Perf  PwrCap       VRAM%  GPU%  
>               (DID,     GUID)  (Edge)  (Avg)    (Mem, Compute, ID)                                                       
> ==========================================================================================================================
> 0       1     0x744c,   55924  44.0°C  26.0W    N/A, N/A, 0         62Mhz  96Mhz    0%   auto  327.0W       94%    1%    
> 1       2     0x744c,   27211  43.0°C  19.0W    N/A, N/A, 0         63Mhz  96Mhz    0%   auto  327.0W       92%    2%    
> 2       3     0x164e,   33198  30.0°C  21.086W  N/A, N/A, 0         None   1800Mhz  0%   auto  Unsupported  15%    0%    
> ==========================================================================================================================
> ================================================== End of ROCm SMI Log ===================================================

I am using dual AMD RX 7900 XTX cards too, but have no issues at all, e.g.:

ubuntu@ai:~/eliran/ai$ ollama list
NAME                 	ID          	SIZE  	MODIFIED       
deepseek-v2:16b      	7c8c332f2df7	8.9 GB	21 minutes ago	
deepseek-coder-v2:16b	8577f96d693e	8.9 GB	36 minutes ago	
internlm2:7b         	5050e36678ab	4.5 GB	2 hours ago   	
gemma2:27b           	371038893ee3	15 GB 	2 hours ago   	
gemma2:9b            	c19987e1e6e2	5.4 GB	7 hours ago   	
codellama:7b-code    	fc84f39375bc	3.8 GB	2 days ago    	
codellama:7b-instruct	8fdf8f752f6e	3.8 GB	2 days ago    	
dbrx:132b            	36800d8d3a28	74 GB 	3 days ago    	
wizardlm2:latest     	c9b1aff820f2	4.1 GB	4 days ago    	
command-r-plus:latest	c9c6cc6d20c7	59 GB 	5 days ago    	
mistral:latest       	2ae6f6dd7a3d	4.1 GB	5 days ago    	
ubuntu@ai:~/eliran/ai$ ollama run command-r-plus

>>> How are you?
As an AI language model, I don't have feelings or emotions in the traditional sense. However, my purpose is to 
assist and provide helpful responses to your queries, so from that perspective, I'm doing well! How can I help you 
today?

========================================== ROCm System Management Interface ==========================================
==================================================== Concise Info ====================================================
Device  Node  IDs              Temp    Power   Partitions          SCLK     MCLK   Fan     Perf  PwrCap  VRAM%  GPU%  
              (DID,     GUID)  (Edge)  (Avg)   (Mem, Compute, ID)                                                     
======================================================================================================================
0       2     0x744c,   45048  54.0°C  122.0W  N/A, N/A, 0         3152Mhz  96Mhz  22.75%  auto  327.0W  90%    100%  
1       1     0x744c,   12575  48.0°C  124.0W  N/A, N/A, 0         3056Mhz  96Mhz  14.9%   auto  327.0W  99%    100%  
======================================================================================================================
================================================ End of ROCm SMI Log =================================================

May I ask two questions:

  1. What ROCm version are you using? I am using ROCm 6.1.3, which officially extends support to the 7000 series. My setup notes are recorded at https://github.com/eliranwong/MultiAMDGPU_AIDev_Ubuntu

  2. From your rocm-smi output, I see you have 3 devices instead of 2. Are you using an iGPU together with your two discrete GPUs? The official documentation states that mixing an iGPU with discrete GPUs causes issues with ROCm, so it is better to disable the iGPU (see the sketch after this list).
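
Not from the original thread, but two quick checks related to the questions above, written as a hedged sketch: reading the installed ROCm version from the standard install tree, and restricting the ROCm runtime to the discrete cards via ROCR_VISIBLE_DEVICES (a standard ROCm runtime variable; whether hiding the iGPU resolves this particular fault is an assumption):

```bash
# Report the installed ROCm version (path may vary by install method)
cat /opt/rocm/.info/version

# List the agents the ROCm runtime actually sees, including any iGPU
rocminfo | grep -E 'Marketing Name|Device Type'

# Restrict the ROCm runtime to the discrete GPUs before starting ollama;
# indices follow the runtime's enumeration and may differ from rocm-smi's
export ROCR_VISIBLE_DEVICES=0,1
```

If hiding the iGPU this way changes the behaviour, that would point at the mixed iGPU/dGPU configuration rather than the dual-card split itself.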

Reference: github-starred/ollama#65443