[GH-ISSUE #5143] AMD iGPU works in docker with override but not on host #29004

Closed
opened 2026-04-22 07:35:35 -05:00 by GiteaMirror · 21 comments
Owner

Originally created by @smellouk on GitHub (Jun 19, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5143

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

Ollama fails to run on the GPU and falls back to the CPU instead. If I force it with HSA_OVERRIDE_GFX_VERSION=9.0.0, then I get Error: llama runner process has terminated: signal: aborted error: Could not initialize Tensile host: No devices found.

ENV:

I'm using Proxmox LXC with Device Passthrough.
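
For context, a typical Proxmox LXC device-passthrough config looks roughly like the sketch below. This is an assumption about the setup, not taken from this issue: the container ID 101 is hypothetical, and device major numbers vary by kernel, so check ls -l /dev/kfd /dev/dri on the host first.

# /etc/pve/lxc/101.conf (hypothetical container ID)
lxc.cgroup2.devices.allow: c 226:* rwm   # /dev/dri (card*, renderD128)
lxc.cgroup2.devices.allow: c 510:* rwm   # /dev/kfd (major number varies; verify on the host)
lxc.mount.entry: /dev/kfd dev/kfd none bind,optional,create=file
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir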

journalctl:

Jun 19 14:38:07 ai-llm systemd[1]: Stopping Ollama Service...
Jun 19 14:38:07 ai-llm systemd[1]: ollama.service: Deactivated successfully.
Jun 19 14:38:07 ai-llm systemd[1]: Stopped Ollama Service.
Jun 19 14:38:07 ai-llm systemd[1]: ollama.service: Consumed 7.818s CPU time.
Jun 19 14:38:07 ai-llm systemd[1]: Started Ollama Service.
Jun 19 14:38:07 ai-llm ollama[15932]: 2024/06/19 14:38:07 routes.go:1011: INFO server config env="map[OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://127.0.0.>
Jun 19 14:38:07 ai-llm ollama[15932]: time=2024-06-19T14:38:07.789Z level=INFO source=images.go:725 msg="total blobs: 5"
Jun 19 14:38:07 ai-llm ollama[15932]: time=2024-06-19T14:38:07.789Z level=INFO source=images.go:732 msg="total unused blobs removed: 0"
Jun 19 14:38:07 ai-llm ollama[15932]: time=2024-06-19T14:38:07.789Z level=INFO source=routes.go:1057 msg="Listening on 127.0.0.1:11434 (version 0.1.44)"
Jun 19 14:38:07 ai-llm ollama[15932]: time=2024-06-19T14:38:07.789Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama3886990766/runners
Jun 19 14:38:10 ai-llm ollama[15932]: time=2024-06-19T14:38:10.123Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu_avx cpu_avx2 cuda_v11 rocm_v60002 cpu]"
Jun 19 14:38:10 ai-llm ollama[15932]: time=2024-06-19T14:38:10.125Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-dri>
Jun 19 14:38:10 ai-llm ollama[15932]: time=2024-06-19T14:38:10.126Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=9.0.0
Jun 19 14:38:10 ai-llm ollama[15932]: time=2024-06-19T14:38:10.126Z level=INFO source=types.go:71 msg="inference compute" id=0 library=rocm compute=gfx90c driver=0.0 name=1002:1>
Jun 19 14:38:12 ai-llm ollama[15932]: [GIN] 2024/06/19 - 14:38:12 | 200 |      39.044µs |       127.0.0.1 | HEAD     "/"
Jun 19 14:38:12 ai-llm ollama[15932]: [GIN] 2024/06/19 - 14:38:12 | 200 |     436.745µs |       127.0.0.1 | POST     "/api/show"
Jun 19 14:38:12 ai-llm ollama[15932]: [GIN] 2024/06/19 - 14:38:12 | 200 |     278.824µs |       127.0.0.1 | POST     "/api/show"
Jun 19 14:38:12 ai-llm ollama[15932]: time=2024-06-19T14:38:12.873Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-dri>
Jun 19 14:38:12 ai-llm ollama[15932]: time=2024-06-19T14:38:12.874Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=9.0.0
Jun 19 14:38:13 ai-llm ollama[15932]: time=2024-06-19T14:38:13.183Z level=INFO source=memory.go:133 msg="offload to gpu" layers.requested=-1 layers.real=23 memory.available="8.0>
Jun 19 14:38:13 ai-llm ollama[15932]: time=2024-06-19T14:38:13.183Z level=INFO source=memory.go:133 msg="offload to gpu" layers.requested=-1 layers.real=23 memory.available="8.0>
Jun 19 14:38:13 ai-llm ollama[15932]: time=2024-06-19T14:38:13.184Z level=INFO source=server.go:341 msg="starting llama server" cmd="/tmp/ollama3886990766/runners/rocm_v60002/ol>
Jun 19 14:38:13 ai-llm ollama[15932]: time=2024-06-19T14:38:13.184Z level=INFO source=sched.go:338 msg="loaded runners" count=1
Jun 19 14:38:13 ai-llm ollama[15932]: time=2024-06-19T14:38:13.184Z level=INFO source=server.go:529 msg="waiting for llama runner to start responding"
Jun 19 14:38:13 ai-llm ollama[15932]: time=2024-06-19T14:38:13.185Z level=INFO source=server.go:567 msg="waiting for server to become available" status="llm server error"
Jun 19 14:38:13 ai-llm ollama[15955]: INFO [main] build info | build=1 commit="5921b8f" tid="125536033915712" timestamp=1718807893
Jun 19 14:38:13 ai-llm ollama[15955]: INFO [main] system info | n_threads=8 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AV>
Jun 19 14:38:13 ai-llm ollama[15955]: INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="7" port="44261" tid="125536033915712" timestamp=1718807893
Jun 19 14:38:13 ai-llm ollama[15932]: llama_model_loader: loaded meta data with 23 key-value pairs and 201 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-2af3b81862c>
Jun 19 14:38:13 ai-llm ollama[15932]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Jun 19 14:38:13 ai-llm ollama[15932]: llama_model_loader: - kv   0:                       general.architecture str              = llama
Jun 19 14:38:13 ai-llm ollama[15932]: llama_model_loader: - kv   1:                               general.name str              = TinyLlama
Jun 19 14:38:13 ai-llm ollama[15932]: llama_model_loader: - kv   2:                       llama.context_length u32              = 2048
Jun 19 14:38:13 ai-llm ollama[15932]: llama_model_loader: - kv   3:                     llama.embedding_length u32              = 2048
Jun 19 14:38:13 ai-llm ollama[15932]: llama_model_loader: - kv   4:                          llama.block_count u32              = 22
Jun 19 14:38:13 ai-llm ollama[15932]: llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 5632
Jun 19 14:38:13 ai-llm ollama[15932]: llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 64
Jun 19 14:38:13 ai-llm ollama[15932]: llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32
Jun 19 14:38:13 ai-llm ollama[15932]: llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 4
Jun 19 14:38:13 ai-llm ollama[15932]: llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
Jun 19 14:38:13 ai-llm ollama[15932]: llama_model_loader: - kv  10:                       llama.rope.freq_base f32              = 10000.000000
Jun 19 14:38:13 ai-llm ollama[15932]: llama_model_loader: - kv  11:                          general.file_type u32              = 2
Jun 19 14:38:13 ai-llm ollama[15932]: llama_model_loader: - kv  12:                       tokenizer.ggml.model str              = llama
Jun 19 14:38:13 ai-llm ollama[15932]: llama_model_loader: - kv  13:                      tokenizer.ggml.tokens arr[str,32000]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
Jun 19 14:38:13 ai-llm ollama[15932]: llama_model_loader: - kv  14:                      tokenizer.ggml.scores arr[f32,32000]   = [0.000000, 0.000000, 0.000000, 0.0000...
Jun 19 14:38:13 ai-llm ollama[15932]: llama_model_loader: - kv  15:                  tokenizer.ggml.token_type arr[i32,32000]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
Jun 19 14:38:13 ai-llm ollama[15932]: llama_model_loader: - kv  16:                      tokenizer.ggml.merges arr[str,61249]   = ["▁ t", "e r", "i n", "▁ >
Jun 19 14:38:13 ai-llm ollama[15932]: llama_model_loader: - kv  17:                tokenizer.ggml.bos_token_id u32              = 1
Jun 19 14:38:13 ai-llm ollama[15932]: llama_model_loader: - kv  18:                tokenizer.ggml.eos_token_id u32              = 2
Jun 19 14:38:13 ai-llm ollama[15932]: llama_model_loader: - kv  19:            tokenizer.ggml.unknown_token_id u32              = 0
Jun 19 14:38:13 ai-llm ollama[15932]: llama_model_loader: - kv  20:            tokenizer.ggml.padding_token_id u32              = 2
Jun 19 14:38:13 ai-llm ollama[15932]: llama_model_loader: - kv  21:                    tokenizer.chat_template str              = {% for message in messages %}\n{% if m...
Jun 19 14:38:13 ai-llm ollama[15932]: llama_model_loader: - kv  22:               general.quantization_version u32              = 2
Jun 19 14:38:13 ai-llm ollama[15932]: llama_model_loader: - type  f32:   45 tensors
Jun 19 14:38:13 ai-llm ollama[15932]: llama_model_loader: - type q4_0:  155 tensors
Jun 19 14:38:13 ai-llm ollama[15932]: llama_model_loader: - type q6_K:    1 tensors
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_vocab: special tokens cache size = 259
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_vocab: token to piece cache size = 0.3368 MB
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: format           = GGUF V3 (latest)
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: arch             = llama
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: vocab type       = SPM
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: n_vocab          = 32000
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: n_merges         = 0
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: n_ctx_train      = 2048
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: n_embd           = 2048
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: n_head           = 32
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: n_head_kv        = 4
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: n_layer          = 22
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: n_rot            = 64
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: n_embd_head_k    = 64
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: n_embd_head_v    = 64
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: n_gqa            = 8
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: n_embd_k_gqa     = 256
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: n_embd_v_gqa     = 256
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: f_norm_eps       = 0.0e+00
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: f_clamp_kqv      = 0.0e+00
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: f_logit_scale    = 0.0e+00
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: n_ff             = 5632
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: n_expert         = 0
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: n_expert_used    = 0
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: causal attn      = 1
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: pooling type     = 0
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: pooling type     = 0
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: rope type        = 0
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: rope scaling     = linear
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: freq_base_train  = 10000.0
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: freq_scale_train = 1
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: n_yarn_orig_ctx  = 2048
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: rope_finetuned   = unknown
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: ssm_d_conv       = 0
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: ssm_d_inner      = 0
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: freq_base_train  = 10000.0
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: freq_scale_train = 1
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: n_yarn_orig_ctx  = 2048
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: rope_finetuned   = unknown
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: ssm_d_conv       = 0
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: ssm_d_inner      = 0
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: ssm_d_state      = 0
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: ssm_dt_rank      = 0
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: model type       = 1B
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: model ftype      = Q4_0
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: model params     = 1.10 B
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: model size       = 606.53 MiB (4.63 BPW)
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: general.name     = TinyLlama
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: BOS token        = 1 '<s>'
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: EOS token        = 2 '</s>'
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: UNK token        = 0 '<unk>'
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: PAD token        = 2 '</s>'
Jun 19 14:38:13 ai-llm ollama[15932]: llm_load_print_meta: LF token         = 13 '<0x0A>'
Jun 19 14:38:13 ai-llm ollama[15932]: rocBLAS error: Could not initialize Tensile host: No devices found
Jun 19 14:38:13 ai-llm ollama[15932]: time=2024-06-19T14:38:13.436Z level=ERROR source=sched.go:344 msg="error loading llama server" error="llama runner process has terminated: >

rocm-smi

============================================ ROCm System Management Interface ============================================
====================================================== Concise Info ======================================================
Device  Node  IDs              Temp    Power     Partitions          SCLK  MCLK     Fan  Perf  PwrCap       VRAM%  GPU%  
              (DID,     GUID)  (Edge)  (Socket)  (Mem, Compute, ID)                                                      
==========================================================================================================================
0       1     0x164c,   28495  46.0°C  9.0W      N/A, N/A, 0         None  1200Mhz  0%   auto  Unsupported  1%     0%    
==========================================================================================================================
================================================== End of ROCm SMI Log ===================================================

rocminfo

ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.13
Runtime Ext Version:     1.4
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 7 5700U with Radeon Graphics
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 7 5700U with Radeon Graphics
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   4372                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            16                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    24508068(0x175f6a4) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    24508068(0x175f6a4) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    24508068(0x175f6a4) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx90c                             
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD Radeon Graphics                
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
    L2:                      1024(0x400) KB                     
  Chip ID:                 5708(0x164c)                       
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   1900                               
  BDFID:                   1024                               
  Internal Node ID:        1                                  
  Compute Unit:            8                                  
  SIMDs per CU:            4                                  
  Shader Engines:          1                                  
  Shader Arrs. per Eng.:   1                                  
  WatchPts on Addr. Ranges:4                                  
  Coherent Host Access:    FALSE                              
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          64(0x40)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        40(0x28)                           
  Max Work-item Per CU:    2560(0xa00)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 471                                
  SDMA engine uCode::      40                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    8388608(0x800000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    8388608(0x800000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Recommended Granule:0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx90c:xnack-   
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***             

rocm version

Package: rocm-libs
Version: 6.1.1.60101-90~22.04
Priority: optional
Section: devel
Maintainer: ROCm Dev Support <rocm-dev.support@amd.com>
Installed-Size: 13.3 kB
Depends: hipblas (= 2.1.0.60101-90~22.04), hipblaslt (= 0.7.0.60101-90~22.04), hipfft (= 1.0.14.60101-90~22.04), hipsolver (= 2.1.1.60101-90~22.04), hipsparse (= 3.0.1.60101-90~22.04), hiptensor (= 1.2.0.60101-90~22.04), miopen-hip (= 3.1.0.60101-90~22.04), half (= 1.12.0.60101-90~22.04), rccl (= 2.18.6.60101-90~22.04), rocalution (= 3.1.1.60101-90~22.04), rocblas (= 4.1.0.60101-90~22.04), rocfft (= 1.0.27.60101-90~22.04), rocrand (= 3.0.1.60101-90~22.04), hiprand (= 2.10.16.60101-90~22.04), rocsolver (= 3.25.0.60101-90~22.04), rocsparse (= 3.1.2.60101-90~22.04), rocm-core (= 6.1.1.60101-90~22.04), hipsparselt (= 0.1.0.60101-90~22.04), composablekernel-dev (= 1.1.0.60101-90~22.04), hipblas-dev (= 2.1.0.60101-90~22.04), hipblaslt-dev (= 0.7.0.60101-90~22.04), hipcub-dev (= 3.1.0.60101-90~22.04), hipfft-dev (= 1.0.14.60101-90~22.04), hipsolver-dev (= 2.1.1.60101-90~22.04), hipsparse-dev (= 3.0.1.60101-90~22.04), hiptensor-dev (= 1.2.0.60101-90~22.04), miopen-hip-dev (= 3.1.0.60101-90~22.04), rccl-dev (= 2.18.6.60101-90~22.04), rocalution-dev (= 3.1.1.60101-90~22.04), rocblas-dev (= 4.1.0.60101-90~22.04), rocfft-dev (= 1.0.27.60101-90~22.04), rocprim-dev (= 3.1.0.60101-90~22.04), rocrand-dev (= 3.0.1.60101-90~22.04), hiprand-dev (= 2.10.16.60101-90~22.04), rocsolver-dev (= 3.25.0.60101-90~22.04), rocsparse-dev (= 3.1.2.60101-90~22.04), rocthrust-dev (= 3.0.1.60101-90~22.04), rocwmma-dev (= 1.4.0.60101-90~22.04), hipsparselt-dev (= 0.1.0.60101-90~22.04)
Homepage: https://github.com/RadeonOpenCompute/ROCm
Download-Size: 1060 B
APT-Sources: https://repo.radeon.com/rocm/apt/6.1.1 jammy/main amd64 Packages
Description: Radeon Open Compute (ROCm) Runtime software stack

Troubleshooting?

  1. I tried adding HSA_OVERRIDE_GFX_VERSION=9.0.0 + HIP_VISIBLE_DEVICES=0 to the service file, but it didn't change anything (a systemd drop-in sketch follows this list).

  2. I tried running Ollama with Docker inside the Proxmox LXC, with device passthrough, using this command:

docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama --device=/dev/kfd --device=/dev/dri/renderD128 --env HSA_OVERRIDE_GFX_VERSION=9.0.0 --env HSA_ENABLE_SDMA=0 ollama/ollama:rocm

everything works as expected.
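
For reference, a minimal sketch of how step 1 could be applied as a systemd drop-in, mirroring the environment variables from the working Docker command above. The unit name ollama.service comes from the journalctl output; carrying HSA_ENABLE_SDMA=0 over from the Docker setup is an untested assumption.

sudo systemctl edit ollama.service
# In the editor, add:
#   [Service]
#   Environment="HSA_OVERRIDE_GFX_VERSION=9.0.0"
#   Environment="HSA_ENABLE_SDMA=0"
sudo systemctl daemon-reload
sudo systemctl restart ollama
journalctl -u ollama -f   # watch for the rocBLAS "Could not initialize Tensile host" error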

OS

Linux

GPU

AMD

CPU

AMD

Ollama version

0.1.44

GiteaMirror added the amd, bug labels 2026-04-22 07:35:35 -05:00
Author
Owner

@dhiltgen commented on GitHub (Jun 19, 2024):

What kind of GPU are you using?

Author
Owner

@smellouk commented on GitHub (Jun 19, 2024):

@dhiltgen Integrated GPU

Author
Owner

@dhiltgen commented on GitHub (Jun 19, 2024):

iGPUs aren't yet supported. That's tracked via #2637

Author
Owner

@smellouk commented on GitHub (Jun 19, 2024):

@dhiltgen can you help me understand why it works with Docker?

Author
Owner

@dhiltgen commented on GitHub (Jun 19, 2024):

It shouldn't work. Can you confirm it's actually loading onto the GPU? Run ollama ps or check the logs to see which runner it's using. iGPUs need extra flags for llama.cpp to compile with shared-memory support, since the iGPU doesn't have dedicated VRAM; that support is a performance hit for discrete GPUs, so we haven't yet added a second ROCm runner compiled specifically for shared memory.
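
For illustration, this is the kind of build flag being referred to: as of mid-2024, llama.cpp exposed a LLAMA_HIP_UMA option that allocates device buffers with hipMallocManaged so an iGPU can draw on shared system memory. A sketch of a standalone llama.cpp build, not how Ollama packages its runners:

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
# ROCm backend plus unified-memory (UMA) allocations for iGPUs; flag names as of mid-2024
make LLAMA_HIPBLAS=1 LLAMA_HIP_UMA=1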

Author
Owner

@smellouk commented on GitHub (Jun 19, 2024):

@dhiltgen It's working fine in this setup:

  • Proxmox running on host machine
  • Docker running in Debian LXC
  • Ollama running on docker
  • Shared GPU

Ollama PS:

ollama ps
[root@48371e2102b0 /]# ollama ps
NAME                    ID              SIZE    PROCESSOR       UNTIL              
tinyllama:latest        2644915ede35    1.3 GB  100% GPU        4 minutes from now

Docker Logs:

2024/06/19 22:39:11 routes.go:1011: INFO server config env="map[OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE: OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_MODELS:/root/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR:]"
time=2024-06-19T22:39:11.820Z level=INFO source=images.go:725 msg="total blobs: 10"
time=2024-06-19T22:39:11.822Z level=INFO source=images.go:732 msg="total unused blobs removed: 0"
time=2024-06-19T22:39:11.822Z level=INFO source=routes.go:1057 msg="Listening on [::]:11434 (version 0.1.44)"
time=2024-06-19T22:39:11.823Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama3423951783/runners
time=2024-06-19T22:39:14.287Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11 rocm_v60002]"
time=2024-06-19T22:39:14.296Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-06-19T22:39:14.299Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=9.0.0
time=2024-06-19T22:39:14.299Z level=INFO source=types.go:71 msg="inference compute" id=0 library=rocm compute=gfx90c driver=0.0 name=1002:164c total="8.0 GiB" available="8.0 GiB"
[GIN] 2024/06/19 - 22:40:24 | 200 |     514.554µs |       127.0.0.1 | HEAD     "/"
[GIN] 2024/06/19 - 22:40:24 | 200 |    4.285247ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2024/06/19 - 22:40:24 | 200 |     504.915µs |       127.0.0.1 | POST     "/api/show"
time=2024-06-19T22:40:24.124Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-06-19T22:40:24.124Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=9.0.0
time=2024-06-19T22:40:24.423Z level=INFO source=memory.go:133 msg="offload to gpu" layers.requested=-1 layers.real=23 memory.available="8.0 GiB" memory.required.full="1.2 GiB" memory.required.partial="1.2 GiB" memory.required.kv="44.0 MiB" memory.weights.total="571.4 MiB" memory.weights.repeating="520.1 MiB" memory.weights.nonrepeating="51.3 MiB" memory.graph.full="148.0 MiB" memory.graph.partial="144.3 MiB"
time=2024-06-19T22:40:24.423Z level=INFO source=memory.go:133 msg="offload to gpu" layers.requested=-1 layers.real=23 memory.available="8.0 GiB" memory.required.full="1.2 GiB" memory.required.partial="1.2 GiB" memory.required.kv="44.0 MiB" memory.weights.total="571.4 MiB" memory.weights.repeating="520.1 MiB" memory.weights.nonrepeating="51.3 MiB" memory.graph.full="148.0 MiB" memory.graph.partial="144.3 MiB"
time=2024-06-19T22:40:24.424Z level=INFO source=server.go:341 msg="starting llama server" cmd="/tmp/ollama3423951783/runners/rocm_v60002/ollama_llama_server --model /root/.ollama/models/blobs/sha256-2af3b81862c6be03c769683af18efdadb2c33f60ff32ab6f83e42c043d6c7816 --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 23 --parallel 1 --port 42571"
time=2024-06-19T22:40:24.425Z level=INFO source=sched.go:338 msg="loaded runners" count=1
time=2024-06-19T22:40:24.425Z level=INFO source=server.go:529 msg="waiting for llama runner to start responding"
time=2024-06-19T22:40:24.425Z level=INFO source=server.go:567 msg="waiting for server to become available" status="llm server error"
INFO [main] build info | build=1 commit="5921b8f" tid="136884972076096" timestamp=1718836824
INFO [main] system info | n_threads=8 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="136884972076096" timestamp=1718836824 total_threads=16
INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="15" port="42571" tid="136884972076096" timestamp=1718836824
llama_model_loader: loaded meta data with 23 key-value pairs and 201 tensors from /root/.ollama/models/blobs/sha256-2af3b81862c6be03c769683af18efdadb2c33f60ff32ab6f83e42c043d6c7816 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = TinyLlama
llama_model_loader: - kv   2:                       llama.context_length u32              = 2048
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 2048
llama_model_loader: - kv   4:                          llama.block_count u32              = 22
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 5632
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 64
llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 4
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                       llama.rope.freq_base f32              = 10000.000000
llama_model_loader: - kv  11:                          general.file_type u32              = 2
llama_model_loader: - kv  12:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  13:                      tokenizer.ggml.tokens arr[str,32000]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv  14:                      tokenizer.ggml.scores arr[f32,32000]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  15:                  tokenizer.ggml.token_type arr[i32,32000]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv  16:                      tokenizer.ggml.merges arr[str,61249]   = ["▁ t", "e r", "i n", "▁ a", "e n...
llama_model_loader: - kv  17:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  18:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  19:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  20:            tokenizer.ggml.padding_token_id u32              = 2
llama_model_loader: - kv  21:                    tokenizer.chat_template str              = {% for message in messages %}\n{% if m...
llama_model_loader: - kv  22:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   45 tensors
llama_model_loader: - type q4_0:  155 tensors
llama_model_loader: - type q6_K:    1 tensors
llm_load_vocab: special tokens cache size = 259
llm_load_vocab: token to piece cache size = 0.3368 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32000
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 2048
llm_load_print_meta: n_embd           = 2048
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 4
llm_load_print_meta: n_layer          = 22
llm_load_print_meta: n_rot            = 64
llm_load_print_meta: n_embd_head_k    = 64
llm_load_print_meta: n_embd_head_v    = 64
llm_load_print_meta: n_gqa            = 8
llm_load_print_meta: n_embd_k_gqa     = 256
llm_load_print_meta: n_embd_v_gqa     = 256
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 5632
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 2048
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 1B
llm_load_print_meta: model ftype      = Q4_0
llm_load_print_meta: model params     = 1.10 B
llm_load_print_meta: model size       = 606.53 MiB (4.63 BPW) 
llm_load_print_meta: general.name     = TinyLlama
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: PAD token        = 2 '</s>'
llm_load_print_meta: LF token         = 13 '<0x0A>'
time=2024-06-19T22:40:24.927Z level=INFO source=server.go:567 msg="waiting for server to become available" status="llm server loading model"
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:   no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, compute capability 9.0, VMM: no
llm_load_tensors: ggml ctx size =    0.20 MiB
llm_load_tensors: offloading 22 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 23/23 layers to GPU
llm_load_tensors:      ROCm0 buffer size =   571.37 MiB
llm_load_tensors:        CPU buffer size =    35.16 MiB
llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: n_batch    = 512
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:      ROCm0 KV buffer size =    44.00 MiB
llama_new_context_with_model: KV self size  =   44.00 MiB, K (f16):   22.00 MiB, V (f16):   22.00 MiB
llama_new_context_with_model:  ROCm_Host  output buffer size =     0.13 MiB
llama_new_context_with_model:      ROCm0 compute buffer size =   148.00 MiB
llama_new_context_with_model:  ROCm_Host compute buffer size =     8.01 MiB
llama_new_context_with_model: graph nodes  = 710
llama_new_context_with_model: graph splits = 2
INFO [main] model loaded | tid="136884972076096" timestamp=1718836827
time=2024-06-19T22:40:27.187Z level=INFO source=server.go:572 msg="llama runner started in 2.76 seconds"
[GIN] 2024/06/19 - 22:40:27 | 200 |  3.067133877s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2024/06/19 - 22:41:12 | 200 | 12.271690234s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2024/06/19 - 22:41:25 | 200 |      21.372µs |       127.0.0.1 | HEAD     "/"
[GIN] 2024/06/19 - 22:41:25 | 200 |       72.85µs |       127.0.0.1 | GET      "/api/ps"

This is the proof that it works fine:

before loading tinyllama
![image](https://github.com/ollama/ollama/assets/13059906/2455076d-216b-4b80-b903-953650f3a431)

after loading tinyllama
![image](https://github.com/ollama/ollama/assets/13059906/c5c05d8d-c8b5-4fef-98d3-91c07c67794f)

after testing the model
![image](https://github.com/ollama/ollama/assets/13059906/4f7ebc93-05b3-4ad1-ad69-d022d1d5172b)

> iGPU doesn't have dedicated VRAM

Regarding this statement, I have the option to dedicate VRAM in my BIOS.
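For context, the container side of this setup boils down to something like the sketch below (the volume and container name are assumptions; the devices and the gfx override follow the standard Ollama ROCm container instructions and the logs above):

```bash
# ROCm build of Ollama with the iGPU passed through.
# /dev/kfd and /dev/dri are the ROCm compute/render devices; gfx90c is not an
# officially supported target, hence the HSA_OVERRIDE_GFX_VERSION spoof.
docker run -d \
  --device /dev/kfd --device /dev/dri \
  -e HSA_OVERRIDE_GFX_VERSION=9.0.0 \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama:rocm

# Confirm the model actually lands on the GPU:
docker exec -it ollama ollama ps   # PROCESSOR should read "100% GPU"
```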


@dhiltgen commented on GitHub (Jun 19, 2024):

Interesting. We have code that explicitly detects and blocks iGPUs, but it sounds like there are some limited scenarios where this can work without requiring a new runner build with shared-memory support. I'm not sure why the dockerized version differs from the host. What I'd suggest: set OLLAMA_DEBUG=1 on both, start the server without trying to load any models, and then we can compare the output of the GPU discovery code.
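For anyone reproducing this, a minimal sketch of that debug setup (assuming a standard systemd install on the host and the same container flags used elsewhere in this thread; the container name is a placeholder):

```bash
# Host (systemd install): pass OLLAMA_DEBUG=1 via a drop-in override,
# then restart and watch GPU discovery at startup -- no model load needed.
sudo systemctl edit ollama.service
#   [Service]
#   Environment="OLLAMA_DEBUG=1"
sudo systemctl daemon-reload
sudo systemctl restart ollama
journalctl -u ollama -f

# Container: recreate it with the extra variable (add -e OLLAMA_DEBUG=1 to
# the usual `docker run` flags), then tail the logs.
docker logs -f ollama
```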


@smellouk commented on GitHub (Jun 20, 2024):

@dhiltgen these are the current logs, but I can't spot any difference; any help from you is appreciated 🙏

ON LXC

Jun 20 00:11:55 ai-llm ollama[377]: 2024/06/20 00:11:55 routes.go:1008: INFO server config env="map[OLLAMA_DEBUG:true OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:*] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR:]"
Jun 20 00:11:55 ai-llm ollama[377]: time=2024-06-20T00:11:55.874Z level=INFO source=images.go:704 msg="total blobs: 5"
Jun 20 00:11:55 ai-llm ollama[377]: time=2024-06-20T00:11:55.874Z level=INFO source=images.go:711 msg="total unused blobs removed: 0"
Jun 20 00:11:55 ai-llm ollama[377]: [GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.
Jun 20 00:11:55 ai-llm ollama[377]: [GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
Jun 20 00:11:55 ai-llm ollama[377]:  - using env:        export GIN_MODE=release
Jun 20 00:11:55 ai-llm ollama[377]:  - using code:        gin.SetMode(gin.ReleaseMode)
Jun 20 00:11:55 ai-llm ollama[377]: [GIN-debug] POST   /api/pull                 --> github.com/ollama/ollama/server.(*Server).PullModelHandler-fm (5 handlers)
Jun 20 00:11:55 ai-llm ollama[377]: [GIN-debug] POST   /api/generate             --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (5 handlers)
Jun 20 00:11:55 ai-llm ollama[377]: [GIN-debug] POST   /api/chat                 --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (5 handlers)
Jun 20 00:11:55 ai-llm ollama[377]: [GIN-debug] POST   /api/embeddings           --> github.com/ollama/ollama/server.(*Server).EmbeddingsHandler-fm (5 handlers)
Jun 20 00:11:55 ai-llm ollama[377]: [GIN-debug] POST   /api/create               --> github.com/ollama/ollama/server.(*Server).CreateModelHandler-fm (5 handlers)
Jun 20 00:11:55 ai-llm ollama[377]: [GIN-debug] POST   /api/push                 --> github.com/ollama/ollama/server.(*Server).PushModelHandler-fm (5 handlers)
Jun 20 00:11:55 ai-llm ollama[377]: [GIN-debug] POST   /api/copy                 --> github.com/ollama/ollama/server.(*Server).CopyModelHandler-fm (5 handlers)
Jun 20 00:11:55 ai-llm ollama[377]: [GIN-debug] DELETE /api/delete               --> github.com/ollama/ollama/server.(*Server).DeleteModelHandler-fm (5 handlers)
Jun 20 00:11:55 ai-llm ollama[377]: [GIN-debug] POST   /api/show                 --> github.com/ollama/ollama/server.(*Server).ShowModelHandler-fm (5 handlers)
Jun 20 00:11:55 ai-llm ollama[377]: [GIN-debug] POST   /api/blobs/:digest        --> github.com/ollama/ollama/server.(*Server).CreateBlobHandler-fm (5 handlers)
Jun 20 00:11:55 ai-llm ollama[377]: [GIN-debug] HEAD   /api/blobs/:digest        --> github.com/ollama/ollama/server.(*Server).HeadBlobHandler-fm (5 handlers)
Jun 20 00:11:55 ai-llm ollama[377]: [GIN-debug] GET    /api/ps                   --> github.com/ollama/ollama/server.(*Server).ProcessHandler-fm (5 handlers)
Jun 20 00:11:55 ai-llm ollama[377]: [GIN-debug] POST   /v1/chat/completions      --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (6 handlers)
Jun 20 00:11:55 ai-llm ollama[377]: [GIN-debug] GET    /                         --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
Jun 20 00:11:55 ai-llm ollama[377]: [GIN-debug] GET    /api/tags                 --> github.com/ollama/ollama/server.(*Server).ListModelsHandler-fm (5 handlers)
Jun 20 00:11:55 ai-llm ollama[377]: [GIN-debug] GET    /api/version              --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
Jun 20 00:11:55 ai-llm ollama[377]: [GIN-debug] HEAD   /                         --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
Jun 20 00:11:55 ai-llm ollama[377]: [GIN-debug] HEAD   /api/tags                 --> github.com/ollama/ollama/server.(*Server).ListModelsHandler-fm (5 handlers)
Jun 20 00:11:55 ai-llm ollama[377]: [GIN-debug] HEAD   /api/version              --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
Jun 20 00:11:55 ai-llm ollama[377]: time=2024-06-20T00:11:55.875Z level=INFO source=routes.go:1054 msg="Listening on 127.0.0.1:11434 (version 0.1.44)"
Jun 20 00:11:55 ai-llm ollama[377]: time=2024-06-20T00:11:55.875Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama950310909/runners
Jun 20 00:11:55 ai-llm ollama[377]: time=2024-06-20T00:11:55.875Z level=DEBUG source=payload.go:180 msg=extracting variant=cpu file=build/linux/x86_64/cpu/bin/ollama_llama_server.gz
Jun 20 00:11:55 ai-llm ollama[377]: time=2024-06-20T00:11:55.875Z level=DEBUG source=payload.go:180 msg=extracting variant=cpu_avx file=build/linux/x86_64/cpu_avx/bin/ollama_llama_server.gz
Jun 20 00:11:55 ai-llm ollama[377]: time=2024-06-20T00:11:55.875Z level=DEBUG source=payload.go:180 msg=extracting variant=cpu_avx2 file=build/linux/x86_64/cpu_avx2/bin/ollama_llama_server.gz
Jun 20 00:11:55 ai-llm ollama[377]: time=2024-06-20T00:11:55.875Z level=DEBUG source=payload.go:180 msg=extracting variant=rocm_v60101 file=build/linux/x86_64/rocm_v60101/bin/deps.txt.gz
Jun 20 00:11:55 ai-llm ollama[377]: time=2024-06-20T00:11:55.875Z level=DEBUG source=payload.go:180 msg=extracting variant=rocm_v60101 file=build/linux/x86_64/rocm_v60101/bin/ollama_llama_server.gz
Jun 20 00:11:55 ai-llm ollama[377]: time=2024-06-20T00:11:55.914Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama950310909/runners/cpu
Jun 20 00:11:55 ai-llm ollama[377]: time=2024-06-20T00:11:55.914Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama950310909/runners/cpu_avx
Jun 20 00:11:55 ai-llm ollama[377]: time=2024-06-20T00:11:55.914Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama950310909/runners/cpu_avx2
Jun 20 00:11:55 ai-llm ollama[377]: time=2024-06-20T00:11:55.914Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama950310909/runners/rocm_v60101
Jun 20 00:11:55 ai-llm ollama[377]: time=2024-06-20T00:11:55.914Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu_avx2 rocm_v60101 cpu cpu_avx]"
Jun 20 00:11:55 ai-llm ollama[377]: time=2024-06-20T00:11:55.914Z level=DEBUG source=payload.go:45 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
Jun 20 00:11:55 ai-llm ollama[377]: time=2024-06-20T00:11:55.914Z level=DEBUG source=sched.go:90 msg="starting llm scheduler"
Jun 20 00:11:55 ai-llm ollama[377]: time=2024-06-20T00:11:55.914Z level=DEBUG source=gpu.go:122 msg="Detecting GPUs"
Jun 20 00:11:55 ai-llm ollama[377]: time=2024-06-20T00:11:55.914Z level=DEBUG source=gpu.go:261 msg="Searching for GPU library" name=libcuda.so*
Jun 20 00:11:55 ai-llm ollama[377]: time=2024-06-20T00:11:55.914Z level=DEBUG source=gpu.go:280 msg="gpu library search" globs="[/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
Jun 20 00:11:55 ai-llm ollama[377]: time=2024-06-20T00:11:55.916Z level=DEBUG source=gpu.go:313 msg="discovered GPU libraries" paths=[]
Jun 20 00:11:55 ai-llm ollama[377]: time=2024-06-20T00:11:55.916Z level=DEBUG source=gpu.go:261 msg="Searching for GPU library" name=libcudart.so*
Jun 20 00:11:55 ai-llm ollama[377]: time=2024-06-20T00:11:55.916Z level=DEBUG source=gpu.go:280 msg="gpu library search" globs="[/libcudart.so** /tmp/ollama950310909/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
Jun 20 00:11:55 ai-llm ollama[377]: time=2024-06-20T00:11:55.917Z level=DEBUG source=gpu.go:313 msg="discovered GPU libraries" paths=[]
Jun 20 00:11:55 ai-llm ollama[377]: time=2024-06-20T00:11:55.917Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
Jun 20 00:11:55 ai-llm ollama[377]: time=2024-06-20T00:11:55.917Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
Jun 20 00:11:55 ai-llm ollama[377]: time=2024-06-20T00:11:55.917Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
Jun 20 00:11:55 ai-llm ollama[377]: time=2024-06-20T00:11:55.917Z level=DEBUG source=amd_linux.go:102 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
Jun 20 00:11:55 ai-llm ollama[377]: time=2024-06-20T00:11:55.917Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
Jun 20 00:11:55 ai-llm ollama[377]: time=2024-06-20T00:11:55.917Z level=DEBUG source=amd_linux.go:243 msg="amdgpu memory" gpu=0 total="8.0 GiB"
Jun 20 00:11:55 ai-llm ollama[377]: time=2024-06-20T00:11:55.917Z level=DEBUG source=amd_linux.go:244 msg="amdgpu memory" gpu=0 available="8.0 GiB"
Jun 20 00:11:55 ai-llm ollama[377]: time=2024-06-20T00:11:55.917Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib"
Jun 20 00:11:55 ai-llm ollama[377]: time=2024-06-20T00:11:55.917Z level=INFO source=amd_linux.go:305 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=9.0.0
Jun 20 00:11:55 ai-llm ollama[377]: time=2024-06-20T00:11:55.917Z level=INFO source=types.go:71 msg="inference compute" id=0 library=rocm compute=gfx90c driver=0.0 name=1002:164c total="8.0 GiB" available="8.0 GiB"

on Docker

2024/06/20 00:22:55 routes.go:1011: INFO server config env="map[OLLAMA_DEBUG:true OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE: OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_MODELS:/root/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR:]"
time=2024-06-20T00:22:55.761Z level=INFO source=images.go:725 msg="total blobs: 10"
time=2024-06-20T00:22:55.762Z level=INFO source=images.go:732 msg="total unused blobs removed: 0"
time=2024-06-20T00:22:55.763Z level=INFO source=routes.go:1057 msg="Listening on [::]:11434 (version 0.1.44)"
time=2024-06-20T00:22:55.765Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama1629984131/runners
time=2024-06-20T00:22:55.765Z level=DEBUG source=payload.go:180 msg=extracting variant=cpu file=build/linux/x86_64/cpu/bin/ollama_llama_server.gz
time=2024-06-20T00:22:55.765Z level=DEBUG source=payload.go:180 msg=extracting variant=cpu_avx file=build/linux/x86_64/cpu_avx/bin/ollama_llama_server.gz
time=2024-06-20T00:22:55.765Z level=DEBUG source=payload.go:180 msg=extracting variant=cpu_avx2 file=build/linux/x86_64/cpu_avx2/bin/ollama_llama_server.gz
time=2024-06-20T00:22:55.765Z level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublas.so.11.gz
time=2024-06-20T00:22:55.765Z level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublasLt.so.11.gz
time=2024-06-20T00:22:55.765Z level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcudart.so.11.0.gz
time=2024-06-20T00:22:55.765Z level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/ollama_llama_server.gz
time=2024-06-20T00:22:55.765Z level=DEBUG source=payload.go:180 msg=extracting variant=rocm_v60002 file=build/linux/x86_64/rocm_v60002/bin/deps.txt.gz
time=2024-06-20T00:22:55.765Z level=DEBUG source=payload.go:180 msg=extracting variant=rocm_v60002 file=build/linux/x86_64/rocm_v60002/bin/ollama_llama_server.gz
time=2024-06-20T00:22:58.309Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1629984131/runners/cpu
time=2024-06-20T00:22:58.309Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1629984131/runners/cpu_avx
time=2024-06-20T00:22:58.309Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1629984131/runners/cpu_avx2
time=2024-06-20T00:22:58.309Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1629984131/runners/cuda_v11
time=2024-06-20T00:22:58.309Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1629984131/runners/rocm_v60002
time=2024-06-20T00:22:58.309Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11 rocm_v60002]"
time=2024-06-20T00:22:58.309Z level=DEBUG source=payload.go:45 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
time=2024-06-20T00:22:58.309Z level=DEBUG source=sched.go:90 msg="starting llm scheduler"
time=2024-06-20T00:22:58.309Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs"
time=2024-06-20T00:22:58.309Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so*
time=2024-06-20T00:22:58.309Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcuda.so** /usr/local/lib/libcuda.so** /opt/rh/devtoolset-7/root/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2024-06-20T00:22:58.314Z level=DEBUG source=gpu.go:327 msg="discovered GPU libraries" paths=[]
time=2024-06-20T00:22:58.314Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcudart.so*
time=2024-06-20T00:22:58.314Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcudart.so** /usr/local/lib/libcudart.so** /opt/rh/devtoolset-7/root/libcudart.so** /tmp/ollama1629984131/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
time=2024-06-20T00:22:58.315Z level=DEBUG source=gpu.go:327 msg="discovered GPU libraries" paths=[/tmp/ollama1629984131/runners/cuda_v11/libcudart.so.11.0]
cudaSetDevice err: 35
time=2024-06-20T00:22:58.316Z level=DEBUG source=gpu.go:339 msg="Unable to load cudart" library=/tmp/ollama1629984131/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing.  If you have a CUDA GPU please upgrade to run ollama"
time=2024-06-20T00:22:58.316Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-06-20T00:22:58.316Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-06-20T00:22:58.316Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-20T00:22:58.316Z level=DEBUG source=amd_linux.go:102 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-20T00:22:58.316Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
time=2024-06-20T00:22:58.316Z level=DEBUG source=amd_linux.go:242 msg="amdgpu memory" gpu=0 total="8.0 GiB"
time=2024-06-20T00:22:58.316Z level=DEBUG source=amd_linux.go:243 msg="amdgpu memory" gpu=0 available="8.0 GiB"
time=2024-06-20T00:22:58.316Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib"
time=2024-06-20T00:22:58.317Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=9.0.0
time=2024-06-20T00:22:58.317Z level=INFO source=types.go:71 msg="inference compute" id=0 library=rocm compute=gfx90c driver=0.0 name=1002:164c total="8.0 GiB" available="8.0 GiB"

@smellouk commented on GitHub (Jun 20, 2024):

okay chatgpt did a great job 😆
Here are the differences between the two logs:

Configuration Differences

  • LXC Log:
    • OLLAMA_ORIGINS: [http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:*]
  • Docker Log:
    • OLLAMA_ORIGINS: [http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*]
    • OLLAMA_FLASH_ATTENTION: false
    • OLLAMA_HOST: http://0.0.0.0:11434
    • OLLAMA_KEEP_ALIVE:
    • OLLAMA_MODELS: /root/.ollama/models
    • OLLAMA_NOHISTORY: false

Total Blobs

  • LXC Log:
    • total blobs: 5
  • Docker Log:
    • total blobs: 10

Listening Address and Version

  • LXC Log:
    • Listening on: 127.0.0.1:11434
    • Version: 0.1.44
  • Docker Log:
    • Listening on: [::]:11434
    • Version: 0.1.44

Extracting Variants

  • LXC Log:
    • Variants being extracted: cpu, cpu_avx, cpu_avx2, rocm_v60101
  • Docker Log:
    • Variants being extracted: cpu, cpu_avx, cpu_avx2, cuda_v11, rocm_v60002

GPU Libraries Search

  • LXC Log:
    • No GPU libraries discovered: paths=[]
  • Docker Log:
    • No equivalent log entries for GPU libraries search

Additional Variants in Docker

  • Docker Log includes additional CUDA variants being extracted:
    • variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublas.so.11.gz
    • variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublasLt.so.11.gz
    • variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcudart.so.11.0.gz
    • variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/ollama_llama_server.gz

Time Differences

  • LXC Log:
    • Log entries around 00:11:55.875Z to 00:11:55.917Z
  • Docker Log:
    • Log entries around 00:22:55.761Z to 00:22:58.309Z

Functionality Warnings

  • LXC Log:
    • Warning about amdgpu version file missing: ollama recommends running the https://www.amd.com/en/support/linux-drivers
  • Docker Log:
    • No equivalent warning messages

File Paths for Runners

  • LXC Log:
    • dir=/tmp/ollama950310909/runners
  • Docker Log:
    • dir=/tmp/ollama1629984131/runners

These differences indicate variations in the configuration, environment variables, and behavior between the two logs from the LXC and Docker environments.
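Rather than eyeballing (or ChatGPT-ing) the two logs, normalizing away the journald prefix, timestamps, and randomized runner directories makes a plain `diff` workable; a rough sketch, with lxc.log and docker.log as placeholder file names:

```bash
# Strip the volatile parts of each log so only the message content remains:
#  - the "Jun 20 00:11:55 ai-llm ollama[377]: " journald prefix (LXC log only)
#  - the time=... structured-log field
#  - the randomized /tmp/ollamaNNN runner directories
norm() {
  sed -E -e 's/^[A-Z][a-z]{2} [0-9 :]+ [^ ]+ ollama\[[0-9]+\]: //' \
         -e 's/time=[^ ]+ //' \
         -e 's#/tmp/ollama[0-9]+#/tmp/ollamaXXX#g' "$1"
}

diff <(norm lxc.log) <(norm docker.log)
```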


@smellouk commented on GitHub (Jun 20, 2024):

The logs after trying to load tinyllama on LXC:

Jun 20 01:00:25 ai-llm ollama[15472]: [GIN] 2024/06/20 - 01:00:25 | 200 |      20.256µs |       127.0.0.1 | HEAD     "/"
Jun 20 01:00:25 ai-llm ollama[15472]: [GIN] 2024/06/20 - 01:00:25 | 200 |     313.398µs |       127.0.0.1 | POST     "/api/show"
Jun 20 01:00:25 ai-llm ollama[15472]: [GIN] 2024/06/20 - 01:00:25 | 200 |     466.989µs |       127.0.0.1 | POST     "/api/show"
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.291Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs"
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.291Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so*
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.291Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.293Z level=DEBUG source=gpu.go:327 msg="discovered GPU libraries" paths=[]
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.293Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcudart.so*
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.293Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/libcudart.so** /tmp/ollama4143895818/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.293Z level=DEBUG source=gpu.go:327 msg="discovered GPU libraries" paths=[/tmp/ollama4143895818/runners/cuda_v11/libcudart.so.11.0]
Jun 20 01:00:25 ai-llm ollama[15472]: cudaSetDevice err: 35
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.293Z level=DEBUG source=gpu.go:339 msg="Unable to load cudart" library=/tmp/ollama4143895818/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing.  If you have a CUDA GPU please upgrade to run ollama"
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.293Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.294Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.294Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.294Z level=DEBUG source=amd_linux.go:102 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.294Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.294Z level=DEBUG source=amd_linux.go:242 msg="amdgpu memory" gpu=0 total="8.0 GiB"
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.294Z level=DEBUG source=amd_linux.go:243 msg="amdgpu memory" gpu=0 available="8.0 GiB"
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.294Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib"
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.294Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=9.0.0
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.294Z level=DEBUG source=gguf.go:57 msg="model = &llm.gguf{containerGGUF:(*llm.containerGGUF)(0xc000914d40), kv:llm.KV{}, tensors:[]*llm.Tensor(nil), parameters:0x0}"
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.606Z level=DEBUG source=sched.go:153 msg="loading first model" model=/usr/share/ollama/.ollama/models/blobs/sha256-2af3b81862c6be03c769683af18efdadb2c33f60ff32ab6f83e42c043d6c7816
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.606Z level=DEBUG source=memory.go:44 msg=evaluating library=rocm gpu_count=1 available="8.0 GiB"
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.606Z level=INFO source=memory.go:133 msg="offload to gpu" layers.requested=-1 layers.real=23 memory.available="8.0 GiB" memory.required.full="1.2 GiB" memory.required.partial="1.2 GiB" memory.required.kv="44.0 MiB" memory.weights.total="571.4 MiB" memory.weights.repeating="520.1 MiB" memory.weights.nonrepeating="51.3 MiB" memory.graph.full="148.0 MiB" memory.graph.partial="144.3 MiB"
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.606Z level=DEBUG source=sched.go:563 msg="new model will fit in available VRAM in single GPU, loading" model=/usr/share/ollama/.ollama/models/blobs/sha256-2af3b81862c6be03c769683af18efdadb2c33f60ff32ab6f83e42c043d6c7816 gpu=0 available=8589934592 required="1.2 GiB"
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.606Z level=DEBUG source=memory.go:44 msg=evaluating library=rocm gpu_count=1 available="8.0 GiB"
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.607Z level=INFO source=memory.go:133 msg="offload to gpu" layers.requested=-1 layers.real=23 memory.available="8.0 GiB" memory.required.full="1.2 GiB" memory.required.partial="1.2 GiB" memory.required.kv="44.0 MiB" memory.weights.total="571.4 MiB" memory.weights.repeating="520.1 MiB" memory.weights.nonrepeating="51.3 MiB" memory.graph.full="148.0 MiB" memory.graph.partial="144.3 MiB"
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.607Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama4143895818/runners/cpu
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.607Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama4143895818/runners/cpu_avx
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.607Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama4143895818/runners/cpu_avx2
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.607Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama4143895818/runners/cuda_v11
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.607Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama4143895818/runners/rocm_v60002
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.607Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama4143895818/runners/cpu
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.607Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama4143895818/runners/cpu_avx
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.607Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama4143895818/runners/cpu_avx2
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.607Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama4143895818/runners/cuda_v11
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.607Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama4143895818/runners/rocm_v60002
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.607Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.607Z level=INFO source=server.go:341 msg="starting llama server" cmd="/tmp/ollama4143895818/runners/rocm_v60002/ollama_llama_server --model /usr/share/ollama/.ollama/models/blobs/sha256-2af3b81862c6be03c769683af18efdadb2c33f60ff32ab6f83e42c043d6c7816 --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 23 --verbose --parallel 1 --port 40601"
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.607Z level=DEBUG source=server.go:356 msg=subprocess environment="[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin HSA_OVERRIDE_GFX_VERSION=9.0.0 HSA_ENABLE_SDMA=0 LD_LIBRARY_PATH=/opt/rocm/lib:/tmp/ollama4143895818/runners/rocm_v60002 HIP_VISIBLE_DEVICES=0]"
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.607Z level=INFO source=sched.go:338 msg="loaded runners" count=1
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.607Z level=INFO source=server.go:529 msg="waiting for llama runner to start responding"
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.607Z level=INFO source=server.go:567 msg="waiting for server to become available" status="llm server error"
Jun 20 01:00:25 ai-llm ollama[15531]: INFO [main] build info | build=1 commit="5921b8f" tid="130082635385920" timestamp=1718845225
Jun 20 01:00:25 ai-llm ollama[15531]: INFO [main] system info | n_threads=8 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="130082635385920" timestamp=1718845225 total_threads=8
Jun 20 01:00:25 ai-llm ollama[15531]: INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="7" port="40601" tid="130082635385920" timestamp=1718845225
Jun 20 01:00:25 ai-llm ollama[15472]: llama_model_loader: loaded meta data with 23 key-value pairs and 201 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-2af3b81862c6be03c769683af18efdadb2c33f60ff32ab6f83e42c043d6c7816 (version GGUF V3 (latest))
Jun 20 01:00:25 ai-llm ollama[15472]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Jun 20 01:00:25 ai-llm ollama[15472]: llama_model_loader: - kv   0:                       general.architecture str              = llama
Jun 20 01:00:25 ai-llm ollama[15472]: llama_model_loader: - kv   1:                               general.name str              = TinyLlama
Jun 20 01:00:25 ai-llm ollama[15472]: llama_model_loader: - kv   2:                       llama.context_length u32              = 2048
Jun 20 01:00:25 ai-llm ollama[15472]: llama_model_loader: - kv   3:                     llama.embedding_length u32              = 2048
Jun 20 01:00:25 ai-llm ollama[15472]: llama_model_loader: - kv   4:                          llama.block_count u32              = 22
Jun 20 01:00:25 ai-llm ollama[15472]: llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 5632
Jun 20 01:00:25 ai-llm ollama[15472]: llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 64
Jun 20 01:00:25 ai-llm ollama[15472]: llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32
Jun 20 01:00:25 ai-llm ollama[15472]: llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 4
Jun 20 01:00:25 ai-llm ollama[15472]: llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
Jun 20 01:00:25 ai-llm ollama[15472]: llama_model_loader: - kv  10:                       llama.rope.freq_base f32              = 10000.000000
Jun 20 01:00:25 ai-llm ollama[15472]: llama_model_loader: - kv  11:                          general.file_type u32              = 2
Jun 20 01:00:25 ai-llm ollama[15472]: llama_model_loader: - kv  12:                       tokenizer.ggml.model str              = llama
Jun 20 01:00:25 ai-llm ollama[15472]: llama_model_loader: - kv  13:                      tokenizer.ggml.tokens arr[str,32000]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
Jun 20 01:00:25 ai-llm ollama[15472]: llama_model_loader: - kv  14:                      tokenizer.ggml.scores arr[f32,32000]   = [0.000000, 0.000000, 0.000000, 0.0000...
Jun 20 01:00:25 ai-llm ollama[15472]: llama_model_loader: - kv  15:                  tokenizer.ggml.token_type arr[i32,32000]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
Jun 20 01:00:25 ai-llm ollama[15472]: llama_model_loader: - kv  16:                      tokenizer.ggml.merges arr[str,61249]   = ["▁ t", "e r", "i n", "▁ a", "e n...
Jun 20 01:00:25 ai-llm ollama[15472]: llama_model_loader: - kv  17:                tokenizer.ggml.bos_token_id u32              = 1
Jun 20 01:00:25 ai-llm ollama[15472]: llama_model_loader: - kv  18:                tokenizer.ggml.eos_token_id u32              = 2
Jun 20 01:00:25 ai-llm ollama[15472]: llama_model_loader: - kv  19:            tokenizer.ggml.unknown_token_id u32              = 0
Jun 20 01:00:25 ai-llm ollama[15472]: llama_model_loader: - kv  20:            tokenizer.ggml.padding_token_id u32              = 2
Jun 20 01:00:25 ai-llm ollama[15472]: llama_model_loader: - kv  21:                    tokenizer.chat_template str              = {% for message in messages %}\n{% if m...
Jun 20 01:00:25 ai-llm ollama[15472]: llama_model_loader: - kv  22:               general.quantization_version u32              = 2
Jun 20 01:00:25 ai-llm ollama[15472]: llama_model_loader: - type  f32:   45 tensors
Jun 20 01:00:25 ai-llm ollama[15472]: llama_model_loader: - type q4_0:  155 tensors
Jun 20 01:00:25 ai-llm ollama[15472]: llama_model_loader: - type q6_K:    1 tensors
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_vocab: special tokens cache size = 259
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_vocab: token to piece cache size = 0.3368 MB
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: format           = GGUF V3 (latest)
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: arch             = llama
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: vocab type       = SPM
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: n_vocab          = 32000
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: n_merges         = 0
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: n_ctx_train      = 2048
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: n_embd           = 2048
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: n_head           = 32
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: n_head_kv        = 4
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: n_layer          = 22
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: n_rot            = 64
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: n_embd_head_k    = 64
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: n_embd_head_v    = 64
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: n_gqa            = 8
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: n_embd_k_gqa     = 256
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: n_embd_v_gqa     = 256
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: f_norm_eps       = 0.0e+00
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: f_clamp_kqv      = 0.0e+00
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: f_logit_scale    = 0.0e+00
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: n_ff             = 5632
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: n_expert         = 0
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: n_expert_used    = 0
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: causal attn      = 1
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: pooling type     = 0
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: rope type        = 0
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: rope scaling     = linear
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: freq_base_train  = 10000.0
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: freq_scale_train = 1
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: n_yarn_orig_ctx  = 2048
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: rope_finetuned   = unknown
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: ssm_d_conv       = 0
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: ssm_d_inner      = 0
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: ssm_d_state      = 0
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: ssm_dt_rank      = 0
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: model type       = 1B
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: model ftype      = Q4_0
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: model params     = 1.10 B
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: model size       = 606.53 MiB (4.63 BPW)
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: general.name     = TinyLlama
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: BOS token        = 1 '<s>'
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: EOS token        = 2 '</s>'
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: UNK token        = 0 '<unk>'
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: PAD token        = 2 '</s>'
Jun 20 01:00:25 ai-llm ollama[15472]: llm_load_print_meta: LF token         = 13 '<0x0A>'
Jun 20 01:00:25 ai-llm ollama[15472]: rocBLAS error: Could not initialize Tensile host: No devices found
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.858Z level=ERROR source=sched.go:344 msg="error loading llama server" error="llama runner process has terminated: signal: aborted error:Could not initialize Tensile host: No devices found"
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.858Z level=DEBUG source=sched.go:347 msg="triggering expiration for failed load" model=/usr/share/ollama/.ollama/models/blobs/sha256-2af3b81862c6be03c769683af18efdadb2c33f60ff32ab6f83e42c043d6c7816
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.858Z level=DEBUG source=sched.go:258 msg="runner expired event received" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-2af3b81862c6be03c769683af18efdadb2c33f60ff32ab6f83e42c043d6c7816
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.858Z level=DEBUG source=sched.go:274 msg="got lock to unload" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-2af3b81862c6be03c769683af18efdadb2c33f60ff32ab6f83e42c043d6c7816
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.858Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs"
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.858Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so*
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.858Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
Jun 20 01:00:25 ai-llm ollama[15472]: [GIN] 2024/06/20 - 01:00:25 | 500 |    567.3885ms |       127.0.0.1 | POST     "/api/chat"
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.861Z level=DEBUG source=gpu.go:327 msg="discovered GPU libraries" paths=[]
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.861Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcudart.so*
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.861Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/libcudart.so** /tmp/ollama4143895818/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.861Z level=DEBUG source=gpu.go:327 msg="discovered GPU libraries" paths=[/tmp/ollama4143895818/runners/cuda_v11/libcudart.so.11.0]
Jun 20 01:00:25 ai-llm ollama[15472]: cudaSetDevice err: 35
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.862Z level=DEBUG source=gpu.go:339 msg="Unable to load cudart" library=/tmp/ollama4143895818/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing.  If you have a CUDA GPU please upgrade to run ollama"
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.862Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.862Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.862Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.862Z level=DEBUG source=amd_linux.go:102 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.862Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.862Z level=DEBUG source=amd_linux.go:242 msg="amdgpu memory" gpu=0 total="8.0 GiB"
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.862Z level=DEBUG source=amd_linux.go:243 msg="amdgpu memory" gpu=0 available="8.0 GiB"
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.862Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib"
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.862Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=9.0.0
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.862Z level=DEBUG source=server.go:990 msg="stopping llama server"
Jun 20 01:00:25 ai-llm ollama[15472]: time=2024-06-20T01:00:25.862Z level=DEBUG source=sched.go:279 msg="runner released" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-2af3b81862c6be03c769683af18efdadb2c33f60ff32ab6f83e42c043d6c7816
Jun 20 01:00:26 ai-llm ollama[15472]: time=2024-06-20T01:00:26.113Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs"
Jun 20 01:00:26 ai-llm ollama[15472]: time=2024-06-20T01:00:26.113Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so*
Jun 20 01:00:26 ai-llm ollama[15472]: time=2024-06-20T01:00:26.113Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
Jun 20 01:00:26 ai-llm ollama[15472]: time=2024-06-20T01:00:26.114Z level=DEBUG source=gpu.go:327 msg="discovered GPU libraries" paths=[]
Jun 20 01:00:26 ai-llm ollama[15472]: time=2024-06-20T01:00:26.114Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcudart.so*
Jun 20 01:00:26 ai-llm ollama[15472]: time=2024-06-20T01:00:26.114Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/libcudart.so** /tmp/ollama4143895818/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
Jun 20 01:00:26 ai-llm ollama[15472]: time=2024-06-20T01:00:26.115Z level=DEBUG source=gpu.go:327 msg="discovered GPU libraries" paths=[/tmp/ollama4143895818/runners/cuda_v11/libcudart.so.11.0]
```
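A quick sanity check for anyone comparing against this log: the rocBLAS abort usually means the runtime enumerated zero GPU agents, so it is worth confirming the iGPU is visible to ROCm from inside the container at all. A minimal sketch, assuming the ROCm userspace tools (`rocminfo`) are installed in the LXC:

```
# Device nodes the ROCm runtime needs; both must exist inside the container
ls -l /dev/kfd /dev/dri

# List the agents the runtime can enumerate; a working passthrough shows the
# iGPU (gfx90c here) in addition to the CPU. rocBLAS aborts with
# "No devices found" when no GPU agent appears in this list.
rocminfo | grep -E 'Agent|Name'
```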

@smellouk commented on GitHub (Jun 20, 2024):

So it seems it's trying to do something related to CUDA 🤔

```
library was not found (discovered GPU libraries paths=[])
cudaSetDevice err: 35
error="your nvidia driver is too old or missing. If you have a CUDA GPU please upgrade to run ollama"
```

@dhiltgen commented on GitHub (Jun 20, 2024):

Do you have ROCm installed inside the LXC host? If so, I think what may be going on is that the host version of ROCm is behaving differently from the ROCm bundled in the container image.
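One way to test that theory with the log above: the runner is launched with `LD_LIBRARY_PATH=/opt/rocm/lib:/tmp/ollama4143895818/runners/rocm_v60002`, so a host ROCm install shadows the libraries Ollama bundles. A sketch for checking which rocBLAS actually resolves (the tmp directory name changes every run; substitute the one from your own log):

```
# Path taken from the "starting llama server" line; it changes each run
RUNNERS=/tmp/ollama4143895818/runners/rocm_v60002

# Shows whether librocblas resolves from /opt/rocm/lib (host) or $RUNNERS (bundled)
LD_LIBRARY_PATH=/opt/rocm/lib:$RUNNERS ldd "$RUNNERS/ollama_llama_server" | grep -i rocblas
```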

@smellouk commented on GitHub (Jun 21, 2024):

@dhiltgen yes, I have ROCm installed on the LXC machine. It's weird that it behaves differently; is it because the LXC is based on Ubuntu and the Docker image is based on CentOS? Also, any suggestion on how I should proceed next?

@dhiltgen commented on GitHub (Jun 21, 2024):

If you aren't using ROCm for anything else on the host, a potential workaround is to uninstall it, but we shouldn't stumble on a ROCm install and fail like this. Can you share what version of ROCm you installed?
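For reference, two common ways to read the installed version on an Ubuntu-based LXC (assuming an apt-managed install under `/opt/rocm`):

```
# Package view
dpkg -l | grep -i rocm

# Most ROCm installs also record the full version string here
cat /opt/rocm/.info/version
```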

@smellouk commented on GitHub (Jun 21, 2024):

I can try not installing ROCm and will share my findings.
Regarding the ROCm version, on the LXC I'm using 6.1.1.

@smellouk commented on GitHub (Jun 21, 2024):

> If you aren't using ROCm for anything else on the host, a potential workaround is to uninstall it, but we shouldn't stumble on a ROCm install and fail like this.

Unfortunately, even without ROCm, it still fails with the same error after setting `HSA_OVERRIDE_GFX_VERSION`.
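For completeness, the usual way to make such an override stick for the systemd-managed service (a sketch; the drop-in path is standard systemd convention rather than anything confirmed in this thread):

```
sudo mkdir -p /etc/systemd/system/ollama.service.d
cat <<'EOF' | sudo tee /etc/systemd/system/ollama.service.d/override.conf
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=9.0.0"
EOF
sudo systemctl daemon-reload && sudo systemctl restart ollama
```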

@AlexHe99 commented on GitHub (Jul 4, 2024):

@smellouk

> I can try not installing ROCm and will share my findings. Regarding the ROCm version, on the LXC I'm using 6.1.1.

The Docker image ollama/rocm bundles rocm_v60002, but the host/LXC has a newer version: 6.1.1.60101-90~22.04.

  1. Would you mind trying ROCm 6.0 on the host/LXC?

  2. Another option is to set the env var `OLLAMA_LLM_LIBRARY=rocm_v60xxx` (matching the real ROCm version in your environment, as shown in the ollama log).

One more thing: have you made sure the iGPU passes through to the LXC successfully? I don't have any experience with LXC, but https://forum.proxmox.com/threads/how-to-amdgpu-on-proxmox-7.95285/ shows there are some extra steps involved (see the sketch below).
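A sketch of what that passthrough typically looks like for a Proxmox LXC (the container ID, device majors, and paths are assumptions; verify the majors with `ls -l /dev/kfd /dev/dri` on the Proxmox host):

```
# Hypothetical additions to /etc/pve/lxc/<CTID>.conf
lxc.cgroup2.devices.allow: c 226:* rwm   # DRM devices (/dev/dri); major 226
lxc.cgroup2.devices.allow: c 510:0 rwm   # /dev/kfd (major varies; check ls -l /dev/kfd)
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir
lxc.mount.entry: /dev/kfd dev/kfd none bind,optional,create=file
```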

Reference log about `OLLAMA_LLM_LIBRARY`:

```
Jun 20 00:11:55 ai-llm ollama[377]: time=2024-06-20T00:11:55.914Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu_avx2 rocm_v60101 cpu cpu_avx]"
Jun 20 00:11:55 ai-llm ollama[377]: time=2024-06-20T00:11:55.914Z level=DEBUG source=payload.go:45 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
```
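And a minimal sketch of suggestion 2 above: the variant name must match one printed in the `Dynamic LLM libraries [...]` line of your own log (the tinyllama LXC log earlier in this thread prints `rocm_v60002`):

```
OLLAMA_LLM_LIBRARY=rocm_v60002 HSA_OVERRIDE_GFX_VERSION=9.0.0 ollama serve
```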

@mathatan commented on GitHub (Aug 2, 2024):

I'm also running into a similar issue, with rocBLAS failing with `Could not initialize Tensile host: No devices found`, even though I have a discrete Radeon Vega 64. Ollama was working perfectly fine until I upgraded it to the latest from 1.0.48. (I also updated the drivers and TrueNAS (to 24.04.2) today; Ollama was working perfectly well before I upgraded it and the drivers, and now even installing the old version won't help.)

I'm running Ollama in a TrueNAS SCALE Jailmaker jail.

Environment:

```
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_LLM_LIBRARY=rocm_v60002"
Environment="HSA_OVERRIDE_GFX_VERSION=9.0.0"
Environment="OLLAMA_NOHISTORY=1"
Environment="HSA_ENABLE_SDMA=0"
Environment="OLLAMA_DEBUG=1"
Environment="AMD_SERIALIZE_KERNEL=3"
```

Logs:

Aug  2 20:05:43 ollama-ubuntu-jammy-2 systemd[1]: Started Ollama Service.
Aug  2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: 2024/08/02 20:05:43 routes.go:1109: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION:9.0.0 OLLAMA_DEBUG:true OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY:rocm_v60002 OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/usr/share/ollama/.ollama/models OLLAMA_NOHISTORY:true OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
Aug  2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.405+03:00 level=INFO source=images.go:781 msg="total blobs: 77"
Aug  2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.407+03:00 level=INFO source=images.go:788 msg="total unused blobs removed: 0"
Aug  2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=INFO source=routes.go:1156 msg="Listening on [::]:11434 (version 0.3.2)"
Aug  2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama802441264/runners
Aug  2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cpu file=build/linux/x86_64/cpu/bin/ollama_llama_server.gz
Aug  2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cpu_avx file=build/linux/x86_64/cpu_avx/bin/ollama_llama_server.gz
Aug  2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cpu_avx2 file=build/linux/x86_64/cpu_avx2/bin/ollama_llama_server.gz
Aug  2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublas.so.11.gz
Aug  2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublasLt.so.11.gz
Aug  2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcudart.so.11.0.gz
Aug  2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/ollama_llama_server.gz
Aug  2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=rocm_v60102 file=build/linux/x86_64/rocm_v60102/bin/deps.txt.gz
Aug  2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=rocm_v60102 file=build/linux/x86_64/rocm_v60102/bin/ollama_llama_server.gz
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu/ollama_llama_server
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu_avx/ollama_llama_server
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu_avx2/ollama_llama_server
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cuda_v11/ollama_llama_server
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/rocm_v60102/ollama_llama_server
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu_avx cpu_avx2 cuda_v11 rocm_v60102 cpu]"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=payload.go:45 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=sched.go:105 msg="starting llm scheduler"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=INFO source=gpu.go:205 msg="looking for compatible GPUs"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=gpu.go:91 msg="searching for GPU discovery libraries for NVIDIA"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=gpu.go:468 msg="Searching for GPU library" name=libcuda.so*
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=gpu.go:487 msg="gpu library search" globs="[/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.278+03:00 level=DEBUG source=gpu.go:521 msg="discovered GPU libraries" paths=[]
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.278+03:00 level=DEBUG source=gpu.go:468 msg="Searching for GPU library" name=libcudart.so*
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.278+03:00 level=DEBUG source=gpu.go:487 msg="gpu library search" globs="[/libcudart.so** /tmp/ollama802441264/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.278+03:00 level=DEBUG source=gpu.go:521 msg="discovered GPU libraries" paths=[/tmp/ollama802441264/runners/cuda_v11/libcudart.so.11.0]
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: cudaSetDevice err: 35
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=gpu.go:533 msg="Unable to load cudart" library=/tmp/ollama802441264/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing.  If you have a CUDA GPU please upgrade to run ollama"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=WARN source=amd_linux.go:59 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_linux.go:102 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_linux.go:127 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_linux.go:102 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_linux.go:217 msg="mapping amdgpu to drm sysfs nodes" amdgpu=/sys/class/kfd/kfd/topology/nodes/1/properties vendor=4098 device=26751 unique_id=150031596308672932
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_linux.go:251 msg=matched amdgpu=/sys/class/kfd/kfd/topology/nodes/1/properties drm=/sys/class/drm/card0/device
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_linux.go:283 msg="amdgpu memory" gpu=0 total="8.0 GiB"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_linux.go:284 msg="amdgpu memory" gpu=0 available="8.0 GiB"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /usr/local/bin/rocm"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.280+03:00 level=INFO source=amd_linux.go:348 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=9.0.0
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.280+03:00 level=INFO source=types.go:105 msg="inference compute" id=0 library=rocm compute=gfx900 driver=0.0 name=1002:687f total="8.0 GiB" available="8.0 GiB"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: [GIN] 2024/08/02 - 20:05:50 | 200 |      42.189µs |       127.0.0.1 | HEAD     "/"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: [GIN] 2024/08/02 - 20:05:50 | 200 |    4.888907ms |       127.0.0.1 | POST     "/api/show"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.541+03:00 level=DEBUG source=gpu.go:358 msg="updating system memory data" before.total="62.7 GiB" before.free="44.4 GiB" before.free_swap="0 B" now.total="62.7 GiB" now.free="44.3 GiB" now.free_swap="0 B"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.541+03:00 level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:687f before="8.0 GiB" now="8.0 GiB"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.541+03:00 level=DEBUG source=sched.go:181 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=0x816020 gpu_count=1
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=sched.go:219 msg="loading first model" model=/usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=memory.go:101 msg=evaluating library=rocm gpu_count=1 available="[8.0 GiB]"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=INFO source=sched.go:710 msg="new model will fit in available VRAM in single GPU, loading" model=/usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf gpu=0 parallel=4 available=8564838400 required="6.1 GiB"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=server.go:100 msg="system memory" total="62.7 GiB" free="44.3 GiB" free_swap="0 B"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=memory.go:101 msg=evaluating library=rocm gpu_count=1 available="[8.0 GiB]"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=INFO source=memory.go:309 msg="offload to rocm" layers.requested=-1 layers.model=33 layers.offload=33 layers.split="" memory.available="[8.0 GiB]" memory.required.full="6.1 GiB" memory.required.partial="6.1 GiB" memory.required.kv="3.0 GiB" memory.required.allocations="[6.1 GiB]" memory.weights.total="4.9 GiB" memory.weights.repeating="4.8 GiB" memory.weights.nonrepeating="77.1 MiB" memory.graph.full="512.0 MiB" memory.graph.partial="512.0 MiB"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu/ollama_llama_server
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu_avx/ollama_llama_server
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu_avx2/ollama_llama_server
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cuda_v11/ollama_llama_server
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/rocm_v60102/ollama_llama_server
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu/ollama_llama_server
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu_avx/ollama_llama_server
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu_avx2/ollama_llama_server
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cuda_v11/ollama_llama_server
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/rocm_v60102/ollama_llama_server
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=INFO source=server.go:170 msg="Invalid OLLAMA_LLM_LIBRARY rocm_v60002 - not found"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.546+03:00 level=INFO source=server.go:384 msg="starting llama server" cmd="/tmp/ollama802441264/runners/rocm_v60102/ollama_llama_server --model /usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf --ctx-size 8192 --batch-size 512 --embedding --log-disable --n-gpu-layers 33 --verbose --parallel 4 --port 46445"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.546+03:00 level=DEBUG source=server.go:401 msg=subprocess environment="[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin HSA_OVERRIDE_GFX_VERSION=9.0.0 HSA_ENABLE_SDMA=0 LD_LIBRARY_PATH=/opt/rocm/lib:/tmp/ollama802441264/runners/rocm_v60102:/tmp/ollama802441264/runners HIP_VISIBLE_DEVICES=0]"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.546+03:00 level=INFO source=sched.go:445 msg="loaded runners" count=1
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.546+03:00 level=INFO source=server.go:584 msg="waiting for llama runner to start responding"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.546+03:00 level=INFO source=server.go:618 msg="waiting for server to become available" status="llm server error"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[482]: INFO [main] build info | build=1 commit="6eeaeba" tid="139832002468928" timestamp=1722618350
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[482]: INFO [main] system info | n_threads=12 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="139832002468928" timestamp=1722618350 total_threads=24
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[482]: INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="23" port="46445" tid="139832002468928" timestamp=1722618350
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: loaded meta data with 36 key-value pairs and 197 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf (version GGUF V3 (latest))
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv   0:                       general.architecture str              = phi3
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv   1:                               general.type str              = model
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv   2:                               general.name str              = Phi 3 Mini 128k Instruct
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv   3:                           general.finetune str              = 128k-instruct
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv   4:                           general.basename str              = Phi-3
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv   5:                         general.size_label str              = mini
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv   6:                            general.license str              = mit
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv   7:                       general.license.link str              = https://huggingface.co/microsoft/Phi-...
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv   8:                               general.tags arr[str,3]       = ["nlp", "code", "text-generation"]
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv   9:                          general.languages arr[str,1]       = ["en"]
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  10:                        phi3.context_length u32              = 131072
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  11:  phi3.rope.scaling.original_context_length u32              = 4096
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  12:                      phi3.embedding_length u32              = 3072
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  13:                   phi3.feed_forward_length u32              = 8192
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  14:                           phi3.block_count u32              = 32
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  15:                  phi3.attention.head_count u32              = 32
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  16:               phi3.attention.head_count_kv u32              = 32
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  17:      phi3.attention.layer_norm_rms_epsilon f32              = 0.000010
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  18:                  phi3.rope.dimension_count u32              = 96
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  19:                        phi3.rope.freq_base f32              = 10000.000000
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  20:                          general.file_type u32              = 2
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  21:              phi3.attention.sliding_window u32              = 262144
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  22:              phi3.rope.scaling.attn_factor f32              = 1.190238
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  23:                       tokenizer.ggml.model str              = llama
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  24:                         tokenizer.ggml.pre str              = default
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  25:                      tokenizer.ggml.tokens arr[str,32064]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  26:                      tokenizer.ggml.scores arr[f32,32064]   = [-1000.000000, -1000.000000, -1000.00...
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  27:                  tokenizer.ggml.token_type arr[i32,32064]   = [3, 3, 4, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  28:                tokenizer.ggml.bos_token_id u32              = 1
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  29:                tokenizer.ggml.eos_token_id u32              = 32000
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  30:            tokenizer.ggml.unknown_token_id u32              = 0
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  31:            tokenizer.ggml.padding_token_id u32              = 32000
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  32:               tokenizer.ggml.add_bos_token bool             = false
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  33:               tokenizer.ggml.add_eos_token bool             = false
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  34:                    tokenizer.chat_template str              = {% for message in messages %}{% if me...
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  35:               general.quantization_version u32              = 2
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - type  f32:   67 tensors
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - type q4_0:  129 tensors
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - type q6_K:    1 tensors
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_vocab: special tokens cache size = 14
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_vocab: token to piece cache size = 0.1685 MB
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: format           = GGUF V3 (latest)
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: arch             = phi3
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: vocab type       = SPM
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_vocab          = 32064
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_merges         = 0
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: vocab_only       = 0
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_ctx_train      = 131072
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_embd           = 3072
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_layer          = 32
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_head           = 32
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_head_kv        = 32
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_rot            = 96
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_swa            = 262144
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_embd_head_k    = 96
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_embd_head_v    = 96
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_gqa            = 1
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_embd_k_gqa     = 3072
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_embd_v_gqa     = 3072
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: f_norm_eps       = 0.0e+00
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: f_clamp_kqv      = 0.0e+00
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: f_logit_scale    = 0.0e+00
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_ff             = 8192
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_expert         = 0
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_expert_used    = 0
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: causal attn      = 1
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: pooling type     = 0
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: rope type        = 2
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: rope scaling     = linear
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: freq_base_train  = 10000.0
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: freq_scale_train = 1
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_ctx_orig_yarn  = 4096
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: rope_finetuned   = unknown
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: ssm_d_conv       = 0
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: ssm_d_inner      = 0
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: ssm_d_state      = 0
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: ssm_dt_rank      = 0
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: model type       = 3B
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: model ftype      = Q4_0
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: model params     = 3.82 B
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: model size       = 2.03 GiB (4.55 BPW)
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: general.name     = Phi 3 Mini 128k Instruct
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: BOS token        = 1 '<s>'
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: EOS token        = 32000 '<|endoftext|>'
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: UNK token        = 0 '<unk>'
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: PAD token        = 32000 '<|endoftext|>'
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: LF token         = 13 '<0x0A>'
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: EOT token        = 32007 '<|end|>'
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: max token length = 48
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: rocBLAS error: Could not initialize Tensile host: No devices found
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.997+03:00 level=INFO source=server.go:618 msg="waiting for server to become available" status="llm server not responding"
Aug  2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.086+03:00 level=DEBUG source=server.go:424 msg="llama runner terminated" error="signal: aborted (core dumped)"
Aug  2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=ERROR source=sched.go:451 msg="error loading llama server" error="llama runner process has terminated: error:Could not initialize Tensile host: No devices found"
Aug  2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=DEBUG source=sched.go:454 msg="triggering expiration for failed load" model=/usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf
Aug  2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=DEBUG source=sched.go:355 msg="runner expired event received" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf
Aug  2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=DEBUG source=sched.go:371 msg="got lock to unload" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf
Aug  2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: [GIN] 2024/08/02 - 20:05:52 | 500 |  1.613482132s |       127.0.0.1 | POST     "/api/chat"
Aug  2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=DEBUG source=gpu.go:358 msg="updating system memory data" before.total="62.7 GiB" before.free="44.3 GiB" before.free_swap="0 B" now.total="62.7 GiB" now.free="44.3 GiB" now.free_swap="0 B"
Aug  2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:687f before="8.0 GiB" now="8.0 GiB"
Aug  2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=DEBUG source=server.go:1042 msg="stopping llama server"
Aug  2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=DEBUG source=sched.go:376 msg="runner released" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf
Aug  2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.401+03:00 level=DEBUG source=gpu.go:358 msg="updating system memory data" before.total="62.7 GiB" before.free="44.3 GiB" before.free_swap="0 B" now.total="62.7 GiB" now.free="44.3 GiB" now.free_swap="0 B"
Aug  2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.401+03:00 level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:687f before="8.0 GiB" now="8.0 GiB"

As the log shows, discovery works fine: the card is detected as gfx900 with 8.0 GiB of VRAM, the scheduler concludes all 33 layers fit on the GPU, and a ROCm runner is launched. The configured OLLAMA_LLM_LIBRARY=rocm_v60002 is rejected ("Invalid OLLAMA_LLM_LIBRARY rocm_v60002 - not found"), so the bundled rocm_v60102 runner is used instead — and it aborts the moment rocBLAS initializes: Could not initialize Tensile host: No devices found.
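
Worth noting: Ollama's AMD discovery only reads /sys/class/kfd/kfd/topology, so it can succeed even when the runner process cannot actually open the device nodes. One common cause of "No devices found" inside a jail is the service user lacking permission on /dev/kfd or /dev/dri/renderD*. A quick check, assuming the default service user is named ollama (adjust to the jail's setup):

ls -l /dev/kfd /dev/dri/renderD*
id ollama
# If the render/video groups are missing on the service user:
sudo usermod -aG render,video ollama
sudo systemctl restart ollama.service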

ROCM-SMI:

======================================= ROCm System Management Interface =======================================
================================================= Concise Info =================================================
Device  [Model : Revision]    Temp    Power     Partitions      SCLK    MCLK    Fan  Perf  PwrCap  VRAM%  GPU%
        Name (20 chars)       (Edge)  (Socket)  (Mem, Compute)
================================================================================================================
0       [0x2308 : 0xc1]       45.0°C  4.0W      N/A, N/A        852Mhz  167Mhz  0%   auto  247.0W    0%   0%
        Vega 10 XL/XT [Radeo
================================================================================================================
============================================= End of ROCm SMI Log ==============================================

rocminfo:

ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE
System Endianness:       LITTLE
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========
HSA Agents
==========
*******
Agent 1
*******
  Name:                    AMD Ryzen 9 3900X 12-Core Processor
  Uuid:                    CPU-XX
  Marketing Name:          AMD Ryzen 9 3900X 12-Core Processor
  Vendor Name:             CPU
  Feature:                 None specified
  Profile:                 FULL_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        0(0x0)
  Queue Min Size:          0(0x0)
  Queue Max Size:          0(0x0)
  Queue Type:              MULTI
  Node:                    0
  Device Type:             CPU
  Cache Info:
    L1:                      32768(0x8000) KB
  Chip ID:                 0(0x0)
  ASIC Revision:           0(0x0)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   3800
  BDFID:                   0
  Internal Node ID:        0
  Compute Unit:            24
  SIMDs per CU:            0
  Shader Engines:          0
  Shader Arrs. per Eng.:   0
  WatchPts on Addr. Ranges:1
  Features:                None
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: FINE GRAINED
      Size:                    65771684(0x3eb98a4) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 2
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    65771684(0x3eb98a4) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 3
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    65771684(0x3eb98a4) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
  ISA Info:
*******
Agent 2
*******
  Name:                    gfx900
  Uuid:                    GPU-021504f1231031a4
  Marketing Name:          Radeon RX Vega
  Vendor Name:             AMD
  Feature:                 KERNEL_DISPATCH
  Profile:                 BASE_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        128(0x80)
  Queue Min Size:          64(0x40)
  Queue Max Size:          131072(0x20000)
  Queue Type:              MULTI
  Node:                    1
  Device Type:             GPU
  Cache Info:
    L1:                      16(0x10) KB
    L2:                      4096(0x1000) KB
  Chip ID:                 26751(0x687f)
  ASIC Revision:           1(0x1)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   1630
  BDFID:                   2560
  Internal Node ID:        1
  Compute Unit:            64
  SIMDs per CU:            4
  Shader Engines:          4
  Shader Arrs. per Eng.:   1
  WatchPts on Addr. Ranges:4
  Coherent Host Access:    FALSE
  Features:                KERNEL_DISPATCH
  Fast F16 Operation:      TRUE
  Wavefront Size:          64(0x40)
  Workgroup Max Size:      1024(0x400)
  Workgroup Max Size per Dimension:
    x                        1024(0x400)
    y                        1024(0x400)
    z                        1024(0x400)
  Max Waves Per CU:        40(0x28)
  Max Work-item Per CU:    2560(0xa00)
  Grid Max Size:           4294967295(0xffffffff)
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)
    y                        4294967295(0xffffffff)
    z                        4294967295(0xffffffff)
  Max fbarriers/Workgrp:   32
  Packet Processor uCode:: 468
  SDMA engine uCode::      434
  IOMMU Support::          None
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    8372224(0x7fc000) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 2
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    8372224(0x7fc000) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 3
      Segment:                 GROUP
      Size:                    64(0x40) KB
      Allocatable:             FALSE
      Alloc Granule:           0KB
      Alloc Alignment:         0KB
      Accessible by all:       FALSE
  ISA Info:
    ISA 1
      Name:                    amdgcn-amd-amdhsa--gfx900:xnack-
      Machine Models:          HSA_MACHINE_MODEL_LARGE
      Profiles:                HSA_PROFILE_BASE
      Default Rounding Mode:   NEAR
      Default Rounding Mode:   NEAR
      Fast f16:                TRUE
      Workgroup Max Size:      1024(0x400)
      Workgroup Max Size per Dimension:
        x                        1024(0x400)
        y                        1024(0x400)
        z                        1024(0x400)
      Grid Max Size:           4294967295(0xffffffff)
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)
        y                        4294967295(0xffffffff)
        z                        4294967295(0xffffffff)
      FBarrier Max Size:       32
*** Done ***
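
rocminfo reports the ISA as plain amdgcn-amd-amdhsa--gfx900:xnack-, which rocBLAS supports natively, so HSA_OVERRIDE_GFX_VERSION=9.0.0 should effectively be a no-op on this card. To rule the overrides out, one experiment is a one-off run outside the service with them cleared (a sketch, assuming the ollama binary is on PATH inside the jail):

env -u HSA_OVERRIDE_GFX_VERSION -u OLLAMA_LLM_LIBRARY OLLAMA_DEBUG=1 ollama serve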

apt show rocm-libs -a (version)

Package: rocm-libs
Version: 6.0.2.60002-115~22.04
Priority: optional
Section: devel
Maintainer: ROCm Dev Support <rocm-dev.support@amd.com>
Installed-Size: 13.3 kB
Depends: hipblas (= 2.0.0.60002-115~22.04), hipblaslt (= 0.6.0.60002-115~22.04), hipfft (= 1.0.13.60002-115~22.04), hipsolver (= 2.0.0.60002-115~22.04), hipsparse (= 3.0.0.60002-115~22.04), hiptensor (= 1.1.0.60002-115~22.04), miopen-hip (= 3.00.0.60002-115~22.04), half (= 1.12.0.60002-115~22.04), rccl (= 2.18.3.60002-115~22.04), rocalution (= 3.0.3.60002-115~22.04), rocblas (= 4.0.0.60002-115~22.04), rocfft (= 1.0.25.60002-115~22.04), rocrand (= 3.0.0.60002-115~22.04), hiprand (= 2.10.16.60002-115~22.04), rocsolver (= 3.24.0.60002-115~22.04), rocsparse (= 3.0.2.60002-115~22.04), rocm-core (= 6.0.2.60002-115~22.04), composablekernel-dev (= 1.1.0.60002-115~22.04), hipblas-dev (= 2.0.0.60002-115~22.04), hipblaslt-dev (= 0.6.0.60002-115~22.04), hipcub-dev (= 3.0.0.60002-115~22.04), hipfft-dev (= 1.0.13.60002-115~22.04), hipsolver-dev (= 2.0.0.60002-115~22.04), hipsparse-dev (= 3.0.0.60002-115~22.04), hiptensor-dev (= 1.1.0.60002-115~22.04), miopen-hip-dev (= 3.00.0.60002-115~22.04), rccl-dev (= 2.18.3.60002-115~22.04), rocalution-dev (= 3.0.3.60002-115~22.04), rocblas-dev (= 4.0.0.60002-115~22.04), rocfft-dev (= 1.0.25.60002-115~22.04), rocprim-dev (= 3.0.0.60002-115~22.04), rocrand-dev (= 3.0.0.60002-115~22.04), hiprand-dev (= 2.10.16.60002-115~22.04), rocsolver-dev (= 3.24.0.60002-115~22.04), rocsparse-dev (= 3.0.2.60002-115~22.04), rocthrust-dev (= 3.0.0.60002-115~22.04), rocwmma-dev (= 1.3.0.60002-115~22.04)
Homepage: https://github.com/RadeonOpenCompute/ROCm
Download-Size: 1,050 B
APT-Sources: https://repo.radeon.com/rocm/apt/6.0.2 jammy/main amd64 Packages
Description: Radeon Open Compute (ROCm) Runtime software stack
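
One more thing the apt output suggests checking: the system has ROCm 6.0.2 (rocblas 4.0.0) under /opt/rocm, while Ollama 0.3.2 runs its bundled rocm_v60102 runner (ROCm 6.1.2, going by the name), and the subprocess environment in the log puts /opt/rocm/lib ahead of the bundled runner directory in LD_LIBRARY_PATH. A version skew between the two rocBLAS copies is therefore at least possible. A hedged way to see which rocBLAS the runner resolves (the /tmp runner path below is the one from this boot and changes on every restart):

LD_LIBRARY_PATH=/opt/rocm/lib:/tmp/ollama802441264/runners/rocm_v60102 \
  ldd /tmp/ollama802441264/runners/rocm_v60102/ollama_llama_server | grep -i rocblas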
level=ERROR source=sched.go:451 msg="error loading llama server" error="llama runner process has terminated: error:Could not initialize Tensile host: No devices found" Aug 2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=DEBUG source=sched.go:454 msg="triggering expiration for failed load" model=/usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf Aug 2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=DEBUG source=sched.go:355 msg="runner expired event received" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf Aug 2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=DEBUG source=sched.go:371 msg="got lock to unload" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf Aug 2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: [GIN] 2024/08/02 - 20:05:52 | 500 | 1.613482132s | 127.0.0.1 | POST "/api/chat" Aug 2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=DEBUG source=gpu.go:358 msg="updating system memory data" before.total="62.7 GiB" before.free="44.3 GiB" before.free_swap="0 B" now.total="62.7 GiB" now.free="44.3 GiB" now.free_swap="0 B" Aug 2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:687f before="8.0 GiB" now="8.0 GiB" Aug 2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=DEBUG source=server.go:1042 msg="stopping llama server" Aug 2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=DEBUG source=sched.go:376 msg="runner released" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf Aug 2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.401+03:00 level=DEBUG source=gpu.go:358 msg="updating system memory data" before.total="62.7 GiB" before.free="44.3 GiB" before.free_swap="0 B" now.total="62.7 GiB" now.free="44.3 GiB" now.free_swap="0 B" Aug 2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.401+03:00 level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:687f before="8.0 GiB" now="8.0 GiB" ``` As can be seen, it detects the GPU perfectly fine and tries to use it, but for some reason rocBLAS fails. 
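
Since rocminfo and rocm-smi (run as root) both see the card but the runner's rocBLAS reports no devices, one thing worth ruling out is device-node permissions for the service user: the systemd unit runs Ollama as its own user, and if that user cannot open /dev/kfd and /dev/dri/renderD*, HIP enumerates zero devices even though root-run tools still work. A minimal sketch of that check, assuming the stock install's `ollama` service user (adjust the name if yours differs):

```
# Note the groups that own the ROCm device nodes (usually render/video)
ls -l /dev/kfd /dev/dri/renderD*

# Check whether the service user belongs to those groups
id ollama

# Re-run device discovery as the service user; zero GPU agents here would
# match rocBLAS's "No devices found"
sudo -u ollama rocminfo | grep -E 'Agent [0-9]|Device Type'

# If the groups are missing, add them and restart the service
sudo usermod -aG render,video ollama
sudo systemctl restart ollama
```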
ROCM-SMI:

```
======================================= ROCm System Management Interface =======================================
================================================= Concise Info =================================================
Device  [Model : Revision]    Temp    Power     Partitions      SCLK    MCLK    Fan  Perf  PwrCap  VRAM%  GPU%
        Name (20 chars)       (Edge)  (Socket)  (Mem, Compute)
================================================================================================================
0       [0x2308 : 0xc1]       45.0°C  4.0W      N/A, N/A        852Mhz  167Mhz  0%   auto  247.0W  0%     0%
        Vega 10 XL/XT [Radeo
================================================================================================================
============================================= End of ROCm SMI Log ==============================================
```

rocminfo:

```
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE
System Endianness:       LITTLE
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========
HSA Agents
==========
*******
Agent 1
*******
  Name:                    AMD Ryzen 9 3900X 12-Core Processor
  Uuid:                    CPU-XX
  Marketing Name:          AMD Ryzen 9 3900X 12-Core Processor
  Vendor Name:             CPU
  Feature:                 None specified
  Profile:                 FULL_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        0(0x0)
  Queue Min Size:          0(0x0)
  Queue Max Size:          0(0x0)
  Queue Type:              MULTI
  Node:                    0
  Device Type:             CPU
  Cache Info:
    L1:                      32768(0x8000) KB
  Chip ID:                 0(0x0)
  ASIC Revision:           0(0x0)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   3800
  BDFID:                   0
  Internal Node ID:        0
  Compute Unit:            24
  SIMDs per CU:            0
  Shader Engines:          0
  Shader Arrs. per Eng.:   0
  WatchPts on Addr. Ranges:1
  Features:                None
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: FINE GRAINED
      Size:                    65771684(0x3eb98a4) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 2
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    65771684(0x3eb98a4) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 3
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    65771684(0x3eb98a4) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
  ISA Info:
*******
Agent 2
*******
  Name:                    gfx900
  Uuid:                    GPU-021504f1231031a4
  Marketing Name:          Radeon RX Vega
  Vendor Name:             AMD
  Feature:                 KERNEL_DISPATCH
  Profile:                 BASE_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        128(0x80)
  Queue Min Size:          64(0x40)
  Queue Max Size:          131072(0x20000)
  Queue Type:              MULTI
  Node:                    1
  Device Type:             GPU
  Cache Info:
    L1:                      16(0x10) KB
    L2:                      4096(0x1000) KB
  Chip ID:                 26751(0x687f)
  ASIC Revision:           1(0x1)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   1630
  BDFID:                   2560
  Internal Node ID:        1
  Compute Unit:            64
  SIMDs per CU:            4
  Shader Engines:          4
  Shader Arrs. per Eng.:   1
  WatchPts on Addr. Ranges:4
  Coherent Host Access:    FALSE
  Features:                KERNEL_DISPATCH
  Fast F16 Operation:      TRUE
  Wavefront Size:          64(0x40)
  Workgroup Max Size:      1024(0x400)
  Workgroup Max Size per Dimension:
    x                        1024(0x400)
    y                        1024(0x400)
    z                        1024(0x400)
  Max Waves Per CU:        40(0x28)
  Max Work-item Per CU:    2560(0xa00)
  Grid Max Size:           4294967295(0xffffffff)
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)
    y                        4294967295(0xffffffff)
    z                        4294967295(0xffffffff)
  Max fbarriers/Workgrp:   32
  Packet Processor uCode:: 468
  SDMA engine uCode::      434
  IOMMU Support::          None
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    8372224(0x7fc000) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 2
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    8372224(0x7fc000) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 3
      Segment:                 GROUP
      Size:                    64(0x40) KB
      Allocatable:             FALSE
      Alloc Granule:           0KB
      Alloc Alignment:         0KB
      Accessible by all:       FALSE
  ISA Info:
    ISA 1
      Name:                    amdgcn-amd-amdhsa--gfx900:xnack-
      Machine Models:          HSA_MACHINE_MODEL_LARGE
      Profiles:                HSA_PROFILE_BASE
      Default Rounding Mode:   NEAR
      Default Rounding Mode:   NEAR
      Fast f16:                TRUE
      Workgroup Max Size:      1024(0x400)
      Workgroup Max Size per Dimension:
        x                        1024(0x400)
        y                        1024(0x400)
        z                        1024(0x400)
      Grid Max Size:           4294967295(0xffffffff)
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)
        y                        4294967295(0xffffffff)
        z                        4294967295(0xffffffff)
      FBarrier Max Size:       32
*** Done ***
```

`apt show rocm-libs -a` (version)

```
Package: rocm-libs
Version: 6.0.2.60002-115~22.04
Priority: optional
Section: devel
Maintainer: ROCm Dev Support <rocm-dev.support@amd.com>
Installed-Size: 13.3 kB
Depends: hipblas (= 2.0.0.60002-115~22.04), hipblaslt (= 0.6.0.60002-115~22.04), hipfft (= 1.0.13.60002-115~22.04), hipsolver (= 2.0.0.60002-115~22.04), hipsparse (= 3.0.0.60002-115~22.04), hiptensor (= 1.1.0.60002-115~22.04), miopen-hip (= 3.00.0.60002-115~22.04), half (= 1.12.0.60002-115~22.04), rccl (= 2.18.3.60002-115~22.04), rocalution (= 3.0.3.60002-115~22.04), rocblas (= 4.0.0.60002-115~22.04), rocfft (= 1.0.25.60002-115~22.04), rocrand (= 3.0.0.60002-115~22.04), hiprand (= 2.10.16.60002-115~22.04), rocsolver (= 3.24.0.60002-115~22.04), rocsparse (= 3.0.2.60002-115~22.04), rocm-core (= 6.0.2.60002-115~22.04), composablekernel-dev (= 1.1.0.60002-115~22.04), hipblas-dev (= 2.0.0.60002-115~22.04), hipblaslt-dev (= 0.6.0.60002-115~22.04), hipcub-dev (= 3.0.0.60002-115~22.04), hipfft-dev (= 1.0.13.60002-115~22.04), hipsolver-dev (= 2.0.0.60002-115~22.04), hipsparse-dev (= 3.0.0.60002-115~22.04), hiptensor-dev (= 1.1.0.60002-115~22.04), miopen-hip-dev (= 3.00.0.60002-115~22.04), rccl-dev (= 2.18.3.60002-115~22.04), rocalution-dev (= 3.0.3.60002-115~22.04), rocblas-dev (= 4.0.0.60002-115~22.04), rocfft-dev (= 1.0.25.60002-115~22.04), rocprim-dev (= 3.0.0.60002-115~22.04), rocrand-dev (= 3.0.0.60002-115~22.04), hiprand-dev (= 2.10.16.60002-115~22.04), rocsolver-dev (= 3.24.0.60002-115~22.04), rocsparse-dev (= 3.0.2.60002-115~22.04), rocthrust-dev (= 3.0.0.60002-115~22.04), rocwmma-dev (= 1.3.0.60002-115~22.04)
Homepage: https://github.com/RadeonOpenCompute/ROCm
Download-Size: 1,050 B
APT-Sources: https://repo.radeon.com/rocm/apt/6.0.2 jammy/main amd64 Packages
Description: Radeon Open Compute (ROCm) Runtime software stack
```

@mathatan commented on GitHub (Aug 2, 2024):

Did a rollback and got it running with v0.1.48.
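
(For anyone wanting to reproduce the rollback: the official install script accepts a version pin, so something like the following should fetch that release. The exact invocation is an assumption; match the version string to the release you want.)

```
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.1.48 sh
```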

Here are the details:

Env:

[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_LLM_LIBRARY=rocm_v60002"
Environment="HSA_OVERRIDE_GFX_VERSION=9.0.0"
Environment="OLLAMA_NOHISTORY=1"
Environment="HSA_ENABLE_SDMA=0"
Environment="OLLAMA_DEBUG=1"

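Note that, as the log below shows, `OLLAMA_LLM_LIBRARY=rocm_v60002` is rejected with "Invalid OLLAMA_LLM_LIBRARY rocm_v60002 - not found" on this build (which only ships rocm_v60101), so the working run is actually using the autodetected rocm_v60101 runner. For anyone copying this setup, one way to apply the overrides, assuming the stock systemd unit, is a drop-in:

```
sudo systemctl edit ollama.service   # add the [Service] block above
sudo systemctl daemon-reload
sudo systemctl restart ollama
```
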
Logs:

Aug  2 20:54:19 ollama-ubuntu-jammy systemd[1]: Started Ollama Service.
Aug  2 20:54:19 ollama-ubuntu-jammy ollama[385]: 2024/08/02 20:54:19 routes.go:1064: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION:9.0.0 OLLAMA_DEBUG:true OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE: OLLAMA_LLM_LIBRARY:rocm_v60002 OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_MODELS:/root/.ollama/models OLLAMA_NOHISTORY:true OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
Aug  2 20:54:19 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:19.927+03:00 level=INFO source=images.go:730 msg="total blobs: 77"
Aug  2 20:54:19 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:19.930+03:00 level=INFO source=images.go:737 msg="total unused blobs removed: 0"
Aug  2 20:54:19 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:19.931+03:00 level=INFO source=routes.go:1111 msg="Listening on [::]:11434 (version 0.1.48)"
Aug  2 20:54:19 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:19.931+03:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama4024265843/runners
Aug  2 20:54:19 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:19.931+03:00 level=DEBUG source=payload.go:180 msg=extracting variant=cpu file=build/linux/x86_64/cpu/bin/ollama_llama_server.gz
Aug  2 20:54:19 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:19.931+03:00 level=DEBUG source=payload.go:180 msg=extracting variant=cpu_avx file=build/linux/x86_64/cpu_avx/bin/ollama_llama_server.gz
Aug  2 20:54:19 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:19.931+03:00 level=DEBUG source=payload.go:180 msg=extracting variant=cpu_avx2 file=build/linux/x86_64/cpu_avx2/bin/ollama_llama_server.gz
Aug  2 20:54:19 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:19.931+03:00 level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublas.so.11.gz
Aug  2 20:54:19 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:19.931+03:00 level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublasLt.so.11.gz
Aug  2 20:54:19 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:19.931+03:00 level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcudart.so.11.0.gz
Aug  2 20:54:19 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:19.931+03:00 level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/ollama_llama_server.gz
Aug  2 20:54:19 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:19.931+03:00 level=DEBUG source=payload.go:180 msg=extracting variant=rocm_v60101 file=build/linux/x86_64/rocm_v60101/bin/deps.txt.gz
Aug  2 20:54:19 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:19.931+03:00 level=DEBUG source=payload.go:180 msg=extracting variant=rocm_v60101 file=build/linux/x86_64/rocm_v60101/bin/ollama_llama_server.gz
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.591+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama4024265843/runners/cpu/ollama_llama_server
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.591+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama4024265843/runners/cpu_avx/ollama_llama_server
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.591+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama4024265843/runners/cpu_avx2/ollama_llama_server
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.591+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama4024265843/runners/cuda_v11/ollama_llama_server
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.591+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama4024265843/runners/rocm_v60101/ollama_llama_server
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.591+03:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu_avx cpu_avx2 cuda_v11 rocm_v60101 cpu]"
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.591+03:00 level=DEBUG source=payload.go:45 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.591+03:00 level=DEBUG source=sched.go:94 msg="starting llm scheduler"
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.591+03:00 level=DEBUG source=gpu.go:205 msg="Detecting GPUs"
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.591+03:00 level=DEBUG source=gpu.go:91 msg="searching for GPU discovery libraries for NVIDIA"
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.591+03:00 level=DEBUG source=gpu.go:435 msg="Searching for GPU library" name=libcuda.so*
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.591+03:00 level=DEBUG source=gpu.go:454 msg="gpu library search" globs="[/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.593+03:00 level=DEBUG source=gpu.go:488 msg="discovered GPU libraries" paths=[]
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.593+03:00 level=DEBUG source=gpu.go:435 msg="Searching for GPU library" name=libcudart.so*
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.593+03:00 level=DEBUG source=gpu.go:454 msg="gpu library search" globs="[/libcudart.so** /tmp/ollama4024265843/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.593+03:00 level=DEBUG source=gpu.go:488 msg="discovered GPU libraries" paths=[/tmp/ollama4024265843/runners/cuda_v11/libcudart.so.11.0]
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: cudaSetDevice err: 35
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.594+03:00 level=DEBUG source=gpu.go:500 msg="Unable to load cudart" library=/tmp/ollama4024265843/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing.  If you have a CUDA GPU please upgrade to run ollama"
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.594+03:00 level=WARN source=amd_linux.go:58 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.594+03:00 level=DEBUG source=amd_linux.go:87 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.594+03:00 level=DEBUG source=amd_linux.go:112 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.594+03:00 level=DEBUG source=amd_linux.go:87 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.594+03:00 level=DEBUG source=amd_linux.go:202 msg="mapping amdgpu to drm sysfs nodes" amdgpu=/sys/class/kfd/kfd/topology/nodes/1/properties vendor=4098 device=26751 unique_id=150031596308672932
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.594+03:00 level=DEBUG source=amd_linux.go:236 msg=matched amdgpu=/sys/class/kfd/kfd/topology/nodes/1/properties drm=/sys/class/drm/card0/device
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.594+03:00 level=DEBUG source=amd_linux.go:268 msg="amdgpu memory" gpu=0 total="8.0 GiB"
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.594+03:00 level=DEBUG source=amd_linux.go:269 msg="amdgpu memory" gpu=0 available="8.0 GiB"
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.594+03:00 level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib"
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.594+03:00 level=INFO source=amd_linux.go:333 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=9.0.0
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.595+03:00 level=INFO source=types.go:98 msg="inference compute" id=0 library=rocm compute=gfx900 driver=0.0 name=1002:687f total="8.0 GiB" available="8.0 GiB"
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: [GIN] 2024/08/02 - 20:54:22 | 200 |      20.669µs |       127.0.0.1 | HEAD     "/"
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: [GIN] 2024/08/02 - 20:54:22 | 200 |    9.499155ms |       127.0.0.1 | POST     "/api/show"
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.605+03:00 level=DEBUG source=gpu.go:333 msg="updating system memory data" before.total="62.7 GiB" before.free="54.0 GiB" now.total="62.7 GiB" now.free="54.0 GiB"
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.605+03:00 level=DEBUG source=amd_linux.go:425 msg="updating rocm free memory" gpu=0 name=1002:687f before="8.0 GiB" now="8.0 GiB"
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.609+03:00 level=DEBUG source=sched.go:169 msg="loading first model" model=/root/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.609+03:00 level=DEBUG source=memory.go:101 msg=evaluating library=rocm gpu_count=1 available="[8.0 GiB]"
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.609+03:00 level=DEBUG source=sched.go:628 msg="new model will fit in available VRAM in single GPU, loading" model=/root/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf gpu=0 available=8564838400 required="3.4 GiB"
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.609+03:00 level=DEBUG source=server.go:98 msg="system memory" total="62.7 GiB" free=58011594752
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.609+03:00 level=DEBUG source=memory.go:101 msg=evaluating library=rocm gpu_count=1 available="[8.0 GiB]"
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.609+03:00 level=INFO source=memory.go:309 msg="offload to rocm" layers.requested=-1 layers.model=33 layers.offload=33 layers.split="" memory.available="[8.0 GiB]" memory.required.full="3.4 GiB" memory.required.partial="3.4 GiB" memory.required.kv="768.0 MiB" memory.required.allocations="[3.4 GiB]" memory.weights.total="2.6 GiB" memory.weights.repeating="2.6 GiB" memory.weights.nonrepeating="77.1 MiB" memory.graph.full="128.0 MiB" memory.graph.partial="128.0 MiB"
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.610+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama4024265843/runners/cpu/ollama_llama_server
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.610+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama4024265843/runners/cpu_avx/ollama_llama_server
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.610+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama4024265843/runners/cpu_avx2/ollama_llama_server
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.610+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama4024265843/runners/cuda_v11/ollama_llama_server
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.610+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama4024265843/runners/rocm_v60101/ollama_llama_server
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.610+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama4024265843/runners/cpu/ollama_llama_server
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.610+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama4024265843/runners/cpu_avx/ollama_llama_server
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.610+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama4024265843/runners/cpu_avx2/ollama_llama_server
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.610+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama4024265843/runners/cuda_v11/ollama_llama_server
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.610+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama4024265843/runners/rocm_v60101/ollama_llama_server
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.610+03:00 level=INFO source=server.go:145 msg="Invalid OLLAMA_LLM_LIBRARY rocm_v60002 - not found"
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.610+03:00 level=INFO source=server.go:368 msg="starting llama server" cmd="/tmp/ollama4024265843/runners/rocm_v60101/ollama_llama_server --model /root/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 33 --verbose --parallel 1 --port 41065"
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.610+03:00 level=DEBUG source=server.go:383 msg=subprocess environment="[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin HSA_OVERRIDE_GFX_VERSION=9.0.0 HSA_ENABLE_SDMA=0 LD_LIBRARY_PATH=/opt/rocm/lib:/tmp/ollama4024265843/runners/rocm_v60101:/tmp/ollama4024265843/runners HIP_VISIBLE_DEVICES=0]"
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.610+03:00 level=INFO source=sched.go:382 msg="loaded runners" count=1
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.610+03:00 level=INFO source=server.go:556 msg="waiting for llama runner to start responding"
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.610+03:00 level=INFO source=server.go:594 msg="waiting for server to become available" status="llm server error"
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[420]: INFO [main] build info | build=1 commit="7c26775" tid="140008895320896" timestamp=1722621262
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[420]: INFO [main] system info | n_threads=12 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="140008895320896" timestamp=1722621262 total_threads=24
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[420]: INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="23" port="41065" tid="140008895320896" timestamp=1722621262
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llama_model_loader: loaded meta data with 36 key-value pairs and 197 tensors from /root/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf (version GGUF V3 (latest))
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llama_model_loader: - kv   0:                       general.architecture str              = phi3
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llama_model_loader: - kv   1:                               general.type str              = model
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llama_model_loader: - kv   2:                               general.name str              = Phi 3 Mini 128k Instruct
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llama_model_loader: - kv   3:                           general.finetune str              = 128k-instruct
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llama_model_loader: - kv   4:                           general.basename str              = Phi-3
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llama_model_loader: - kv   5:                         general.size_label str              = mini
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llama_model_loader: - kv   6:                            general.license str              = mit
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llama_model_loader: - kv   7:                       general.license.link str              = https://huggingface.co/microsoft/Phi-...
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llama_model_loader: - kv   8:                               general.tags arr[str,3]       = ["nlp", "code", "text-generation"]
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llama_model_loader: - kv   9:                          general.languages arr[str,1]       = ["en"]
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llama_model_loader: - kv  10:                        phi3.context_length u32              = 131072
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llama_model_loader: - kv  11:  phi3.rope.scaling.original_context_length u32              = 4096
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llama_model_loader: - kv  12:                      phi3.embedding_length u32              = 3072
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llama_model_loader: - kv  13:                   phi3.feed_forward_length u32              = 8192
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llama_model_loader: - kv  14:                           phi3.block_count u32              = 32
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llama_model_loader: - kv  15:                  phi3.attention.head_count u32              = 32
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llama_model_loader: - kv  16:               phi3.attention.head_count_kv u32              = 32
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llama_model_loader: - kv  17:      phi3.attention.layer_norm_rms_epsilon f32              = 0.000010
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llama_model_loader: - kv  18:                  phi3.rope.dimension_count u32              = 96
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llama_model_loader: - kv  19:                        phi3.rope.freq_base f32              = 10000.000000
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llama_model_loader: - kv  20:                          general.file_type u32              = 2
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llama_model_loader: - kv  21:              phi3.attention.sliding_window u32              = 262144
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llama_model_loader: - kv  22:              phi3.rope.scaling.attn_factor f32              = 1.190238
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llama_model_loader: - kv  23:                       tokenizer.ggml.model str              = llama
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llama_model_loader: - kv  24:                         tokenizer.ggml.pre str              = default
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llama_model_loader: - kv  25:                      tokenizer.ggml.tokens arr[str,32064]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llama_model_loader: - kv  26:                      tokenizer.ggml.scores arr[f32,32064]   = [-1000.000000, -1000.000000, -1000.00...
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llama_model_loader: - kv  27:                  tokenizer.ggml.token_type arr[i32,32064]   = [3, 3, 4, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llama_model_loader: - kv  28:                tokenizer.ggml.bos_token_id u32              = 1
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llama_model_loader: - kv  29:                tokenizer.ggml.eos_token_id u32              = 32000
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llama_model_loader: - kv  30:            tokenizer.ggml.unknown_token_id u32              = 0
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llama_model_loader: - kv  31:            tokenizer.ggml.padding_token_id u32              = 32000
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llama_model_loader: - kv  32:               tokenizer.ggml.add_bos_token bool             = false
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llama_model_loader: - kv  33:               tokenizer.ggml.add_eos_token bool             = false
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llama_model_loader: - kv  34:                    tokenizer.chat_template str              = {% for message in messages %}{% if me...
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llama_model_loader: - kv  35:               general.quantization_version u32              = 2
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llama_model_loader: - type  f32:   67 tensors
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llama_model_loader: - type q4_0:  129 tensors
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llama_model_loader: - type q6_K:    1 tensors
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_vocab: special tokens cache size = 323
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_vocab: token to piece cache size = 0.1685 MB
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: format           = GGUF V3 (latest)
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: arch             = phi3
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: vocab type       = SPM
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: n_vocab          = 32064
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: n_merges         = 0
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: n_ctx_train      = 131072
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: n_embd           = 3072
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: n_head           = 32
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: n_head_kv        = 32
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: n_layer          = 32
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: n_rot            = 96
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: n_embd_head_k    = 96
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: n_embd_head_v    = 96
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: n_gqa            = 1
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: n_embd_k_gqa     = 3072
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: n_embd_v_gqa     = 3072
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: f_norm_eps       = 0.0e+00
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: f_clamp_kqv      = 0.0e+00
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: f_logit_scale    = 0.0e+00
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: n_ff             = 8192
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: n_expert         = 0
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: n_expert_used    = 0
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: causal attn      = 1
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: pooling type     = 0
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: rope type        = 2
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: rope scaling     = linear
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: freq_base_train  = 10000.0
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: freq_scale_train = 1
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: n_ctx_orig_yarn  = 4096
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: rope_finetuned   = unknown
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: ssm_d_conv       = 0
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: ssm_d_inner      = 0
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: ssm_d_state      = 0
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: ssm_dt_rank      = 0
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: model type       = 3B
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: model ftype      = Q4_0
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: model params     = 3.82 B
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: model size       = 2.03 GiB (4.55 BPW)
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: general.name     = Phi 3 Mini 128k Instruct
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: BOS token        = 1 '<s>'
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: EOS token        = 32000 '<|endoftext|>'
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: UNK token        = 0 '<unk>'
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: PAD token        = 32000 '<|endoftext|>'
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: LF token         = 13 '<0x0A>'
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: llm_load_print_meta: EOT token        = 32007 '<|end|>'
Aug  2 20:54:22 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:22.862+03:00 level=INFO source=server.go:594 msg="waiting for server to become available" status="llm server loading model"
Aug  2 20:54:23 ollama-ubuntu-jammy ollama[385]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ:   no
Aug  2 20:54:23 ollama-ubuntu-jammy ollama[385]: ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
Aug  2 20:54:23 ollama-ubuntu-jammy ollama[385]: ggml_cuda_init: found 1 ROCm devices:
Aug  2 20:54:23 ollama-ubuntu-jammy ollama[385]:   Device 0: Radeon RX Vega, compute capability 9.0, VMM: no
Aug  2 20:54:23 ollama-ubuntu-jammy ollama[385]: llm_load_tensors: ggml ctx size =    0.22 MiB
Aug  2 20:54:23 ollama-ubuntu-jammy ollama[385]: llm_load_tensors: offloading 32 repeating layers to GPU
Aug  2 20:54:23 ollama-ubuntu-jammy ollama[385]: llm_load_tensors: offloading non-repeating layers to GPU
Aug  2 20:54:23 ollama-ubuntu-jammy ollama[385]: llm_load_tensors: offloaded 33/33 layers to GPU
Aug  2 20:54:23 ollama-ubuntu-jammy ollama[385]: llm_load_tensors:      ROCm0 buffer size =  2021.84 MiB
Aug  2 20:54:23 ollama-ubuntu-jammy ollama[385]: llm_load_tensors:        CPU buffer size =    52.84 MiB
Aug  2 20:54:23 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:23.614+03:00 level=DEBUG source=server.go:605 msg="model load progress 0.13"
Aug  2 20:54:23 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:23.866+03:00 level=DEBUG source=server.go:605 msg="model load progress 0.63"
Aug  2 20:54:24 ollama-ubuntu-jammy ollama[385]: llama_new_context_with_model: n_ctx      = 2048
Aug  2 20:54:24 ollama-ubuntu-jammy ollama[385]: llama_new_context_with_model: n_batch    = 512
Aug  2 20:54:24 ollama-ubuntu-jammy ollama[385]: llama_new_context_with_model: n_ubatch   = 512
Aug  2 20:54:24 ollama-ubuntu-jammy ollama[385]: llama_new_context_with_model: flash_attn = 0
Aug  2 20:54:24 ollama-ubuntu-jammy ollama[385]: llama_new_context_with_model: freq_base  = 10000.0
Aug  2 20:54:24 ollama-ubuntu-jammy ollama[385]: llama_new_context_with_model: freq_scale = 1
Aug  2 20:54:24 ollama-ubuntu-jammy ollama[385]: llama_kv_cache_init:      ROCm0 KV buffer size =   768.00 MiB
Aug  2 20:54:24 ollama-ubuntu-jammy ollama[385]: llama_new_context_with_model: KV self size  =  768.00 MiB, K (f16):  384.00 MiB, V (f16):  384.00 MiB
Aug  2 20:54:24 ollama-ubuntu-jammy ollama[385]: llama_new_context_with_model:  ROCm_Host  output buffer size =     0.13 MiB
Aug  2 20:54:24 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:24.117+03:00 level=DEBUG source=server.go:605 msg="model load progress 1.00"
Aug  2 20:54:24 ollama-ubuntu-jammy ollama[385]: llama_new_context_with_model:      ROCm0 compute buffer size =   168.00 MiB
Aug  2 20:54:24 ollama-ubuntu-jammy ollama[385]: llama_new_context_with_model:  ROCm_Host compute buffer size =    10.01 MiB
Aug  2 20:54:24 ollama-ubuntu-jammy ollama[385]: llama_new_context_with_model: graph nodes  = 1286
Aug  2 20:54:24 ollama-ubuntu-jammy ollama[385]: llama_new_context_with_model: graph splits = 2
Aug  2 20:54:24 ollama-ubuntu-jammy ollama[420]: DEBUG [initialize] initializing slots | n_slots=1 tid="140008895320896" timestamp=1722621264
Aug  2 20:54:24 ollama-ubuntu-jammy ollama[420]: DEBUG [initialize] new slot | n_ctx_slot=2048 slot_id=0 tid="140008895320896" timestamp=1722621264
Aug  2 20:54:24 ollama-ubuntu-jammy ollama[420]: INFO [main] model loaded | tid="140008895320896" timestamp=1722621264
Aug  2 20:54:24 ollama-ubuntu-jammy ollama[420]: DEBUG [update_slots] all slots are idle and system prompt is empty, clear the KV cache | tid="140008895320896" timestamp=1722621264
Aug  2 20:54:24 ollama-ubuntu-jammy ollama[420]: DEBUG [process_single_task] slot data | n_idle_slots=1 n_processing_slots=0 task_id=0 tid="140008895320896" timestamp=1722621264
Aug  2 20:54:24 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:24.367+03:00 level=INFO source=server.go:599 msg="llama runner started in 1.76 seconds"
Aug  2 20:54:24 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:24.367+03:00 level=DEBUG source=sched.go:395 msg="finished setting up runner" model=/root/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf
Aug  2 20:54:24 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:24.367+03:00 level=DEBUG source=prompt.go:172 msg="prompt now fits in context window" required=1 window=2048
Aug  2 20:54:24 ollama-ubuntu-jammy ollama[385]: [GIN] 2024/08/02 - 20:54:24 | 200 |   1.76273246s |       127.0.0.1 | POST     "/api/chat"
Aug  2 20:54:24 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:24.368+03:00 level=DEBUG source=sched.go:399 msg="context for request finished"
Aug  2 20:54:24 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:24.368+03:00 level=DEBUG source=sched.go:281 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/root/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf duration=5m0s
Aug  2 20:54:24 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:24.368+03:00 level=DEBUG source=sched.go:299 msg="after processing request finished event" modelPath=/root/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf refCount=0
Aug  2 20:54:27 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:27.105+03:00 level=DEBUG source=sched.go:507 msg="evaluating already loaded" model=/root/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf
Aug  2 20:54:27 ollama-ubuntu-jammy ollama[420]: DEBUG [process_single_task] slot data | n_idle_slots=1 n_processing_slots=0 task_id=1 tid="140008895320896" timestamp=1722621267
Aug  2 20:54:27 ollama-ubuntu-jammy ollama[420]: DEBUG [process_single_task] slot data | n_idle_slots=1 n_processing_slots=0 task_id=2 tid="140008895320896" timestamp=1722621267
Aug  2 20:54:27 ollama-ubuntu-jammy ollama[420]: DEBUG [log_server_request] request | method="POST" params={} path="/tokenize" remote_addr="127.0.0.1" remote_port=54570 status=200 tid="140008877467200" timestamp=1722621267
Aug  2 20:54:27 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:27.149+03:00 level=DEBUG source=prompt.go:172 msg="prompt now fits in context window" required=15 window=2048
Aug  2 20:54:27 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:27.149+03:00 level=DEBUG source=routes.go:1367 msg="chat handler" prompt="<|user|>\nHey there!<|end|>\n<|assistant|>\n" images=0
Aug  2 20:54:27 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:27.149+03:00 level=DEBUG source=server.go:695 msg="setting token limit to 10x num_ctx" num_ctx=2048 num_predict=20480
Aug  2 20:54:27 ollama-ubuntu-jammy ollama[420]: DEBUG [process_single_task] slot data | n_idle_slots=1 n_processing_slots=0 task_id=3 tid="140008895320896" timestamp=1722621267
Aug  2 20:54:27 ollama-ubuntu-jammy ollama[420]: DEBUG [launch_slot_with_data] slot is processing task | slot_id=0 task_id=4 tid="140008895320896" timestamp=1722621267
Aug  2 20:54:27 ollama-ubuntu-jammy ollama[420]: DEBUG [update_slots] slot progression | ga_i=0 n_past=0 n_past_se=0 n_prompt_tokens_processed=13 slot_id=0 task_id=4 tid="140008895320896" timestamp=1722621267
Aug  2 20:54:27 ollama-ubuntu-jammy ollama[420]: DEBUG [update_slots] kv cache rm [p0, end) | p0=0 slot_id=0 task_id=4 tid="140008895320896" timestamp=1722621267
Aug  2 20:54:27 ollama-ubuntu-jammy ollama[420]: DEBUG [print_timings] prompt eval time     =     109.71 ms /    13 tokens (    8.44 ms per token,   118.49 tokens per second) | n_prompt_tokens_processed=13 n_tokens_second=118.49421201349011 slot_id=0 t_prompt_processing=109.71 t_token=8.439230769230768 task_id=4 tid="140008895320896" timestamp=1722621267
Aug  2 20:54:27 ollama-ubuntu-jammy ollama[420]: DEBUG [print_timings] generation eval time =     155.06 ms /    10 runs   (   15.51 ms per token,    64.49 tokens per second) | n_decoded=10 n_tokens_second=64.48950110921942 slot_id=0 t_token=15.5064 t_token_generation=155.064 task_id=4 tid="140008895320896" timestamp=1722621267
Aug  2 20:54:27 ollama-ubuntu-jammy ollama[420]: DEBUG [print_timings]           total time =     264.77 ms | slot_id=0 t_prompt_processing=109.71 t_token_generation=155.064 t_total=264.774 task_id=4 tid="140008895320896" timestamp=1722621267
Aug  2 20:54:27 ollama-ubuntu-jammy ollama[420]: DEBUG [update_slots] slot released | n_cache_tokens=23 n_ctx=2048 n_past=22 n_system_tokens=0 slot_id=0 task_id=4 tid="140008895320896" timestamp=1722621267 truncated=false
Aug  2 20:54:27 ollama-ubuntu-jammy ollama[420]: DEBUG [log_server_request] request | method="POST" params={} path="/completion" remote_addr="127.0.0.1" remote_port=56262 status=200 tid="140008869074496" timestamp=1722621267
Aug  2 20:54:27 ollama-ubuntu-jammy ollama[385]: [GIN] 2024/08/02 - 20:54:27 | 200 |  310.845213ms |       127.0.0.1 | POST     "/api/chat"
Aug  2 20:54:27 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:27.415+03:00 level=DEBUG source=sched.go:348 msg="context for request finished"
Aug  2 20:54:27 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:27.415+03:00 level=DEBUG source=sched.go:281 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/root/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf duration=5m0s
Aug  2 20:54:27 ollama-ubuntu-jammy ollama[385]: time=2024-08-02T20:54:27.415+03:00 level=DEBUG source=sched.go:299 msg="after processing request finished event" modelPath=/root/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf refCount=0

rocm-smi:

========================================== ROCm System Management Interface ==========================================
==================================================== Concise Info ====================================================
Device  Node  IDs              Temp    Power     Partitions          SCLK    MCLK    Fan  Perf  PwrCap  VRAM%  GPU%
              (DID,     GUID)  (Edge)  (Socket)  (Mem, Compute, ID)
======================================================================================================================
0       1     0x687f,   22782  51.0°C  10.0W     N/A, N/A, 0         852Mhz  167Mhz  0%   auto  247.0W  39%    0%
======================================================================================================================
================================================ End of ROCm SMI Log =================================================

rocminfo:

ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version:         1.13
Runtime Ext Version:     1.4
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE
System Endianness:       LITTLE
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========
HSA Agents
==========
*******
Agent 1
*******
  Name:                    AMD Ryzen 9 3900X 12-Core Processor
  Uuid:                    CPU-XX
  Marketing Name:          AMD Ryzen 9 3900X 12-Core Processor
  Vendor Name:             CPU
  Feature:                 None specified
  Profile:                 FULL_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        0(0x0)
  Queue Min Size:          0(0x0)
  Queue Max Size:          0(0x0)
  Queue Type:              MULTI
  Node:                    0
  Device Type:             CPU
  Cache Info:
    L1:                      32768(0x8000) KB
  Chip ID:                 0(0x0)
  ASIC Revision:           0(0x0)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   3800
  BDFID:                   0
  Internal Node ID:        0
  Compute Unit:            24
  SIMDs per CU:            0
  Shader Engines:          0
  Shader Arrs. per Eng.:   0
  WatchPts on Addr. Ranges:1
  Features:                None
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: FINE GRAINED
      Size:                    65771708(0x3eb98bc) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Recommended Granule:4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 2
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    65771708(0x3eb98bc) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Recommended Granule:4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 3
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    65771708(0x3eb98bc) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Recommended Granule:4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
  ISA Info:
*******
Agent 2
*******
  Name:                    gfx900
  Uuid:                    GPU-021504f1231031a4
  Marketing Name:          Radeon RX Vega
  Vendor Name:             AMD
  Feature:                 KERNEL_DISPATCH
  Profile:                 BASE_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        128(0x80)
  Queue Min Size:          64(0x40)
  Queue Max Size:          131072(0x20000)
  Queue Type:              MULTI
  Node:                    1
  Device Type:             GPU
  Cache Info:
    L1:                      16(0x10) KB
    L2:                      4096(0x1000) KB
  Chip ID:                 26751(0x687f)
  ASIC Revision:           1(0x1)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   1630
  BDFID:                   2560
  Internal Node ID:        1
  Compute Unit:            64
  SIMDs per CU:            4
  Shader Engines:          4
  Shader Arrs. per Eng.:   1
  WatchPts on Addr. Ranges:4
  Coherent Host Access:    FALSE
  Features:                KERNEL_DISPATCH
  Fast F16 Operation:      TRUE
  Wavefront Size:          64(0x40)
  Workgroup Max Size:      1024(0x400)
  Workgroup Max Size per Dimension:
    x                        1024(0x400)
    y                        1024(0x400)
    z                        1024(0x400)
  Max Waves Per CU:        40(0x28)
  Max Work-item Per CU:    2560(0xa00)
  Grid Max Size:           4294967295(0xffffffff)
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)
    y                        4294967295(0xffffffff)
    z                        4294967295(0xffffffff)
  Max fbarriers/Workgrp:   32
  Packet Processor uCode:: 468
  SDMA engine uCode::      434
  IOMMU Support::          None
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    8372224(0x7fc000) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Recommended Granule:2048KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 2
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    8372224(0x7fc000) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Recommended Granule:2048KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 3
      Segment:                 GROUP
      Size:                    64(0x40) KB
      Allocatable:             FALSE
      Alloc Granule:           0KB
      Alloc Recommended Granule:0KB
      Alloc Alignment:         0KB
      Accessible by all:       FALSE
  ISA Info:
    ISA 1
      Name:                    amdgcn-amd-amdhsa--gfx900:xnack-
      Machine Models:          HSA_MACHINE_MODEL_LARGE
      Profiles:                HSA_PROFILE_BASE
      Default Rounding Mode:   NEAR
      Default Rounding Mode:   NEAR
      Fast f16:                TRUE
      Workgroup Max Size:      1024(0x400)
      Workgroup Max Size per Dimension:
        x                        1024(0x400)
        y                        1024(0x400)
        z                        1024(0x400)
      Grid Max Size:           4294967295(0xffffffff)
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)
        y                        4294967295(0xffffffff)
        z                        4294967295(0xffffffff)
      FBarrier Max Size:       32
*** Done ***
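
The "Name: gfx900" line under Agent 2 is the target that HSA_OVERRIDE_GFX_VERSION has to correspond to (gfx900 maps to 9.0.0). A quick way to pull just the gfx targets out of the runtime's view, assuming rocminfo is installed:

# Print only the gfx target strings the HSA runtime reports;
# gfx900 corresponds to HSA_OVERRIDE_GFX_VERSION=9.0.0.
rocminfo | grep -o 'gfx[0-9a-f]*' | sort -u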

apt show rocm-libs -a:

Package: rocm-libs
Version: 6.1.2.60102-119~22.04
Priority: optional
Section: devel
Maintainer: ROCm Dev Support <rocm-dev.support@amd.com>
Installed-Size: 13.3 kB
Depends: hipblas (= 2.1.0.60102-119~22.04), hipblaslt (= 0.7.0.60102-119~22.04), hipfft (= 1.0.14.60102-119~22.04), hipsolver (= 2.1.1.60102-119~22.04), hipsparse (= 3.0.1.60102-119~22.04), hiptensor (= 1.2.0.60102-119~22.04), miopen-hip (= 3.1.0.60102-119~22.04), half (= 1.12.0.60102-119~22.04), rccl (= 2.18.6.60102-119~22.04), rocalution (= 3.1.1.60102-119~22.04), rocblas (= 4.1.2.60102-119~22.04), rocfft (= 1.0.27.60102-119~22.04), rocrand (= 3.0.1.60102-119~22.04), hiprand (= 2.10.16.60102-119~22.04), rocsolver (= 3.25.0.60102-119~22.04), rocsparse (= 3.1.2.60102-119~22.04), rocm-core (= 6.1.2.60102-119~22.04), hipsparselt (= 0.2.0.60102-119~22.04), composablekernel-dev (= 1.1.0.60102-119~22.04), hipblas-dev (= 2.1.0.60102-119~22.04), hipblaslt-dev (= 0.7.0.60102-119~22.04), hipcub-dev (= 3.1.0.60102-119~22.04), hipfft-dev (= 1.0.14.60102-119~22.04), hipsolver-dev (= 2.1.1.60102-119~22.04), hipsparse-dev (= 3.0.1.60102-119~22.04), hiptensor-dev (= 1.2.0.60102-119~22.04), miopen-hip-dev (= 3.1.0.60102-119~22.04), rccl-dev (= 2.18.6.60102-119~22.04), rocalution-dev (= 3.1.1.60102-119~22.04), rocblas-dev (= 4.1.2.60102-119~22.04), rocfft-dev (= 1.0.27.60102-119~22.04), rocprim-dev (= 3.1.0.60102-119~22.04), rocrand-dev (= 3.0.1.60102-119~22.04), hiprand-dev (= 2.10.16.60102-119~22.04), rocsolver-dev (= 3.25.0.60102-119~22.04), rocsparse-dev (= 3.1.2.60102-119~22.04), rocthrust-dev (= 3.0.1.60102-119~22.04), rocwmma-dev (= 1.4.0.60102-119~22.04), hipsparselt-dev (= 0.2.0.60102-119~22.04)
Homepage: https://github.com/RadeonOpenCompute/ROCm
Download-Size: 1,068 B
APT-Sources: https://repo.radeon.com/rocm/apt/6.1.2 jammy/main amd64 Packages
Description: Radeon Open Compute (ROCm) Runtime software stack

The biggest difference is the ROCm library, which was updated from 6.0.2 to 6.1.2.
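
A hedged way to confirm what the configured repos actually offer, and to step rocm-libs back, assuming the 6.0 apt repository is (re-)added; the exact version string below is illustrative and must be taken from the madison output:

# List every rocm-libs version the configured repos provide:
apt-cache madison rocm-libs

# Hypothetical downgrade to a 6.0.x build; substitute the real
# version string printed by the command above.
sudo apt install --allow-downgrades rocm-libs=6.0.2.60002-115~22.04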

Also, if I update Ollama to the latest version, it stops working.
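
For the rollback itself, the official install script honors an OLLAMA_VERSION variable, so pinning the last working release should look roughly like this (0.1.48 taken from the log above):

# Reinstall a specific Ollama release via the official script;
# install.sh reads OLLAMA_VERSION to pick the release.
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.1.48 sh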

@AmurgCodru commented on GitHub (Aug 11, 2024):

Can confirm that I had the exact same issues (although I was running it on a bare-metal laptop, not via Docker or LXC).

Setting the following in the systemd unit seemed to work (a drop-in sketch follows the block):

Environment="OLLAMA_LLM_LIBRARY=rocm_v60002"
Environment="HSA_OVERRIDE_GFX_VERSION=9.0.0"
Environment="HCC_AMDGPU_TARGETS=gfx902"
Environment="OLLAMA_NOHISTORY=1"
Environment="HSA_ENABLE_SDMA=0"
Environment="OLLAMA_DEBUG=1"
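
A minimal sketch of applying these as a drop-in, assuming the stock ollama.service unit name:

# Open (or create) a drop-in override for the service:
sudo systemctl edit ollama.service
# ...add the Environment= lines above under a [Service] header, then:
sudo systemctl daemon-reload
sudo systemctl restart ollama.service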

So it is probably a bug in the new ROCm version or something related.

My details:

AMD Ryzen 7 3700U with Radeon Vega Mobile Gfx

apt show rocm-libs -a:

Package: rocm-libs
Version: 6.2.0.60200-66~22.04
Priority: optional
Section: devel
Maintainer: ROCm Dev Support <rocm-dev.support@amd.com>
Installed-Size: 13,3 kB
Depends: hipblas (= 2.2.0.60200-66~22.04), hipblaslt (= 0.8.0.60200-66~22.04), hipfft (= 1.0.14.60200-66~22.04), hipsolver (= 2.2.0.60200-66~22.04), hipsparse (= 3.1.1.60200-66~22.04), hiptensor (= 1.3.0.60200-66~22.04), miopen-hip (= 3.2.0.60200-66~22.04), half (= 1.12.0.60200-66~22.04), rccl (= 2.20.5.60200-66~22.04), rocalution (= 3.2.0.60200-66~22.04), rocblas (= 4.2.0.60200-66~22.04), rocfft (= 1.0.28.60200-66~22.04), rocrand (= 3.1.0.60200-66~22.04), hiprand (= 2.11.0.60200-66~22.04), rocsolver (= 3.26.0.60200-66~22.04), rocsparse (= 3.2.0.60200-66~22.04), rocm-core (= 6.2.0.60200-66~22.04), hipsparselt (= 0.2.1.60200-66~22.04), composablekernel-dev (= 1.1.0.60200-66~22.04), hipblas-dev (= 2.2.0.60200-66~22.04), hipblaslt-dev (= 0.8.0.60200-66~22.04), hipcub-dev (= 3.2.0.60200-66~22.04), hipfft-dev (= 1.0.14.60200-66~22.04), hipsolver-dev (= 2.2.0.60200-66~22.04), hipsparse-dev (= 3.1.1.60200-66~22.04), hiptensor-dev (= 1.3.0.60200-66~22.04), miopen-hip-dev (= 3.2.0.60200-66~22.04), rccl-dev (= 2.20.5.60200-66~22.04), rocalution-dev (= 3.2.0.60200-66~22.04), rocblas-dev (= 4.2.0.60200-66~22.04), rocfft-dev (= 1.0.28.60200-66~22.04), rocprim-dev (= 3.2.0.60200-66~22.04), rocrand-dev (= 3.1.0.60200-66~22.04), hiprand-dev (= 2.11.0.60200-66~22.04), rocsolver-dev (= 3.26.0.60200-66~22.04), rocsparse-dev (= 3.2.0.60200-66~22.04), rocthrust-dev (= 3.0.1.60200-66~22.04), rocwmma-dev (= 1.5.0.60200-66~22.04), hipsparselt-dev (= 0.2.1.60200-66~22.04)
Homepage: https://github.com/RadeonOpenCompute/ROCm
Download-Size: 1.056 B
APT-Sources: https://repo.radeon.com/rocm/apt/6.2 jammy/main amd64 Packages
Description: Radeon Open Compute (ROCm) Runtime software stack

@ayttop commented on GitHub (Aug 28, 2024):

https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/Quickstart/llama_cpp_quickstart.md