[GH-ISSUE #11652] rocblaslt error: Could not load /usr/local/lib/ollama/rocm/hipblaslt/library/TensileLibrary_lazy_gfx1200.dat #54214

Closed
opened 2026-04-29 05:23:52 -05:00 by GiteaMirror · 5 comments
Owner

Originally created by @paugh7 on GitHub (Aug 3, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11652

What is the issue?

Ollama offloads to the CPU even though my GPU has more than enough memory to support the model. The ROCm drivers were installed from the AMD website before installing Ollama, and the TensileLibrary_lazy_gfx1200.dat file was verified to be present.

GPU: AMD Radeon RX 9060 XT 16GB
OS: Ubuntu 24.04 LTS
ROCm: 6.4
Ollama: 0.10.1

Relevant log output

Ollama.service:
ollama.service - Ollama Service
     Loaded: loaded (/etc/systemd/system/ollama.service; enabled; preset: enabled)
    Drop-In: /etc/systemd/system/ollama.service.d
             └─override.conf
     Active: active (running) since Sun 2025-08-03 18:28:19 UTC; 1s ago
   Main PID: 1117215 (ollama)
      Tasks: 13 (limit: 18550)
     Memory: 12.0M (peak: 12.2M)
        CPU: 23ms
     CGroup: /system.slice/ollama.service
             └─1117215 /usr/local/bin/ollama serve

Aug 03 18:28:19 ai systemd[1]: Started ollama.service - Ollama Service.
Aug 03 18:28:19 ai ollama[1117215]: time=2025-08-03T18:28:19.658Z level=INFO source=routes.go:1238 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_P>
Aug 03 18:28:19 ai ollama[1117215]: time=2025-08-03T18:28:19.659Z level=INFO source=images.go:476 msg="total blobs: 10"
Aug 03 18:28:19 ai ollama[1117215]: time=2025-08-03T18:28:19.659Z level=INFO source=images.go:483 msg="total unused blobs removed: 0"
Aug 03 18:28:19 ai ollama[1117215]: time=2025-08-03T18:28:19.659Z level=INFO source=routes.go:1291 msg="Listening on 127.0.0.1:11434 (version 0.10.1)"
Aug 03 18:28:19 ai ollama[1117215]: time=2025-08-03T18:28:19.659Z level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
Aug 03 18:28:19 ai ollama[1117215]: time=2025-08-03T18:28:19.671Z level=INFO source=amd_linux.go:386 msg="amdgpu is supported" gpu=GPU-0f6b9e8018204c06 gpu_type=gfx1200
Aug 03 18:28:19 ai ollama[1117215]: time=2025-08-03T18:28:19.671Z level=INFO source=amd_linux.go:332 msg="filtering out device per user request" id=1 visible_devices=[0]
Aug 03 18:28:19 ai ollama[1117215]: time=2025-08-03T18:28:19.674Z level=INFO source=types.go:130 msg="inference compute" id=GPU-0f6b9e8018204c06 library=rocm variant="" compute=gfx1200 driver=6.12 name=1002:7590 total="15.9 >

journalctl -u ollama.service -f:
Aug 03 18:21:57 ai ollama[40334]: load_tensors: loading model tensors, this can take a while... (mmap = true)
Aug 03 18:21:57 ai ollama[40334]: time=2025-08-03T18:21:57.747Z level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Aug 03 18:22:00 ai ollama[40334]: load_tensors: offloading 40 repeating layers to GPU
Aug 03 18:22:00 ai ollama[40334]: load_tensors: offloading output layer to GPU
Aug 03 18:22:00 ai ollama[40334]: load_tensors: offloaded 41/41 layers to GPU
Aug 03 18:22:00 ai ollama[40334]: load_tensors:        ROCm0 model buffer size =  8423.47 MiB
Aug 03 18:22:00 ai ollama[40334]: load_tensors:   CPU_Mapped model buffer size =   417.30 MiB
Aug 03 18:22:01 ai ollama[40334]: llama_context: constructing llama_context
Aug 03 18:22:01 ai ollama[40334]: llama_context: n_seq_max     = 1
Aug 03 18:22:01 ai ollama[40334]: llama_context: n_ctx         = 4096
Aug 03 18:22:01 ai ollama[40334]: llama_context: n_ctx_per_seq = 4096
Aug 03 18:22:01 ai ollama[40334]: llama_context: n_batch       = 512
Aug 03 18:22:01 ai ollama[40334]: llama_context: n_ubatch      = 512
Aug 03 18:22:01 ai ollama[40334]: llama_context: causal_attn   = 1
Aug 03 18:22:01 ai ollama[40334]: llama_context: flash_attn    = 0
Aug 03 18:22:01 ai ollama[40334]: llama_context: freq_base     = 1000000.0
Aug 03 18:22:01 ai ollama[40334]: llama_context: freq_scale    = 1
Aug 03 18:22:01 ai ollama[40334]: llama_context: n_ctx_per_seq (4096) < n_ctx_train (40960) -- the full capacity of the model will not be utilized
Aug 03 18:22:01 ai ollama[40334]: llama_context:  ROCm_Host  output buffer size =     0.60 MiB
Aug 03 18:22:01 ai ollama[40334]: llama_kv_cache_unified: kv_size = 4096, type_k = 'f16', type_v = 'f16', n_layer = 40, can_shift = 1, padding = 32
Aug 03 18:22:02 ai ollama[40334]: llama_kv_cache_unified:      ROCm0 KV buffer size =   640.00 MiB
Aug 03 18:22:02 ai ollama[40334]: llama_kv_cache_unified: KV self size  =  640.00 MiB, K (f16):  320.00 MiB, V (f16):  320.00 MiB
Aug 03 18:22:02 ai ollama[40334]: llama_context:      ROCm0 compute buffer size =   368.00 MiB
Aug 03 18:22:02 ai ollama[40334]: llama_context:  ROCm_Host compute buffer size =    18.01 MiB
Aug 03 18:22:02 ai ollama[40334]: llama_context: graph nodes  = 1526
Aug 03 18:22:02 ai ollama[40334]: llama_context: graph splits = 2
Aug 03 18:22:02 ai ollama[40334]: time=2025-08-03T18:22:02.521Z level=INFO source=server.go:637 msg="llama runner started in 6.03 seconds"
Aug 03 18:22:02 ai ollama[40334]: rocblaslt error: Could not load /usr/local/lib/ollama/rocm/hipblaslt/library/TensileLibrary_lazy_gfx1200.dat
Aug 03 18:22:02 ai ollama[40334]: hipBLASLt error: Heuristic Fetch Failed!
Aug 03 18:22:02 ai ollama[40334]: This message will be only be displayed once, unless the ROCBLAS_VERBOSE_HIPBLASLT_ERROR environment variable is set.
Aug 03 18:22:02 ai ollama[40334]: rocBLAS warning: hipBlasLT failed, falling back to tensile.
Aug 03 18:22:02 ai ollama[40334]: This message will be only be displayed once, unless the ROCBLAS_VERBOSE_TENSILE_ERROR environment variable is set.
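The last two warnings note that they print only once unless the corresponding ROCBLAS_VERBOSE_* environment variable is set. One way to surface the full errors on every run, sketched as a systemd drop-in next to the override.conf already shown in the status output (the file name rocm-verbose.conf is hypothetical):

```
# /etc/systemd/system/ollama.service.d/rocm-verbose.conf  (hypothetical drop-in name)
[Service]
Environment="ROCBLAS_VERBOSE_HIPBLASLT_ERROR=1"
Environment="ROCBLAS_VERBOSE_TENSILE_ERROR=1"
```

After adding the file, run `sudo systemctl daemon-reload && sudo systemctl restart ollama` and the repeated hipBLASLt/Tensile errors should appear in the journal.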

OS

No response

GPU

No response

CPU

No response

Ollama version

No response

GiteaMirror added the bug label 2026-04-29 05:23:52 -05:00

@tmerten commented on GitHub (Aug 5, 2025):

FYI: This GPU works for me with current ROCm drivers (6.4.2) in 25.04.
Could it be that your kernel is too old? (I recall that this GPU is not well supported in 24.04, as it is rather new.)


@paugh7 commented on GitHub (Aug 6, 2025):

I just reimaged my box to 25.04 and have the same error message. Ollama was installed via the script.


@ndrewpj commented on GitHub (Aug 12, 2025):

Could not load /usr/local/lib/ollama/rocm/hipblaslt/library/TensileLibrary_lazy_gfx1200.dat

I had the same issue. If you search for it, there is a workaround: symlink this .dat file to another one in the same folder, and that does the trick.
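The workaround described above can be sketched as a small shell function. This is a hedged sketch only: the function name is ours, the link target is simply whichever other lazy .dat happens to exist in the folder, and whether rocBLAS accepts the substituted kernel library on gfx1200 is not guaranteed.

```shell
#!/bin/sh
# Hypothetical sketch of the workaround from the comment above: replace the
# unloadable gfx1200 lazy-load library with a symlink to another
# TensileLibrary_lazy_*.dat in the same folder.
link_missing_tensile_lib() {
    lib_dir="$1"
    missing="TensileLibrary_lazy_gfx1200.dat"
    # pick any other lazy .dat in the same folder as the link target
    target=$(ls "$lib_dir"/TensileLibrary_lazy_*.dat 2>/dev/null | grep -v "$missing" | head -n 1)
    if [ -n "$target" ]; then
        ln -sf "$(basename "$target")" "$lib_dir/$missing"
    fi
}

# Usage (needs root for the Ollama install path):
# link_missing_tensile_lib /usr/local/lib/ollama/rocm/hipblaslt/library
```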


@trumblejoe commented on GitHub (Aug 28, 2025):

Fresh install in a VM; the 9060 XT works for gaming and other apps, but is not detected by Ollama.


@dhiltgen commented on GitHub (Mar 11, 2026):

Release 0.17.8 updates Linux to ROCm v7 which covers support for this GPU. Please give the [RC a try](https://github.com/ollama/ollama/blob/main/docs/linux.mdx#installing-specific-versions) and let us know if you run into any problems.
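Per the Linux install docs linked in the comment, a specific release can be pinned through the official install script. A sketch for the 0.17.8 release mentioned above (requires network access and root):

```shell
# Pin a specific Ollama version via the official install script.
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.17.8 sh
```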

Reference: github-starred/ollama#54214