[GH-ISSUE #15452] rocblaslt error: Cannot read "TensileLibrary_lazy_gfx1200.dat": No such file or directory #9877

Open
opened 2026-04-12 22:44:22 -05:00 by GiteaMirror · 8 comments
Owner

Originally created by @tchwpkgorg on GitHub (Apr 9, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15452

What is the issue?

I'm getting an error with ollama 0.20.4 and a Radeon RX 9060 XT. The same error occurs whether ollama runs from Docker (ollama/ollama:rocm) or from systemd (installed via https://ollama.com/install.sh).

I ran strace on the ollama process; while it initially fails to find the file, it does access it successfully a moment later:

[pid 11959] newfstatat(AT_FDCWD, "TensileLibrary_lazy_gfx1200.dat", 0x7edce76fa260, 0) = -1 ENOENT (No such file or directory)
[pid 11959] write(2, ""TensileLibrary_lazy_gfx1200.dat"..., 33) = 33
[pid 11959] openat(AT_FDCWD, "TensileLibrary_lazy_gfx1200.dat", O_RDONLY <unfinished ...>
[pid 11959] write(2, ""TensileLibrary_lazy_gfx1200.dat"..., 33 <unfinished ...>
[pid 11959] access("/usr/local/lib/ollama/rocm/rocblas/library/TensileLibrary_lazy_gfx1200.dat", R_OK) = 0
[pid 11963] openat(AT_FDCWD, "/usr/local/lib/ollama/rocm/rocblas/library/TensileLibrary_lazy_gfx1200.dat", O_RDONLY) = 10

ollama log below:

Relevant log output

time=2026-04-09T14:34:01.253Z level=INFO source=sched.go:484 msg="system memory" total="30.6 GiB" free="15.4 GiB" free_swap="0 B"
time=2026-04-09T14:34:01.253Z level=INFO source=sched.go:491 msg="gpu memory" id=GPU-44d2e7be1f7e4650 library=ROCm available="15.4 GiB" free="15.9 GiB" minimum="457.0 MiB" overhead="0 B"
time=2026-04-09T14:34:01.253Z level=INFO source=server.go:771 msg="loading model" "model layers"=49 requested=-1
time=2026-04-09T14:34:01.266Z level=INFO source=runner.go:1417 msg="starting ollama engine"
time=2026-04-09T14:34:01.266Z level=INFO source=runner.go:1452 msg="Server listening on 127.0.0.1:39699"
time=2026-04-09T14:34:01.275Z level=INFO source=runner.go:1290 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:4 GPULayers:49[ID:GPU-44d2e7be1f7e4650 Layers:49(0..48)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-09T14:34:01.305Z level=INFO source=ggml.go:136 msg="" architecture=qwen3moe file_type=Q4_K_M name="Qwen3 Coder 30B A3B Instruct" description="" num_tensors=579 num_key_values=35
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-haswell.so
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, gfx1200 (0x1200), VMM: no, Wave Size: 32, ID: GPU-44d2e7be1f7e4650
load_backend: loaded ROCm backend from /usr/lib/ollama/rocm/libggml-hip.so
time=2026-04-09T14:34:03.289Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 ROCm.0.NO_VMM=1 ROCm.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)

rocblaslt error: Cannot read "TensileLibrary_lazy_gfx1200.dat": No such file or directory

rocblaslt error: Could not load "TensileLibrary_lazy_gfx1200.dat"
time=2026-04-09T14:34:05.908Z level=INFO source=runner.go:1290 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:4 GPULayers:43[ID:GPU-44d2e7be1f7e4650 Layers:43(5..47)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-09T14:34:06.001Z level=INFO source=runner.go:1290 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:4 GPULayers:42[ID:GPU-44d2e7be1f7e4650 Layers:42(6..47)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-09T14:34:06.087Z level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:4 GPULayers:42[ID:GPU-44d2e7be1f7e4650 Layers:42(6..47)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-09T14:34:06.219Z level=INFO source=runner.go:1290 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:4 GPULayers:42[ID:GPU-44d2e7be1f7e4650 Layers:42(6..47)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-09T14:34:06.219Z level=INFO source=ggml.go:482 msg="offloading 42 repeating layers to GPU"
time=2026-04-09T14:34:06.219Z level=INFO source=ggml.go:486 msg="offloading output layer to CPU"
time=2026-04-09T14:34:06.219Z level=INFO source=ggml.go:494 msg="offloaded 42/49 layers to GPU"

OS

Linux

GPU

AMD

CPU

Intel

Ollama version

0.20.4

GiteaMirror added the bug label 2026-04-12 22:44:22 -05:00
Author
Owner

@HEETMEHTA18 commented on GitHub (Apr 9, 2026):

@tchwpkgorg The error is caused by rocBLASLt's library path resolution, not by a genuinely missing file. It first tries a relative lookup for TensileLibrary_lazy_gfx1200.dat and logs ENOENT.

It then successfully accesses the real file at /usr/local/lib/ollama/rocm/rocblas/library/TensileLibrary_lazy_gfx1200.dat.

So the root issue is that the runner starts without an explicit Tensile library path: rocBLASLt makes a fragile relative lookup first, logs the failure, and this can disrupt early GPU offload initialization. A permanent, reliable fix could be introduced here if needed.

<!-- gh-comment-id:4215745350 --> @HEETMEHTA18 commented on GitHub (Apr 9, 2026): @tchwpkgorg The error is caused by rocBLASLt library path resolution, not by a genuinely missing file. It first tries a relative lookup for TensileLibrary_lazy_gfx1200.dat and logs ENOENT. Then it successfully accesses the real file at /usr/local/lib/ollama/rocm/rocblas/library/TensileLibrary_lazy_gfx1200.dat. So the root issue is: the runner starts without an explicit Tensile library path, rocBLASLt does a fragile first lookup, logs failure, and this can disrupt early GPU offload initialization. so there can be introduced a permanent, reliable solution to it if needed.
Author
Owner

@Jasdfgh commented on GitHub (Apr 10, 2026):

looking at your strace, the file does exist at /usr/local/lib/ollama/rocm/rocblas/library/TensileLibrary_lazy_gfx1200.dat and later succeeds when it tries the full path. so this looks like a working directory / path resolution issue rather than a missing file.

from what I've seen across other RDNA4 setups, there's an open PR (ollama/ollama#14979) that fixes hipBLASLt packaging for gfx1200/1201. the path handling for Tensile libraries might be part of the same packaging gap. worth watching that PR.

is inference actually failing for you, or is it just the error message? your log shows it did manage to offload 42/49 layers after the error.

<!-- gh-comment-id:4221923491 --> @Jasdfgh commented on GitHub (Apr 10, 2026): looking at your strace, the file does exist at /usr/local/lib/ollama/rocm/rocblas/library/TensileLibrary_lazy_gfx1200.dat and later succeeds when it tries the full path. so this looks like a working directory / path resolution issue rather than a missing file. from what I've seen across other RDNA4 setups, there's an open PR (ollama/ollama#14979) that fixes hipBLASLt packaging for gfx1200/1201. the path handling for Tensile libraries might be part of the same packaging gap. worth watching that PR. is inference actually failing for you, or is it just the error message? your log shows it did manage to offload 42/49 layers after the error.
Author
Owner

@HEETMEHTA18 commented on GitHub (Apr 10, 2026):

@Jasdfgh Your mention of PR #14979 is on point. That PR fixes the hipBLASLt packaging gap. My fix addresses the companion problem: once hipBLASLt is packaged correctly, paths should be resolved explicitly so the relative lookup never fails in the first place.

Two-part solution:

  • PR #14979 (packaging): ensures the hipBLASLt kernels are actually installed in lib/ollama/rocm/hipblaslt/
  • My patch (in llm/server.go): injects explicit Tensile paths at runner startup; see the sketch below.

In short, PR #14979 ensures the kernel packages are installed, while this patch ensures they are found reliably.
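For concreteness, here is a minimal sketch of what that injection could look like. This is an assumption about the patch's shape, not the actual ollama code; the helper name injectTensilePath is hypothetical, and only ROCBLAS_TENSILE_LIBPATH (a documented rocBLAS environment variable) is shown:

```go
// Hypothetical sketch of the llm/server.go change, not the actual patch.
package llm

import (
	"os"
	"path/filepath"
)

// injectTensilePath (hypothetical helper) appends ROCBLAS_TENSILE_LIBPATH
// to the runner's environment when the user has not set it, pointing
// rocBLAS at the Tensile kernels bundled with the ROCm runner.
func injectTensilePath(libDir string, env []string) []string {
	// Respect an explicit user override.
	if os.Getenv("ROCBLAS_TENSILE_LIBPATH") != "" {
		return env
	}
	tensileDir := filepath.Join(libDir, "rocm", "rocblas", "library")
	if _, err := os.Stat(tensileDir); err != nil {
		// Directory absent (e.g. a non-ROCm build): leave env untouched.
		return env
	}
	return append(env, "ROCBLAS_TENSILE_LIBPATH="+tensileDir)
}
```

Until a fix along these lines lands, it should also be possible to silence the warning by exporting ROCBLAS_TENSILE_LIBPATH=/usr/local/lib/ollama/rocm/rocblas/library before starting ollama.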

<!-- gh-comment-id:4222256046 --> @HEETMEHTA18 commented on GitHub (Apr 10, 2026): @Jasdfgh Your mention of PR #14979 is on point. That PR fixes the hipBLASLt packaging gap. My fix addresses the companion problem: once hipBLASLt is packaged correctly, we should have explicit path resolution to avoid the relative lookup failure entirely. Two-Part Solution PR #14979 (packaging): Ensures hipBLASLt kernels are actually installed in `lib/ollama/rocm/hipblaslt/` My patch fix (in `llm/server.go`): Injects explicit Tensile paths at runner startup: The pr #14979 solved that the kernal packages are installed but this patch solves that they are found reliable or not.
Author
Owner

@tchwpkgorg commented on GitHub (Apr 10, 2026):

> is inference actually failing for you, or is it just the error message?

The GPU is used, so I think it's working, but I can't say whether it's performing optimally.

<!-- gh-comment-id:4222581349 --> @tchwpkgorg commented on GitHub (Apr 10, 2026): > is inference actually failing for you, or is it just the error message? The GPU is used, so I think it's working. But I'm not able to say if optimally etc.
Author
Owner

@HEETMEHTA18 commented on GitHub (Apr 10, 2026):

Thanks for confirming. Sounds like inference is working and this is mostly rocBLASLt init/path-lookup noise. I'll keep working on a proper fix for the Tensile path handling and will share an update once I've got it sorted.

<!-- gh-comment-id:4222804841 --> @HEETMEHTA18 commented on GitHub (Apr 10, 2026): Thanks for confirming. Sounds like inference is working and this is mostly the rocBLASLt init/path lookup noise. I’ll keep working on a proper fix for the Tensile path handling and I’ll share the update once I’ve got the issue
Author
Owner

@HEETMEHTA18 commented on GitHub (Apr 11, 2026):

@tchwpkgorg thanks for the detailed strace. This looks like a ROCm path-resolution issue rather than a genuinely missing file.

What was happening is:

  1. rocBLASLt first tries a relative lookup for TensileLibrary_lazy_gfx1200.dat and logs ENOENT.
  2. It then falls back to the absolute path under /usr/local/lib/ollama/rocm/rocblas/library/ and continues successfully.

So the warning is noisy, but in your log inference is still working. The GPU is being used and the model still offloads layers after the message.

I addressed this in server.go by resolving the ROCm Tensile library directory at startup and setting these env vars when they are not already provided:

  • ROCBLAS_TENSILE_LIBPATH, among other related variables.

That makes the lookup deterministic across systemd, Docker, and manual runs and avoids the fragile fallback path that triggers the warning.

This is complementary to PR #14979, which fixes the packaging side for the hipBLASLt kernels. The runtime path fix handles the lookup side so the libraries are found reliably once they are installed.

I also added tests in server_test.go to cover the Tensile path detection and env injection behavior.
I'm ready to merge these changes into the codebase. Shall I?
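For illustration, a test along those lines might look like the following. This is a sketch only, assuming the hypothetical injectTensilePath helper from the earlier snippet rather than the actual server_test.go contents:

```go
package llm

import (
	"os"
	"path/filepath"
	"testing"
)

func TestInjectTensilePath(t *testing.T) {
	// Build a fake runner library tree so the helper can find the directory.
	libDir := t.TempDir()
	tensileDir := filepath.Join(libDir, "rocm", "rocblas", "library")
	if err := os.MkdirAll(tensileDir, 0o755); err != nil {
		t.Fatal(err)
	}

	// With no user override, the variable should be injected.
	env := injectTensilePath(libDir, nil)
	want := "ROCBLAS_TENSILE_LIBPATH=" + tensileDir
	if len(env) != 1 || env[0] != want {
		t.Fatalf("got %v, want [%s]", env, want)
	}

	// An explicit user setting should be respected, with nothing appended.
	t.Setenv("ROCBLAS_TENSILE_LIBPATH", "/custom/path")
	if env := injectTensilePath(libDir, nil); len(env) != 0 {
		t.Fatalf("expected no injection when already set, got %v", env)
	}
}
```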

<!-- gh-comment-id:4229423709 --> @HEETMEHTA18 commented on GitHub (Apr 11, 2026): @tchwpkgorg thanks for the detailed strace. This looks like a ROCm path-resolution issue rather than a genuinely missing file. What was happening is: 1. rocBLASLt first tries a relative lookup for `TensileLibrary_lazy_gfx1200.dat` and logs `ENOENT`. 2. It then falls back to the absolute path under `/usr/local/lib/ollama/rocm/rocblas/library/` and continues successfully. So the warning is noisy, but in your log inference is still working. The GPU is being used and the model still offloads layers after the message. I addressed this in server.go by resolving the ROCm Tensile library directory at startup and setting these env vars when they are not already provided: - `ROCBLAS_TENSILE_LIBPATH` and other env variables. That makes the lookup deterministic across systemd, Docker, and manual runs and avoids the fragile fallback path that triggers the warning. This is complementary to PR #14979, which fixes the packaging side for the hipBLASLt kernels. The runtime path fix handles the lookup side so the libraries are found reliably once they are installed. I also added tests in server_test.go to cover Tensile path detection and env injection behavior. and i am ready to merge the code with the changes into the codebase . shall i ?
Author
Owner

@HEETMEHTA18 commented on GitHub (Apr 12, 2026):

@tchwpkgorg Can I add the commit to the codebase? If you approve, I'll push it.

<!-- gh-comment-id:4231925566 --> @HEETMEHTA18 commented on GitHub (Apr 12, 2026): @tchwpkgorg Can I add the commit? into the code base if you approve it then I will add the commit.
Author
Owner

@tchwpkgorg commented on GitHub (Apr 12, 2026):

@HEETMEHTA18 sure, please go on

<!-- gh-comment-id:4232536696 --> @tchwpkgorg commented on GitHub (Apr 12, 2026): @HEETMEHTA18 sure, please go on

Reference: github-starred/ollama#9877