[GH-ISSUE #15343] Qwen3.5:9b rocBLAS error from hip error code: 'hipErrorInvalidDeviceFunction':98 ggml_cuda_compute_forward: SOLVE_TRI failed ROCm error: invalid device function #71875

Closed
opened 2026-05-05 02:47:58 -05:00 by GiteaMirror · 0 comments

Originally created by @DaveyBonez on GitHub (Apr 5, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15343

What is the issue?

Error
500 Internal Server Error: model failed to load, this may be due to resource limitations or an internal error, check ollama server logs for details

Windows 11, GPU: RX 9060 XT.
This only happens with Qwen models.
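For reference, the runner's environment can be reconstructed from the subprocess line in the log below. A minimal PowerShell sketch of that configuration (values copied from this report; assumes `ollama.exe` is on PATH):

```shell
# Environment observed in the runner subprocess log entry below.
$env:OLLAMA_DEBUG = "1"              # DEBUG-level logging, as in the output shown
$env:OLLAMA_FLASH_ATTENTION = "1"    # flash attention enabled
$env:OLLAMA_KV_CACHE_TYPE = "q8_0"   # quantized KV cache
$env:OLLAMA_CONTEXT_LENGTH = "32768"
$env:HIP_VISIBLE_DEVICES = "0"       # restrict ROCm to device 0 (the RX 9060 XT)
ollama serve
```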

Relevant log output

time=2026-03-30T13:42:49.541-04:00 level=DEBUG source=runner.go:264 msg="refreshing free memory"
time=2026-03-30T13:42:49.541-04:00 level=DEBUG source=runner.go:328 msg="unable to refresh all GPUs with existing runners, performing bootstrap discovery"
time=2026-03-30T13:42:49.544-04:00 level=INFO source=server.go:432 msg="starting runner" cmd="C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 56996"
time=2026-03-30T13:42:49.544-04:00 level=DEBUG source=server.go:433 msg=subprocess HIP_PATH="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_64="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_71="C:\\Program Files\\AMD\\ROCm\\7.1\\" OLLAMA_CONTEXT_LENGTH=32768 OLLAMA_DEBUG=1 OLLAMA_FLASH_ATTENTION=1 OLLAMA_HOST=0.0.0.0 OLLAMA_KV_CACHE_TYPE=q8_0 PATH="C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Ollama\\lib\\ollama;C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Users\\DaveyBoneZ\\AppData\\Local\\AMD\\AI_Bundle\\VSCode\\bin;C:\\Program Files\\Git\\cmd;C:\\Program Files\\Docker\\Docker\\resources\\bin;C:\\Program Files\\PowerShell\\7\\;C:\\Program Files\\AMD\\ROCm\\7.1\\bin;C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Python\\Launcher\\;C:\\Users\\DaveyBoneZ\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\DaveyBoneZ\\AppData\\Local\\AMD\\AI_Bundle\\Ollama;C:\\Users\\DaveyBoneZ\\.lmstudio\\bin;C:\\Users\\DaveyBoneZ\\AppData\\Local\\Python\\bin;C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Ollama" OLLAMA_LIBRARY_PATH=C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama\rocm HIP_VISIBLE_DEVICES=0
time=2026-03-30T13:42:49.909-04:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=367.4097ms OLLAMA_LIBRARY_PATH="[C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm]" extra_envs=map[HIP_VISIBLE_DEVICES:0]
time=2026-03-30T13:42:49.909-04:00 level=DEBUG source=runner.go:40 msg="overall device VRAM discovery took" duration=367.9302ms
time=2026-03-30T13:42:49.910-04:00 level=INFO source=cpu_windows.go:148 msg=packages count=1
time=2026-03-30T13:42:49.910-04:00 level=INFO source=cpu_windows.go:195 msg="" package=0 cores=8 efficiency=0 threads=16
time=2026-03-30T13:42:49.910-04:00 level=DEBUG source=sched.go:220 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=3 gpu_count=1
time=2026-03-30T13:42:49.910-04:00 level=DEBUG source=sched.go:229 msg="loading first model" model=C:\Users\DaveyBoneZ\.ollama\models\blobs\sha256-dec52a44569a2a25341c4e4d3fee25846eed4f6f0b936278e3a3c900bb99d37c
time=2026-03-30T13:42:49.977-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-30T13:42:50.013-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-30T13:42:50.016-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.pooling_type default=0
time=2026-03-30T13:42:50.016-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.attention.head_count_kv default=0
time=2026-03-30T13:42:50.016-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_count default=0
time=2026-03-30T13:42:50.016-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.type default=""
time=2026-03-30T13:42:50.016-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.type default=""
time=2026-03-30T13:42:50.016-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.factor default=1
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.original_context_length default=0
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.attention.scale default=0
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_count default=0
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_used_count default=0
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.norm_top_k_prob default=true
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.mrope_interleaved default=false
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.attention.layer_norm_epsilon default=9.999999974752427e-07
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.rope.freq_base default=10000
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.num_positional_embeddings default=2304
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=false
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2026-03-30T13:42:50.017-04:00 level=INFO source=server.go:247 msg="enabling flash attention"
time=2026-03-30T13:42:50.018-04:00 level=INFO source=server.go:432 msg="starting runner" cmd="C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\DaveyBoneZ\\.ollama\\models\\blobs\\sha256-dec52a44569a2a25341c4e4d3fee25846eed4f6f0b936278e3a3c900bb99d37c --port 57002"
time=2026-03-30T13:42:50.018-04:00 level=DEBUG source=server.go:433 msg=subprocess HIP_PATH="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_64="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_71="C:\\Program Files\\AMD\\ROCm\\7.1\\" OLLAMA_CONTEXT_LENGTH=32768 OLLAMA_DEBUG=1 OLLAMA_FLASH_ATTENTION=1 OLLAMA_HOST=0.0.0.0 OLLAMA_KV_CACHE_TYPE=q8_0 PATH="C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Ollama\\lib\\ollama;C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Users\\DaveyBoneZ\\AppData\\Local\\AMD\\AI_Bundle\\VSCode\\bin;C:\\Program Files\\Git\\cmd;C:\\Program Files\\Docker\\Docker\\resources\\bin;C:\\Program Files\\PowerShell\\7\\;C:\\Program Files\\AMD\\ROCm\\7.1\\bin;C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Python\\Launcher\\;C:\\Users\\DaveyBoneZ\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\DaveyBoneZ\\AppData\\Local\\AMD\\AI_Bundle\\Ollama;C:\\Users\\DaveyBoneZ\\.lmstudio\\bin;C:\\Users\\DaveyBoneZ\\AppData\\Local\\Python\\bin;C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Ollama" OLLAMA_LIBRARY_PATH=C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama\rocm HIP_VISIBLE_DEVICES=0
time=2026-03-30T13:42:50.021-04:00 level=INFO source=sched.go:484 msg="system memory" total="31.9 GiB" free="22.8 GiB" free_swap="25.3 GiB"
time=2026-03-30T13:42:50.021-04:00 level=INFO source=sched.go:491 msg="gpu memory" id=0 library=ROCm available="14.4 GiB" free="14.8 GiB" minimum="457.0 MiB" overhead="0 B"
time=2026-03-30T13:42:50.021-04:00 level=INFO source=server.go:759 msg="loading model" "model layers"=33 requested=-1
time=2026-03-30T13:42:50.051-04:00 level=INFO source=runner.go:1411 msg="starting ollama engine"
time=2026-03-30T13:42:50.052-04:00 level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:57002"
time=2026-03-30T13:42:50.063-04:00 level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:32768 KvCacheType:q8_0 NumThreads:8 GPULayers:33[ID:0 Layers:33(0..32)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-03-30T13:42:50.100-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-30T13:42:50.102-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.name default=""
time=2026-03-30T13:42:50.102-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.description default=""
time=2026-03-30T13:42:50.103-04:00 level=INFO source=ggml.go:136 msg="" architecture=qwen35 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=52
time=2026-03-30T13:42:50.103-04:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama
load_backend: loaded CPU backend from C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-haswell.dll
time=2026-03-30T13:42:50.116-04:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama\rocm
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX 9060 XT, gfx1200 (0x1200), VMM: no, Wave Size: 32, ID: 0
load_backend: loaded ROCm backend from C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama\rocm\ggml-hip.dll
time=2026-03-30T13:42:50.142-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 ROCm.0.NO_VMM=1 ROCm.0.NO_PEER_COPY=1 ROCm.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang)
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.pooling_type default=0
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.attention.head_count_kv default=0
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_count default=0
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.type default=""
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.type default=""
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.factor default=1
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.original_context_length default=0
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.attention.scale default=0
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_count default=0
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_used_count default=0
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.norm_top_k_prob default=true
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.mrope_interleaved default=false
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.attention.layer_norm_epsilon default=9.999999974752427e-07
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.rope.freq_base default=10000
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.num_positional_embeddings default=2304
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=false
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2026-03-30T13:42:50.544-04:00 level=DEBUG source=ggml.go:852 msg="compute graph" nodes=1258 splits=1
rocBLAS error from hip error code: 'hipErrorInvalidDeviceFunction':98
ggml_cuda_compute_forward: SOLVE_TRI failed
ROCm error: invalid device function
  current device: 0, in function ggml_cuda_compute_forward at C:/a/ollama/ollama/ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:2882
  err
C:/a/ollama/ollama/ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:94: ROCm error
time=2026-03-30T13:42:51.742-04:00 level=ERROR source=server.go:1207 msg="do load request" error="Post \"http://127.0.0.1:57002/load\": read tcp 127.0.0.1:57007->127.0.0.1:57002: wsarecv: An existing connection was forcibly closed by the remote host."
time=2026-03-30T13:42:51.742-04:00 level=ERROR source=server.go:1207 msg="do load request" error="Post \"http://127.0.0.1:57002/load\": dial tcp 127.0.0.1:57002: connectex: No connection could be made because the target machine actively refused it."
time=2026-03-30T13:42:51.743-04:00 level=INFO source=sched.go:511 msg="Load failed" model=C:\Users\DaveyBoneZ\.ollama\models\blobs\sha256-dec52a44569a2a25341c4e4d3fee25846eed4f6f0b936278e3a3c900bb99d37c error="model failed to load, this may be due to resource limitations or an internal error, check ollama server logs for details"
time=2026-03-30T13:42:51.743-04:00 level=DEBUG source=server.go:1832 msg="stopping llama server" pid=2444
[GIN] 2026/03/30 - 13:42:51 | 500 |    2.3317189s |       127.0.0.1 | POST     "/api/chat"
time=2026-03-30T13:42:51.764-04:00 level=ERROR source=server.go:304 msg="llama runner terminated" error="exit status 1"
[GIN] 2026/03/30 - 13:43:19 | 200 |      1.5188ms |       127.0.0.1 | GET      "/api/tags"
[GIN] 2026/03/30 - 13:43:50 | 200 |       505.2µs |       127.0.0.1 | GET      "/api/tags"

OS

Win 11 pro

GPU

RX 9060 XT

CPU

AMD Ryzen 7 5800X3D

Ollama version

v0.18.3 (current at the time of reporting)

GiteaMirror added the bug label 2026-05-05 02:47:58 -05:00

Reference: github-starred/ollama#71875