[GH-ISSUE #10418] "ggml_cuda_compute_forward: SCALE failed > ROCm error: invalid device function" when running with 6750XT with Windows 10 #32606

Closed
opened 2026-04-22 14:05:44 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @adrianhowchin on GitHub (Apr 26, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10418

What is the issue?

I'm running Ollama on Windows 10 with an Intel i5-13600KF and AMD Radeon 6750XT.

I have followed the instructions at "https://github.com/likelovewant/ollama-for-amd" as my GPU is unsupported. This used to work with "rocm.gfx1031.for.hip.sdk.6.1.2" back in February 2025; however, since updating Ollama to 0.6.6 it no longer works.

My current process:

  1. Install Ollama 0.6.6
  2. Download the "rocm gfx1031 for hip sdk 6.2.4 (littlewu's logic)" drivers from "https://github.com/likelovewant/ollama-for-amd"
  3. Replace the rocblas.dll under: C:\Users\Adrian\AppData\Local\Programs\Ollama\lib\ollama\rocm
  4. Replace the library folder under: C:\Users\Adrian\AppData\Local\Programs\Ollama\lib\ollama\rocm\rocblas
  5. Reboot my machine
  6. Open a Command Prompt
  7. Enter "ollama run gemma3:4b"
  8. The prompt appears successfully. By this point the logfile is up to line 57: "[GIN] 2025/04/26 - 12:23:05 | 200 | 3.2309558s ..."
  9. I type in my prompt and press enter. I immediately receive what looks like some kind of network error in the DOS prompt, and the final lines of the log appear in server.log:

time=2025-04-26T12:23:31.774+10:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
ggml_cuda_compute_forward: SCALE failed
ROCm error: invalid device function
current device: 0, in function ggml_cuda_compute_forward at C:/a/ollama/ollama/ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:2374
err
C:/a/ollama/ollama/ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:75: ROCm error
[GIN] 2025/04/26 - 12:23:31 | 200 | 345.2718ms | 127.0.0.1 | POST "/api/chat"
time=2025-04-26T12:23:31.968+10:00 level=ERROR source=server.go:449 msg="llama runner terminated" error="exit status 0xc0000409"

All of this is running on the same machine.
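For reference, steps 2–4 above can be sketched as a shell script. This is a hedged illustration only: it uses a scratch directory and stand-in filenames so it is self-contained; on the real machine `OLLAMA_ROCM_DIR` would be `C:\Users\<user>\AppData\Local\Programs\Ollama\lib\ollama\rocm` and `PATCH_DIR` the unpacked release archive, and the actual kernel library filenames inside `library/` differ.

```shell
# Sketch of the manual patch (steps 2-4). Paths and the .dat filename below
# are placeholders, not the real release contents.
OLLAMA_ROCM_DIR="${OLLAMA_ROCM_DIR:-/tmp/ollama-demo/lib/ollama/rocm}"
PATCH_DIR="${PATCH_DIR:-/tmp/rocm-gfx1031-patch}"

# Stand-in files so the sketch runs anywhere; on a real install these come
# from the downloaded "rocm gfx1031 for hip sdk 6.2.4" archive.
mkdir -p "$OLLAMA_ROCM_DIR/rocblas/library" "$PATCH_DIR/library"
touch "$PATCH_DIR/rocblas.dll" "$PATCH_DIR/library/TensileLibrary_gfx1031.dat"

# Step 3: swap in the patched rocblas.dll
cp "$PATCH_DIR/rocblas.dll" "$OLLAMA_ROCM_DIR/rocblas.dll"

# Step 4: replace the gfx-specific kernel library folder wholesale, so no
# stale gfx1030-only files remain
rm -rf "$OLLAMA_ROCM_DIR/rocblas/library"
cp -r "$PATCH_DIR/library" "$OLLAMA_ROCM_DIR/rocblas/library"
```

The point of replacing the whole `library` folder (rather than individual files) is that rocBLAS selects its kernel archive by gfx target at runtime; a partial replacement can leave a mismatched set.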

Note that if I set "HSA_OVERRIDE_GFX_VERSION=10.3.1" then Ollama reports that no GPU has been found, and runs on the CPU.

I've since updated my AMD Adrenaline software to 25.3.1, and it still gives the same error. The only difference I could see was in the version of the ROCm driver reported, changing from 6.2 to 6.3.
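As a side note on the override experiment above, this sketch shows how the variable would typically be set; the value and the behavioral notes in the comments are assumptions, not confirmed for this build.

```shell
# Sketch of the HSA_OVERRIDE_GFX_VERSION experiment (assumptions, not a fix).
# In cmd.exe on Windows the equivalent would be:
#   set HSA_OVERRIDE_GFX_VERSION=10.3.0
#   ollama serve
# The commonly suggested spoof for RDNA2 parts such as gfx1031 is 10.3.0
# (gfx1030), since stock ROCm ships gfx1030 kernels but no gfx1031 target.
# The variable is primarily handled by the Linux ROCR runtime, so on Windows
# it may simply break detection, which would match the CPU fallback reported
# above when 10.3.1 was set.
export HSA_OVERRIDE_GFX_VERSION=10.3.0
echo "override=$HSA_OVERRIDE_GFX_VERSION"
```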

Relevant log output

2025/04/26 12:22:21 routes.go:1232: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:2048 OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:1h0m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:C:\\Users\\Adrian\\.ollama\\models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES:]"
time=2025-04-26T12:22:21.862+10:00 level=INFO source=images.go:458 msg="total blobs: 14"
time=2025-04-26T12:22:21.863+10:00 level=INFO source=images.go:465 msg="total unused blobs removed: 0"
time=2025-04-26T12:22:21.865+10:00 level=INFO source=routes.go:1299 msg="Listening on [::]:11434 (version 0.6.6)"
time=2025-04-26T12:22:21.867+10:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-04-26T12:22:21.869+10:00 level=INFO source=gpu_windows.go:167 msg=packages count=1
time=2025-04-26T12:22:21.869+10:00 level=INFO source=gpu_windows.go:183 msg="efficiency cores detected" maxEfficiencyClass=1
time=2025-04-26T12:22:21.869+10:00 level=INFO source=gpu_windows.go:214 msg="" package=0 cores=14 efficiency=8 threads=20
time=2025-04-26T12:22:22.575+10:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=rocm variant="" compute=gfx1031 driver=6.3 name="AMD Radeon RX 6750 XT" total="12.0 GiB" available="11.8 GiB"
[GIN] 2025/04/26 - 12:23:02 | 200 |            0s |       127.0.0.1 | HEAD     "/"
time=2025-04-26T12:23:02.487+10:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-04-26T12:23:02.511+10:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/04/26 - 12:23:02 | 200 |     70.9229ms |       127.0.0.1 | POST     "/api/show"
time=2025-04-26T12:23:02.539+10:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-04-26T12:23:02.827+10:00 level=INFO source=sched.go:187 msg="one or more GPUs detected that are unable to accurately report free memory - disabling default concurrency"
time=2025-04-26T12:23:02.849+10:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-04-26T12:23:02.872+10:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-04-26T12:23:02.875+10:00 level=INFO source=sched.go:722 msg="new model will fit in available VRAM in single GPU, loading" model=C:\Users\Adrian\.ollama\models\blobs\sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 gpu=0 parallel=4 available=12573671424 required="5.8 GiB"
time=2025-04-26T12:23:03.158+10:00 level=INFO source=server.go:105 msg="system memory" total="31.8 GiB" free="26.8 GiB" free_swap="25.6 GiB"
time=2025-04-26T12:23:03.160+10:00 level=INFO source=server.go:138 msg=offload library=rocm layers.requested=-1 layers.model=35 layers.offload=35 layers.split="" memory.available="[11.7 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.8 GiB" memory.required.partial="5.8 GiB" memory.required.kv="682.0 MiB" memory.required.allocations="[5.8 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
time=2025-04-26T12:23:03.227+10:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-04-26T12:23:03.231+10:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-04-26T12:23:03.235+10:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-04-26T12:23:03.235+10:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-04-26T12:23:03.235+10:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-04-26T12:23:03.235+10:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-04-26T12:23:03.235+10:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-04-26T12:23:03.239+10:00 level=INFO source=server.go:405 msg="starting llama server" cmd="C:\\Users\\Adrian\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\Adrian\\.ollama\\models\\blobs\\sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --n-gpu-layers 35 --threads 6 --parallel 4 --port 49788"
time=2025-04-26T12:23:03.241+10:00 level=INFO source=sched.go:451 msg="loaded runners" count=1
time=2025-04-26T12:23:03.241+10:00 level=INFO source=server.go:580 msg="waiting for llama runner to start responding"
time=2025-04-26T12:23:03.241+10:00 level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server error"
time=2025-04-26T12:23:03.253+10:00 level=INFO source=runner.go:866 msg="starting ollama engine"
time=2025-04-26T12:23:03.254+10:00 level=INFO source=runner.go:929 msg="Server listening on 127.0.0.1:49788"
time=2025-04-26T12:23:03.312+10:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-04-26T12:23:03.313+10:00 level=WARN source=ggml.go:152 msg="key not found" key=general.name default=""
time=2025-04-26T12:23:03.313+10:00 level=WARN source=ggml.go:152 msg="key not found" key=general.description default=""
time=2025-04-26T12:23:03.313+10:00 level=INFO source=ggml.go:72 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
time=2025-04-26T12:23:03.493+10:00 level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server loading model"
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX 6750 XT, gfx1031 (0x1031), VMM: no, Wave Size: 32
load_backend: loaded ROCm backend from C:\Users\Adrian\AppData\Local\Programs\Ollama\lib\ollama\rocm\ggml-hip.dll
load_backend: loaded CPU backend from C:\Users\Adrian\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-alderlake.dll
time=2025-04-26T12:23:03.576+10:00 level=INFO source=ggml.go:109 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 ROCm.0.NO_VMM=1 ROCm.0.NO_PEER_COPY=1 ROCm.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang)
time=2025-04-26T12:23:03.972+10:00 level=INFO source=ggml.go:298 msg="model weights" buffer=ROCm0 size="3.1 GiB"
time=2025-04-26T12:23:03.972+10:00 level=INFO source=ggml.go:298 msg="model weights" buffer=CPU size="525.0 MiB"
time=2025-04-26T12:23:05.537+10:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-04-26T12:23:05.541+10:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-04-26T12:23:05.541+10:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-04-26T12:23:05.541+10:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-04-26T12:23:05.541+10:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-04-26T12:23:05.541+10:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-04-26T12:23:05.609+10:00 level=INFO source=ggml.go:556 msg="compute graph" backend=ROCm0 buffer_type=ROCm0 size="162.0 MiB"
time=2025-04-26T12:23:05.609+10:00 level=INFO source=ggml.go:556 msg="compute graph" backend=CPU buffer_type=CPU size="5.0 MiB"
time=2025-04-26T12:23:05.746+10:00 level=INFO source=server.go:619 msg="llama runner started in 2.50 seconds"
[GIN] 2025/04/26 - 12:23:05 | 200 |    3.2309558s |       127.0.0.1 | POST     "/api/generate"
time=2025-04-26T12:23:31.774+10:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
ggml_cuda_compute_forward: SCALE failed
ROCm error: invalid device function
  current device: 0, in function ggml_cuda_compute_forward at C:/a/ollama/ollama/ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:2374
  err
C:/a/ollama/ollama/ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:75: ROCm error
[GIN] 2025/04/26 - 12:23:31 | 200 |    345.2718ms |       127.0.0.1 | POST     "/api/chat"
time=2025-04-26T12:23:31.968+10:00 level=ERROR source=server.go:449 msg="llama runner terminated" error="exit status 0xc0000409"

OS

Windows

GPU

AMD

CPU

Intel

Ollama version

0.6.6

GiteaMirror added the bug label 2026-04-22 14:05:44 -05:00

@jessegross commented on GitHub (Apr 28, 2025):

In general we can't support modified versions of Ollama, however, this looks like #10234, which has an attached patch.


Reference: github-starred/ollama#32606