[GH-ISSUE #14686] ROCm backend fails to initialize on AMD Radeon AI PRO R9700 (RDNA4, gfx1201) in Windows 11 #71564

Open
opened 2026-05-05 02:08:15 -05:00 by GiteaMirror · 11 comments
Owner

Originally created by @aibin8910 on GitHub (Mar 7, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14686

Description

I am using Ollama v0.17.7 on Windows 11 with an AMD Radeon AI PRO R9700 (RDNA4 architecture, gfx1201). The ROCm backend fails to initialize, falling back to CPU, even though the HIP SDK 7.1 is installed and the Vulkan backend works perfectly.

Environment

  • OS: Windows 11 Pro (10.0.26200)
  • Ollama version: 0.17.7 (installed in C:\Ollama)
  • GPU: AMD Radeon AI PRO R9700 (2x, 32GB GDDR6 each) – RDNA4, gfx1201
  • Driver: AMD Software Pro Edition 26.2.2 (driver date 2026/2/17)
  • HIP SDK: 7.1.0 installed (AMD HIP SDK components)
  • ROCm files used: ollama-windows-amd64-rocm.zip (extracted to C:\Ollama\lib\ollama\)

Expected Behavior

Ollama should initialize the ROCm backend and utilize the GPU(s) for inference.

Actual Behavior

Ollama detects the GPU (description="AMD Radeon AI PRO R9700" compute=gfx1201) but then logs filtering device which didn't fully initialize and falls back to CPU. The inference compute log shows only CPU.

Steps Already Taken

  1. Installed latest AMD driver and HIP SDK 7.1.
  2. Set HIP_VISIBLE_DEVICES=1 and OLLAMA_DEBUG=1.
  3. Replaced ollama.exe with the one from ollama-windows-amd64.zip and copied ROCm libraries from ollama-windows-amd64-rocm.zip to C:\Ollama\lib\ollama\.
  4. Tried HSA_OVERRIDE_GFX_VERSION=12.0.1, 12.0.0, 11.0.0 – all result in the same filtering (see the sketch after this list).
  5. Vulkan backend (OLLAMA_VULKAN=1) works fine, utilizing both GPUs (see logs below).
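
For reference, a minimal cmd sketch of the variable combination from steps 2 and 4 in a single session (the values and the C:\Ollama install path are the ones from this report; adjust as needed):

rem Sketch: set the debug and GPU-selection variables for one ollama serve session.
rem HSA_OVERRIDE_GFX_VERSION is swapped per attempt (12.0.1, 12.0.0, 11.0.0).
set OLLAMA_DEBUG=1
set HIP_VISIBLE_DEVICES=1
set HSA_OVERRIDE_GFX_VERSION=12.0.1
C:\Ollama\ollama.exe serve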

Relevant Logs (from ollama serve with OLLAMA_DEBUG=1)

time=2026-03-07T09:49:00.849+08:00 level=DEBUG source=runner.go:153 msg="filtering device which didn't fully initialize" id=0 libdir=C:\Ollama\lib\ollama\rocm pci_id=0000:07:00.0 library=ROCm
time=2026-03-07T09:49:00.850+08:00 level=INFO source=types.go:60 msg="inference compute" id=cpu library=cpu ...

The Vulkan backend successfully enumerates both GPUs:

ggml_vulkan: Found 3 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon AI PRO R9700 (AMD proprietary driver) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat
...

Analysis

The ROCm libraries bundled with Ollama (version 6.x, as indicated by amdhip64_6.dll) do not include precompiled kernels for gfx1201 (RDNA4). The TensileLibrary_* files from the ROCm 6.2 build path (as seen in the user's attached file list) contain kernels only up to gfx1151. Thus, even with HSA_OVERRIDE_GFX_VERSION, the HIP runtime cannot find suitable kernels for this architecture, leading to initialization failure.
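
A quick way to check this locally (a sketch; the rocblas\library subfolder is the conventional layout for the bundled rocBLAS kernel files and may differ in a given extract):

rem List the bundled Tensile kernel files and check for the gfx1201 target.
rem The rocblas\library subpath is assumed from the usual ROCm bundle layout.
rem No output from findstr means the bundle ships no kernels for this GPU.
dir /b "C:\Ollama\lib\ollama\rocm\rocblas\library" | findstr gfx1201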

Request

Please update the Windows ROCm support in Ollama to include gfx1201 (RDNA4). This could be achieved by:

  • Bundling a newer version of ROCm (e.g., 7.x) that adds support for RDNA4.
  • Adding the necessary kernel builds for gfx1201 to the existing ROCm 6.x package (see the build sketch below).
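
For the second option, a hedged sketch of the kind of build invocation involved, following the llama.cpp/ggml HIP build conventions that Ollama's backend is based on (flag names per llama.cpp's HIP documentation; Ollama's own build presets may differ):

rem Build the HIP backend with gfx1201 added to the kernel target list.
rem Flags follow llama.cpp's HIP build docs; this is not Ollama's exact CI recipe.
cmake -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1201 -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release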

Thank you for your work on Ollama! I'm happy to provide any additional information or testing if needed.


GiteaMirror added the feature request label 2026-05-05 02:08:15 -05:00

@trtr6842-git commented on GitHub (Mar 7, 2026):

This might be a little old:
https://github.com/ollama/ollama/issues/10430#issuecomment-3707294717

But since then I've added an R9700, and it works just fine:

[Image: https://github.com/user-attachments/assets/c4058937-0fa2-47c6-8d87-934dee9553a6]
C:\Users\ttyle>ollama -v
ollama version is 0.13.5

C:\Users\ttyle>ollama ps
NAME    ID    SIZE    PROCESSOR    CONTEXT    UNTIL

C:\Users\ttyle>ollama ps
NAME                 ID              SIZE     PROCESSOR    CONTEXT    UNTIL
qwen2.5-coder:14b    9ec8897f747e    17 GB    100% GPU     262144     4 minutes from now

C:\Users\ttyle>
[Image: https://github.com/user-attachments/assets/d74c6120-9719-46e7-9402-0164662523ce]

Did you add the ROCBLAS_TENSILE_LIBPATH environment variable pointing to ...\AMD\ROCm\6.4\bin\rocblas\library?
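
In case it helps anyone trying the same fix, a minimal sketch of setting that variable persistently from cmd (the 6.4 path matches the setup above; substitute the installed SDK version):

rem Persist the rocBLAS Tensile library path for new processes.
setx ROCBLAS_TENSILE_LIBPATH "C:\Program Files\AMD\ROCm\6.4\bin\rocblas\library"
rem Restart the terminal (or the Ollama service) so the variable is picked up.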


@Jasdfgh commented on GitHub (Mar 11, 2026):

Based on what trtr6842-git mentioned: it might be worth trying a clean Ollama install (without the manual ROCm zip replacement) plus setting ROCBLAS_TENSILE_LIBPATH to your HIP SDK 7.1's rocblas library path. The manual file extraction could be what's conflicting.

Your dual R9700 setup might hit a known issue with dual AMD GPUs on Windows (https://github.com/ollama/ollama/pull/10676). If things still don't work after the above, testing with HIP_VISIBLE_DEVICES=0 could help isolate that (see the sketch below).
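
A sketch of that isolation test (hypothetical session; check the ID-to-device mapping against the "verifying if device is supported" debug lines first, since an iGPU can occupy ID 0):

rem Restrict ROCm to a single device to rule out the dual-GPU issue.
rem Device ID 0 is an assumption; confirm against the discovery log lines.
set HIP_VISIBLE_DEVICES=0
set OLLAMA_DEBUG=1
C:\Ollama\ollama.exe serve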


@aibin8910 commented on GitHub (Mar 11, 2026):

After adding ROCBLAS_TENSILE_LIBPATH, a new issue arose: when Ollama loaded the qwen3.5-35B model, it encountered a 500 error. However, after switching to Vulkan, everything functioned normally. Currently HIP_VISIBLE_DEVICES=0,1 is set.


@Jasdfgh commented on GitHub (Mar 15, 2026):

Nice, so ROCBLAS_TENSILE_LIBPATH got ROCm initializing; that's progress. The 500 on qwen3.5-35B specifically is interesting. Could you share the Ollama server log around the time of the 500? That would help narrow down whether it's a HIP kernel error (Qwen 3.5 has a known dispatch overhead issue on ROCm, tracked at https://github.com/ggml-org/llama.cpp/issues/18823) or something else like OOM on the dual-GPU split.
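
If it helps capture that, a minimal repro sketch against the REST API (the model tag below is a guess at the exact name; the log path is the default for a Windows install):

rem Trigger the load that 500s, then inspect the server log around that time.
rem "qwen3.5-35b" is a guessed tag; use the name shown by "ollama list".
curl http://localhost:11434/api/chat -d "{\"model\": \"qwen3.5-35b\", \"messages\": [{\"role\": \"user\", \"content\": \"hello\"}]}"
rem Default log location: %LOCALAPPDATA%\Ollama\server.log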


@aibin8910 commented on GitHub (Mar 16, 2026):

time=2026-03-16T20:43:35.626+08:00 level=INFO source=routes.go:1727 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:0 OLLAMA_DEBUG:DEBUG OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:D:\\ollama\\models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NO_CLOUD:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES:]"
time=2026-03-16T20:43:35.633+08:00 level=INFO source=routes.go:1729 msg="Ollama cloud disabled: false"
time=2026-03-16T20:43:35.634+08:00 level=INFO source=images.go:477 msg="total blobs: 17"
time=2026-03-16T20:43:35.634+08:00 level=INFO source=images.go:484 msg="total unused blobs removed: 0"
time=2026-03-16T20:43:35.635+08:00 level=INFO source=routes.go:1782 msg="Listening on [::]:11434 (version 0.18.0)"
time=2026-03-16T20:43:35.635+08:00 level=DEBUG source=sched.go:145 msg="starting llm scheduler"
time=2026-03-16T20:43:35.636+08:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-03-16T20:43:35.648+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\CustomApp\\Ollama\\ollama.exe runner --ollama-engine --port 62132"
time=2026-03-16T20:43:35.648+08:00 level=DEBUG source=server.go:431 msg=subprocess OLLAMA_VULKAN=0 OLLAMA_HOST=0.0.0.0 HIP_PATH_64="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_71="C:\\Program Files\\AMD\\ROCm\\7.1\\" PATH="C:\\CustomApp\\Ollama\\lib\\ollama;C:\\CustomApp\\Ollama\\lib\\ollama\\cuda_v13;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files\\AMD\\ROCm\\6.4\\\\bin;C:\\Users\\aibin\\.venv\\Scripts;C:\\Program Files\\Git\\cmd;C:\\Program Files\\nodejs\\;C:\\Users\\aibin\\.local\\bin;C:\\Users\\aibin\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\aibin\\AppData\\Local\\Programs\\Ollama;C:\\Users\\aibin\\AppData\\Roaming\\npm;C:\\Users\\aibin\\AppData\\Local\\Programs\\Microsoft VS Code\\bin;C:\\CustomApp\\Ollama;" HIP_PATH="C:\\Program Files\\AMD\\ROCm\\6.4\\" OLLAMA_DEBUG=1 OLLAMA_MODELS=D:\ollama\models OLLAMA_NO_CLOUD=0 OLLAMA_LIBRARY_PATH=C:\CustomApp\Ollama\lib\ollama;C:\CustomApp\Ollama\lib\ollama\cuda_v13
time=2026-03-16T20:43:35.762+08:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=124.3364ms OLLAMA_LIBRARY_PATH="[C:\\CustomApp\\Ollama\\lib\\ollama C:\\CustomApp\\Ollama\\lib\\ollama\\cuda_v13]" extra_envs=map[]
time=2026-03-16T20:43:35.763+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\CustomApp\\Ollama\\ollama.exe runner --ollama-engine --port 62140"
time=2026-03-16T20:43:35.763+08:00 level=DEBUG source=server.go:431 msg=subprocess OLLAMA_VULKAN=0 OLLAMA_HOST=0.0.0.0 HIP_PATH_64="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_71="C:\\Program Files\\AMD\\ROCm\\7.1\\" PATH="C:\\CustomApp\\Ollama\\lib\\ollama;C:\\CustomApp\\Ollama\\lib\\ollama\\rocm;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files\\AMD\\ROCm\\6.4\\\\bin;C:\\Users\\aibin\\.venv\\Scripts;C:\\Program Files\\Git\\cmd;C:\\Program Files\\nodejs\\;C:\\Users\\aibin\\.local\\bin;C:\\Users\\aibin\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\aibin\\AppData\\Local\\Programs\\Ollama;C:\\Users\\aibin\\AppData\\Roaming\\npm;C:\\Users\\aibin\\AppData\\Local\\Programs\\Microsoft VS Code\\bin;C:\\CustomApp\\Ollama;" HIP_PATH="C:\\Program Files\\AMD\\ROCm\\6.4\\" OLLAMA_DEBUG=1 OLLAMA_MODELS=D:\ollama\models OLLAMA_NO_CLOUD=0 OLLAMA_LIBRARY_PATH=C:\CustomApp\Ollama\lib\ollama;C:\CustomApp\Ollama\lib\ollama\rocm
time=2026-03-16T20:43:37.754+08:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=1.9924009s OLLAMA_LIBRARY_PATH="[C:\\CustomApp\\Ollama\\lib\\ollama C:\\CustomApp\\Ollama\\lib\\ollama\\rocm]" extra_envs=map[]
time=2026-03-16T20:43:37.754+08:00 level=INFO source=runner.go:106 msg="experimental Vulkan support disabled.  To enable, set OLLAMA_VULKAN=1"
time=2026-03-16T20:43:37.755+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\CustomApp\\Ollama\\ollama.exe runner --ollama-engine --port 55694"
time=2026-03-16T20:43:37.755+08:00 level=DEBUG source=server.go:431 msg=subprocess OLLAMA_VULKAN=0 OLLAMA_HOST=0.0.0.0 HIP_PATH_64="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_71="C:\\Program Files\\AMD\\ROCm\\7.1\\" PATH="C:\\CustomApp\\Ollama\\lib\\ollama;C:\\CustomApp\\Ollama\\lib\\ollama\\cuda_v12;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files\\AMD\\ROCm\\6.4\\\\bin;C:\\Users\\aibin\\.venv\\Scripts;C:\\Program Files\\Git\\cmd;C:\\Program Files\\nodejs\\;C:\\Users\\aibin\\.local\\bin;C:\\Users\\aibin\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\aibin\\AppData\\Local\\Programs\\Ollama;C:\\Users\\aibin\\AppData\\Roaming\\npm;C:\\Users\\aibin\\AppData\\Local\\Programs\\Microsoft VS Code\\bin;C:\\CustomApp\\Ollama;" HIP_PATH="C:\\Program Files\\AMD\\ROCm\\6.4\\" OLLAMA_DEBUG=1 OLLAMA_MODELS=D:\ollama\models OLLAMA_NO_CLOUD=0 OLLAMA_LIBRARY_PATH=C:\CustomApp\Ollama\lib\ollama;C:\CustomApp\Ollama\lib\ollama\cuda_v12
time=2026-03-16T20:43:37.823+08:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=68.3443ms OLLAMA_LIBRARY_PATH="[C:\\CustomApp\\Ollama\\lib\\ollama C:\\CustomApp\\Ollama\\lib\\ollama\\cuda_v12]" extra_envs=map[]
time=2026-03-16T20:43:37.823+08:00 level=DEBUG source=runner.go:124 msg="evaluating which, if any, devices to filter out" initial_count=3
time=2026-03-16T20:43:37.823+08:00 level=DEBUG source=runner.go:146 msg="verifying if device is supported" library=C:\CustomApp\Ollama\lib\ollama\rocm description="AMD Radeon(TM) Graphics" compute=gfx1036 id=0 pci_id=0000:7e:00.0
time=2026-03-16T20:43:37.823+08:00 level=DEBUG source=runner.go:146 msg="verifying if device is supported" library=C:\CustomApp\Ollama\lib\ollama\rocm description="AMD Radeon AI PRO R9700" compute=gfx1201 id=1 pci_id=0000:03:00.0
time=2026-03-16T20:43:37.823+08:00 level=DEBUG source=runner.go:146 msg="verifying if device is supported" library=C:\CustomApp\Ollama\lib\ollama\rocm description="AMD Radeon AI PRO R9700" compute=gfx1201 id=2 pci_id=0000:07:00.0
time=2026-03-16T20:43:37.824+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\CustomApp\\Ollama\\ollama.exe runner --ollama-engine --port 55701"
time=2026-03-16T20:43:37.824+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\CustomApp\\Ollama\\ollama.exe runner --ollama-engine --port 55699"
time=2026-03-16T20:43:37.824+08:00 level=DEBUG source=server.go:431 msg=subprocess OLLAMA_VULKAN=0 OLLAMA_HOST=0.0.0.0 HIP_PATH_64="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_71="C:\\Program Files\\AMD\\ROCm\\7.1\\" PATH="C:\\CustomApp\\Ollama\\lib\\ollama;C:\\CustomApp\\Ollama\\lib\\ollama\\rocm;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files\\AMD\\ROCm\\6.4\\\\bin;C:\\Users\\aibin\\.venv\\Scripts;C:\\Program Files\\Git\\cmd;C:\\Program Files\\nodejs\\;C:\\Users\\aibin\\.local\\bin;C:\\Users\\aibin\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\aibin\\AppData\\Local\\Programs\\Ollama;C:\\Users\\aibin\\AppData\\Roaming\\npm;C:\\Users\\aibin\\AppData\\Local\\Programs\\Microsoft VS Code\\bin;C:\\CustomApp\\Ollama;" HIP_PATH="C:\\Program Files\\AMD\\ROCm\\6.4\\" OLLAMA_DEBUG=1 OLLAMA_MODELS=D:\ollama\models OLLAMA_NO_CLOUD=0 OLLAMA_LIBRARY_PATH=C:\CustomApp\Ollama\lib\ollama;C:\CustomApp\Ollama\lib\ollama\rocm HIP_VISIBLE_DEVICES=2 GGML_CUDA_INIT=1
time=2026-03-16T20:43:37.824+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\CustomApp\\Ollama\\ollama.exe runner --ollama-engine --port 55700"
time=2026-03-16T20:43:37.824+08:00 level=DEBUG source=server.go:431 msg=subprocess OLLAMA_VULKAN=0 OLLAMA_HOST=0.0.0.0 HIP_PATH_64="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_71="C:\\Program Files\\AMD\\ROCm\\7.1\\" PATH="C:\\CustomApp\\Ollama\\lib\\ollama;C:\\CustomApp\\Ollama\\lib\\ollama\\rocm;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files\\AMD\\ROCm\\6.4\\\\bin;C:\\Users\\aibin\\.venv\\Scripts;C:\\Program Files\\Git\\cmd;C:\\Program Files\\nodejs\\;C:\\Users\\aibin\\.local\\bin;C:\\Users\\aibin\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\aibin\\AppData\\Local\\Programs\\Ollama;C:\\Users\\aibin\\AppData\\Roaming\\npm;C:\\Users\\aibin\\AppData\\Local\\Programs\\Microsoft VS Code\\bin;C:\\CustomApp\\Ollama;" HIP_PATH="C:\\Program Files\\AMD\\ROCm\\6.4\\" OLLAMA_DEBUG=1 OLLAMA_MODELS=D:\ollama\models OLLAMA_NO_CLOUD=0 OLLAMA_LIBRARY_PATH=C:\CustomApp\Ollama\lib\ollama;C:\CustomApp\Ollama\lib\ollama\rocm HIP_VISIBLE_DEVICES=1 GGML_CUDA_INIT=1
time=2026-03-16T20:43:37.824+08:00 level=DEBUG source=server.go:431 msg=subprocess OLLAMA_VULKAN=0 OLLAMA_HOST=0.0.0.0 HIP_PATH_64="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_71="C:\\Program Files\\AMD\\ROCm\\7.1\\" PATH="C:\\CustomApp\\Ollama\\lib\\ollama;C:\\CustomApp\\Ollama\\lib\\ollama\\rocm;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files\\AMD\\ROCm\\6.4\\\\bin;C:\\Users\\aibin\\.venv\\Scripts;C:\\Program Files\\Git\\cmd;C:\\Program Files\\nodejs\\;C:\\Users\\aibin\\.local\\bin;C:\\Users\\aibin\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\aibin\\AppData\\Local\\Programs\\Ollama;C:\\Users\\aibin\\AppData\\Roaming\\npm;C:\\Users\\aibin\\AppData\\Local\\Programs\\Microsoft VS Code\\bin;C:\\CustomApp\\Ollama;" HIP_PATH="C:\\Program Files\\AMD\\ROCm\\6.4\\" OLLAMA_DEBUG=1 OLLAMA_MODELS=D:\ollama\models OLLAMA_NO_CLOUD=0 OLLAMA_LIBRARY_PATH=C:\CustomApp\Ollama\lib\ollama;C:\CustomApp\Ollama\lib\ollama\rocm HIP_VISIBLE_DEVICES=0 GGML_CUDA_INIT=1
time=2026-03-16T20:43:38.067+08:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=243.8174ms OLLAMA_LIBRARY_PATH="[C:\\CustomApp\\Ollama\\lib\\ollama C:\\CustomApp\\Ollama\\lib\\ollama\\rocm]" extra_envs="map[GGML_CUDA_INIT:1 HIP_VISIBLE_DEVICES:0]"
time=2026-03-16T20:43:38.067+08:00 level=DEBUG source=runner.go:153 msg="filtering device which didn't fully initialize" id=0 libdir=C:\CustomApp\Ollama\lib\ollama\rocm pci_id=0000:7e:00.0 library=ROCm
time=2026-03-16T20:43:38.459+08:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=636.0776ms OLLAMA_LIBRARY_PATH="[C:\\CustomApp\\Ollama\\lib\\ollama C:\\CustomApp\\Ollama\\lib\\ollama\\rocm]" extra_envs="map[GGML_CUDA_INIT:1 HIP_VISIBLE_DEVICES:1]"
time=2026-03-16T20:43:38.475+08:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=651.603ms OLLAMA_LIBRARY_PATH="[C:\\CustomApp\\Ollama\\lib\\ollama C:\\CustomApp\\Ollama\\lib\\ollama\\rocm]" extra_envs="map[GGML_CUDA_INIT:1 HIP_VISIBLE_DEVICES:2]"
time=2026-03-16T20:43:38.475+08:00 level=DEBUG source=runner.go:193 msg="adjusting filtering IDs" FilterID=1 new_ID=0
time=2026-03-16T20:43:38.475+08:00 level=DEBUG source=runner.go:193 msg="adjusting filtering IDs" FilterID=2 new_ID=1
time=2026-03-16T20:43:38.475+08:00 level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=2.8395903s
time=2026-03-16T20:43:38.475+08:00 level=INFO source=types.go:42 msg="inference compute" id=0 filter_id=1 library=ROCm compute=gfx1201 name=ROCm1 description="AMD Radeon AI PRO R9700" libdirs=ollama,rocm driver=60551.38 pci_id=0000:03:00.0 type=discrete total="31.9 GiB" available="31.9 GiB"
time=2026-03-16T20:43:38.475+08:00 level=INFO source=types.go:42 msg="inference compute" id=1 filter_id=2 library=ROCm compute=gfx1201 name=ROCm2 description="AMD Radeon AI PRO R9700" libdirs=ollama,rocm driver=60551.38 pci_id=0000:07:00.0 type=discrete total="31.9 GiB" available="31.9 GiB"
time=2026-03-16T20:43:38.475+08:00 level=INFO source=routes.go:1832 msg="vram-based default context" total_vram="63.7 GiB" default_num_ctx=262144
[GIN] 2026/03/16 - 20:43:38 | 200 |       530.7µs |       127.0.0.1 | GET      "/api/version"
[GIN] 2026/03/16 - 20:43:38 | 200 |            0s |       127.0.0.1 | GET      "/api/version"
[GIN] 2026/03/16 - 20:43:38 | 200 |            0s |       127.0.0.1 | GET      "/api/version"
[GIN] 2026/03/16 - 20:43:38 | 200 |      2.1113ms |       127.0.0.1 | GET      "/api/tags"
time=2026-03-16T20:43:38.572+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
[GIN] 2026/03/16 - 20:43:38 | 200 |    105.8152ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2026/03/16 - 20:43:39 | 200 |    823.5583ms |       127.0.0.1 | POST     "/api/me"
[GIN] 2026/03/16 - 20:43:39 | 200 |    823.5583ms |       127.0.0.1 | POST     "/api/me"
[GIN] 2026/03/16 - 20:43:43 | 200 |       1.023ms |       127.0.0.1 | GET      "/api/tags"
time=2026-03-16T20:43:43.778+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
[GIN] 2026/03/16 - 20:43:43 | 200 |    103.1114ms |       127.0.0.1 | POST     "/api/show"
time=2026-03-16T20:43:43.872+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
[GIN] 2026/03/16 - 20:43:43 | 200 |     91.9583ms |       127.0.0.1 | POST     "/api/show"
time=2026-03-16T20:43:43.963+08:00 level=DEBUG source=runner.go:264 msg="refreshing free memory"
time=2026-03-16T20:43:43.963+08:00 level=DEBUG source=runner.go:328 msg="unable to refresh all GPUs with existing runners, performing bootstrap discovery"
time=2026-03-16T20:43:43.967+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\CustomApp\\Ollama\\ollama.exe runner --ollama-engine --port 59552"
time=2026-03-16T20:43:43.967+08:00 level=DEBUG source=server.go:431 msg=subprocess OLLAMA_VULKAN=0 OLLAMA_HOST=0.0.0.0 HIP_PATH_64="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_71="C:\\Program Files\\AMD\\ROCm\\7.1\\" PATH="C:\\CustomApp\\Ollama\\lib\\ollama;C:\\CustomApp\\Ollama\\lib\\ollama\\rocm;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files\\AMD\\ROCm\\6.4\\\\bin;C:\\Users\\aibin\\.venv\\Scripts;C:\\Program Files\\Git\\cmd;C:\\Program Files\\nodejs\\;C:\\Users\\aibin\\.local\\bin;C:\\Users\\aibin\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\aibin\\AppData\\Local\\Programs\\Ollama;C:\\Users\\aibin\\AppData\\Roaming\\npm;C:\\Users\\aibin\\AppData\\Local\\Programs\\Microsoft VS Code\\bin;C:\\CustomApp\\Ollama;" HIP_PATH="C:\\Program Files\\AMD\\ROCm\\6.4\\" OLLAMA_DEBUG=1 OLLAMA_MODELS=D:\ollama\models OLLAMA_NO_CLOUD=0 OLLAMA_LIBRARY_PATH=C:\CustomApp\Ollama\lib\ollama;C:\CustomApp\Ollama\lib\ollama\rocm HIP_VISIBLE_DEVICES=1,2
time=2026-03-16T20:43:45.757+08:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=1.7939987s OLLAMA_LIBRARY_PATH="[C:\\CustomApp\\Ollama\\lib\\ollama C:\\CustomApp\\Ollama\\lib\\ollama\\rocm]" extra_envs=map[HIP_VISIBLE_DEVICES:1,2]
time=2026-03-16T20:43:45.757+08:00 level=DEBUG source=runner.go:40 msg="overall device VRAM discovery took" duration=1.7939987s
time=2026-03-16T20:43:45.757+08:00 level=INFO source=cpu_windows.go:148 msg=packages count=1
time=2026-03-16T20:43:45.757+08:00 level=INFO source=cpu_windows.go:195 msg="" package=0 cores=16 efficiency=0 threads=32
time=2026-03-16T20:43:45.757+08:00 level=DEBUG source=sched.go:220 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=6 gpu_count=2
time=2026-03-16T20:43:45.772+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-16T20:43:45.774+08:00 level=DEBUG source=sched.go:256 msg="loading first model" model=D:\ollama\models\blobs\sha256-900dde62fb7ebe8a5a25e35d5b7633f403f226a310965fed51d50f5238ba145a
time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.pooling_type default=0
time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.attention.head_count_kv default=0
time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.rope.scaling.type default=""
time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.rope.type default=""
time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.rope.scaling.factor default=1
time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.rope.scaling.original_context_length default=0
time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.attention.scale default=0
time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.norm_top_k_prob default=true
time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.mrope_interleaved default=false
time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.vision.attention.layer_norm_epsilon default=9.999999974752427e-07
time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.vision.rope.freq_base default=10000
time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.vision.num_positional_embeddings default=2304
time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=false
time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2026-03-16T20:43:45.818+08:00 level=INFO source=server.go:246 msg="enabling flash attention"
time=2026-03-16T20:43:45.818+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\CustomApp\\Ollama\\ollama.exe runner --ollama-engine --model D:\\ollama\\models\\blobs\\sha256-900dde62fb7ebe8a5a25e35d5b7633f403f226a310965fed51d50f5238ba145a --port 59558"
time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=server.go:431 msg=subprocess OLLAMA_VULKAN=0 OLLAMA_HOST=0.0.0.0 HIP_PATH_64="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_71="C:\\Program Files\\AMD\\ROCm\\7.1\\" PATH="C:\\CustomApp\\Ollama\\lib\\ollama;C:\\CustomApp\\Ollama\\lib\\ollama\\rocm;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files\\AMD\\ROCm\\6.4\\\\bin;C:\\Users\\aibin\\.venv\\Scripts;C:\\Program Files\\Git\\cmd;C:\\Program Files\\nodejs\\;C:\\Users\\aibin\\.local\\bin;C:\\Users\\aibin\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\aibin\\AppData\\Local\\Programs\\Ollama;C:\\Users\\aibin\\AppData\\Roaming\\npm;C:\\Users\\aibin\\AppData\\Local\\Programs\\Microsoft VS Code\\bin;C:\\CustomApp\\Ollama;" HIP_PATH="C:\\Program Files\\AMD\\ROCm\\6.4\\" OLLAMA_DEBUG=1 OLLAMA_MODELS=D:\ollama\models OLLAMA_NO_CLOUD=0 OLLAMA_LIBRARY_PATH=C:\CustomApp\Ollama\lib\ollama;C:\CustomApp\Ollama\lib\ollama\rocm HIP_VISIBLE_DEVICES=1,2
time=2026-03-16T20:43:45.827+08:00 level=INFO source=sched.go:489 msg="system memory" total="125.6 GiB" free="111.7 GiB" free_swap="109.2 GiB"
time=2026-03-16T20:43:45.827+08:00 level=INFO source=sched.go:496 msg="gpu memory" id=0 library=ROCm available="31.4 GiB" free="31.9 GiB" minimum="457.0 MiB" overhead="0 B"
time=2026-03-16T20:43:45.827+08:00 level=INFO source=sched.go:496 msg="gpu memory" id=1 library=ROCm available="31.4 GiB" free="31.9 GiB" minimum="457.0 MiB" overhead="0 B"
time=2026-03-16T20:43:45.827+08:00 level=INFO source=server.go:757 msg="loading model" "model layers"=41 requested=-1
time=2026-03-16T20:43:45.857+08:00 level=INFO source=runner.go:1411 msg="starting ollama engine"
time=2026-03-16T20:43:45.865+08:00 level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:59558"
time=2026-03-16T20:43:45.870+08:00 level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:41[ID:0 Layers:41(0..40)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-03-16T20:43:45.893+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-16T20:43:45.896+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.name default=""
time=2026-03-16T20:43:45.896+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.description default=""
time=2026-03-16T20:43:45.896+08:00 level=INFO source=ggml.go:136 msg="" architecture=qwen35moe file_type=Q4_K_M name="" description="" num_tensors=1959 num_key_values=57
time=2026-03-16T20:43:45.896+08:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=C:\CustomApp\Ollama\lib\ollama
load_backend: loaded CPU backend from C:\CustomApp\Ollama\lib\ollama\ggml-cpu-icelake.dll
time=2026-03-16T20:43:45.908+08:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=C:\CustomApp\Ollama\lib\ollama\rocm
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 ROCm devices:
  Device 0: AMD Radeon AI PRO R9700, gfx1201 (0x1201), VMM: no, Wave Size: 32, ID: 0
  Device 1: AMD Radeon AI PRO R9700, gfx1201 (0x1201), VMM: no, Wave Size: 32, ID: 1
load_backend: loaded ROCm backend from C:\CustomApp\Ollama\lib\ollama\rocm\ggml-hip.dll
time=2026-03-16T20:43:45.942+08:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 ROCm.0.NO_VMM=1 ROCm.0.NO_PEER_COPY=1 ROCm.0.PEER_MAX_BATCH_SIZE=128 ROCm.1.NO_VMM=1 ROCm.1.NO_PEER_COPY=1 ROCm.1.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang)
time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.pooling_type default=0
time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.attention.head_count_kv default=0
time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.rope.scaling.type default=""
time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.rope.type default=""
time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.rope.scaling.factor default=1
time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.rope.scaling.original_context_length default=0
time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.attention.scale default=0
time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.norm_top_k_prob default=true
time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.mrope_interleaved default=false
time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.vision.attention.layer_norm_epsilon default=9.999999974752427e-07
time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.vision.rope.freq_base default=10000
time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.vision.num_positional_embeddings default=2304
time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=false
time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2026-03-16T20:43:46.212+08:00 level=DEBUG source=ggml.go:852 msg="compute graph" nodes=1258 splits=1
rocBLAS error from hip error code: 'hipErrorInvalidDeviceFunction':98
ggml_cuda_compute_forward: SOLVE_TRI failed
ROCm error: invalid device function
  current device: 0, in function ggml_cuda_compute_forward at C:/a/ollama/ollama/ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:2882
  err
C:/a/ollama/ollama/ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:94: ROCm error
time=2026-03-16T20:43:47.974+08:00 level=ERROR source=server.go:1205 msg="do load request" error="Post \"http://127.0.0.1:59558/load\": read tcp 127.0.0.1:59563->127.0.0.1:59558: wsarecv: An existing connection was forcibly closed by the remote host."
time=2026-03-16T20:43:47.974+08:00 level=ERROR source=server.go:1205 msg="do load request" error="Post \"http://127.0.0.1:59558/load\": dial tcp 127.0.0.1:59558: connectex: No connection could be made because the target machine actively refused it."
time=2026-03-16T20:43:47.974+08:00 level=INFO source=sched.go:516 msg="Load failed" model=D:\ollama\models\blobs\sha256-900dde62fb7ebe8a5a25e35d5b7633f403f226a310965fed51d50f5238ba145a error="model failed to load, this may be due to resource limitations or an internal error, check ollama server logs for details"
time=2026-03-16T20:43:47.974+08:00 level=DEBUG source=server.go:1830 msg="stopping llama server" pid=4728
[GIN] 2026/03/16 - 20:43:47 | 500 |    4.0893504s |       127.0.0.1 | POST     "/api/chat"
time=2026-03-16T20:43:48.022+08:00 level=ERROR source=server.go:303 msg="llama runner terminated" error="exit status 1"
[GIN] 2026/03/16 - 20:44:13 | 200 |      1.0248ms |       127.0.0.1 | GET      "/api/tags"
[GIN] 2026/03/16 - 20:44:43 | 200 |      1.0593ms |       127.0.0.1 | GET      "/api/tags"

Above is the detailed server.log for the 500 error.

[Image]
runner" cmd="C:\\CustomApp\\Ollama\\ollama.exe runner --ollama-engine --model D:\\ollama\\models\\blobs\\sha256-900dde62fb7ebe8a5a25e35d5b7633f403f226a310965fed51d50f5238ba145a --port 59558" time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=server.go:431 msg=subprocess OLLAMA_VULKAN=0 OLLAMA_HOST=0.0.0.0 HIP_PATH_64="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_71="C:\\Program Files\\AMD\\ROCm\\7.1\\" PATH="C:\\CustomApp\\Ollama\\lib\\ollama;C:\\CustomApp\\Ollama\\lib\\ollama\\rocm;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files\\AMD\\ROCm\\6.4\\\\bin;C:\\Users\\aibin\\.venv\\Scripts;C:\\Program Files\\Git\\cmd;C:\\Program Files\\nodejs\\;C:\\Users\\aibin\\.local\\bin;C:\\Users\\aibin\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\aibin\\AppData\\Local\\Programs\\Ollama;C:\\Users\\aibin\\AppData\\Roaming\\npm;C:\\Users\\aibin\\AppData\\Local\\Programs\\Microsoft VS Code\\bin;C:\\CustomApp\\Ollama;" HIP_PATH="C:\\Program Files\\AMD\\ROCm\\6.4\\" OLLAMA_DEBUG=1 OLLAMA_MODELS=D:\ollama\models OLLAMA_NO_CLOUD=0 OLLAMA_LIBRARY_PATH=C:\CustomApp\Ollama\lib\ollama;C:\CustomApp\Ollama\lib\ollama\rocm HIP_VISIBLE_DEVICES=1,2 time=2026-03-16T20:43:45.827+08:00 level=INFO source=sched.go:489 msg="system memory" total="125.6 GiB" free="111.7 GiB" free_swap="109.2 GiB" time=2026-03-16T20:43:45.827+08:00 level=INFO source=sched.go:496 msg="gpu memory" id=0 library=ROCm available="31.4 GiB" free="31.9 GiB" minimum="457.0 MiB" overhead="0 B" time=2026-03-16T20:43:45.827+08:00 level=INFO source=sched.go:496 msg="gpu memory" id=1 library=ROCm available="31.4 GiB" free="31.9 GiB" minimum="457.0 MiB" overhead="0 B" time=2026-03-16T20:43:45.827+08:00 level=INFO source=server.go:757 msg="loading model" "model layers"=41 requested=-1 time=2026-03-16T20:43:45.857+08:00 level=INFO source=runner.go:1411 msg="starting ollama engine" time=2026-03-16T20:43:45.865+08:00 level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:59558" time=2026-03-16T20:43:45.870+08:00 level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:41[ID:0 Layers:41(0..40)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" time=2026-03-16T20:43:45.893+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32 time=2026-03-16T20:43:45.896+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.name default="" time=2026-03-16T20:43:45.896+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.description default="" time=2026-03-16T20:43:45.896+08:00 level=INFO source=ggml.go:136 msg="" architecture=qwen35moe file_type=Q4_K_M name="" description="" num_tensors=1959 num_key_values=57 time=2026-03-16T20:43:45.896+08:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=C:\CustomApp\Ollama\lib\ollama load_backend: loaded CPU backend from C:\CustomApp\Ollama\lib\ollama\ggml-cpu-icelake.dll time=2026-03-16T20:43:45.908+08:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=C:\CustomApp\Ollama\lib\ollama\rocm ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 2 ROCm devices: Device 0: AMD Radeon AI PRO R9700, gfx1201 (0x1201), VMM: no, Wave Size: 32, ID: 0 Device 1: AMD Radeon AI PRO R9700, gfx1201 (0x1201), 
VMM: no, Wave Size: 32, ID: 1 load_backend: loaded ROCm backend from C:\CustomApp\Ollama\lib\ollama\rocm\ggml-hip.dll time=2026-03-16T20:43:45.942+08:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 ROCm.0.NO_VMM=1 ROCm.0.NO_PEER_COPY=1 ROCm.0.PEER_MAX_BATCH_SIZE=128 ROCm.1.NO_VMM=1 ROCm.1.NO_PEER_COPY=1 ROCm.1.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang) time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.pooling_type default=0 time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.attention.head_count_kv default=0 time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.rope.scaling.type default="" time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.rope.type default="" time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.rope.scaling.factor default=1 time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.rope.scaling.original_context_length default=0 time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.attention.scale default=0 time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.norm_top_k_prob default=true time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.mrope_interleaved default=false time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.vision.attention.layer_norm_epsilon default=9.999999974752427e-07 time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.vision.rope.freq_base default=10000 time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.vision.num_positional_embeddings default=2304 time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=false time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0 time=2026-03-16T20:43:46.212+08:00 level=DEBUG source=ggml.go:852 msg="compute graph" nodes=1258 splits=1 rocBLAS error from hip error code: 'hipErrorInvalidDeviceFunction':98 ggml_cuda_compute_forward: SOLVE_TRI failed ROCm error: invalid device function current device: 0, in function ggml_cuda_compute_forward at C:/a/ollama/ollama/ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:2882 err C:/a/ollama/ollama/ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:94: ROCm error time=2026-03-16T20:43:47.974+08:00 level=ERROR source=server.go:1205 msg="do load request" error="Post \"http://127.0.0.1:59558/load\": read tcp 127.0.0.1:59563->127.0.0.1:59558: wsarecv: An existing connection was forcibly closed by the remote host." time=2026-03-16T20:43:47.974+08:00 level=ERROR source=server.go:1205 msg="do load request" error="Post \"http://127.0.0.1:59558/load\": dial tcp 127.0.0.1:59558: connectex: No connection could be made because the target machine actively refused it." 
time=2026-03-16T20:43:47.974+08:00 level=INFO source=sched.go:516 msg="Load failed" model=D:\ollama\models\blobs\sha256-900dde62fb7ebe8a5a25e35d5b7633f403f226a310965fed51d50f5238ba145a error="model failed to load, this may be due to resource limitations or an internal error, check ollama server logs for details"
time=2026-03-16T20:43:47.974+08:00 level=DEBUG source=server.go:1830 msg="stopping llama server" pid=4728
[GIN] 2026/03/16 - 20:43:47 | 500 | 4.0893504s | 127.0.0.1 | POST "/api/chat"
time=2026-03-16T20:43:48.022+08:00 level=ERROR source=server.go:303 msg="llama runner terminated" error="exit status 1"
[GIN] 2026/03/16 - 20:44:13 | 200 | 1.0248ms | 127.0.0.1 | GET "/api/tags"
[GIN] 2026/03/16 - 20:44:43 | 200 | 1.0593ms | 127.0.0.1 | GET "/api/tags"
```

The above is the full server.log for the 500 error.

(Screenshot from the original issue: https://github.com/user-attachments/assets/cbda7542-d2d5-4f49-9ce0-3126b1d00d97)
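For anyone triaging similar reports, the decisive lines in the log above are the `ggml_cuda_compute_forward: SOLVE_TRI failed` / `hipErrorInvalidDeviceFunction` pair. A quick way to pull just those out of a long Windows log is a `Select-String` filter; this is only a convenience sketch, and the path is an assumption based on the default Ollama log location (`%LOCALAPPDATA%\Ollama\server.log`) — adjust it for a custom install such as `C:\CustomApp\Ollama`:

```powershell
# Filter the Ollama server log for the rocBLAS/HIP failure signature.
# Assumed default log path on Windows; change it if your install logs elsewhere.
Select-String -Path "$env:LOCALAPPDATA\Ollama\server.log" -Pattern 'SOLVE_TRI|hipErrorInvalidDeviceFunction|invalid device function'
```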

@aibin8910 commented on GitHub (Mar 16, 2026):

I've now configured ROCBLAS_TENSILE_LIBPATH, disabled Vulkan, and switched the model to glm-4.7-flash, and it runs. It's just that none of the qwen3.5-series models work — those I can only run on the Vulkan backend.

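For anyone trying to reproduce this workaround: the comment doesn't say which directory `ROCBLAS_TENSILE_LIBPATH` was pointed at, but rocBLAS does read that variable to locate its Tensile kernel library. A plausible setup on a machine with HIP SDK 7.1 installed might look like the following sketch — the library path is an assumption based on the default HIP SDK layout, so verify it exists on your system first:

```powershell
# Assumed default HIP SDK 7.1 rocBLAS kernel directory -- verify before use.
$env:ROCBLAS_TENSILE_LIBPATH = "C:\Program Files\AMD\ROCm\7.1\bin\rocblas\library"
$env:OLLAMA_VULKAN = "0"   # force the ROCm backend instead of Vulkan
ollama serve
```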

@Jasdfgh commented on GitHub (Mar 18, 2026):

The Qwen3.5 series uses the DeltaNet architecture, which invokes rocBLAS's triangular solve; on gfx1201 that operation apparently has no working kernel yet.
GLM-4.7 is a standard transformer that never hits that code path, so it works fine.

Several people have confirmed the same problem on the same GPU in #14423. For now, I consider Vulkan the most reliable workaround (see the sketch below for how to check your rocBLAS kernels and switch backends).

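One way to check whether a given architecture has precompiled kernels is to look at the `TensileLibrary_*` files in the rocBLAS library directory, since each file is tagged with a gfx target. A rough check, again assuming the default HIP SDK 7.1 path (adjust for your install), followed by the Vulkan fallback mentioned above:

```powershell
# List the gfx targets rocBLAS ships precompiled Tensile kernels for.
# If no entry mentions gfx1201, this build has no RDNA4 kernels.
Get-ChildItem "C:\Program Files\AMD\ROCm\7.1\bin\rocblas\library" -Filter "TensileLibrary_*" | Select-Object -ExpandProperty Name

# Workaround while ROCm kernels are missing: run Ollama on the Vulkan backend.
$env:OLLAMA_VULKAN = "1"
ollama serve
```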

@aibin8910 commented on GitHub (Mar 20, 2026):

Latest update: none of the deepseek-r1 models can use the GPU; everything runs on the CPU. Output speed is only 2 tokens/s for deepseek-r1:70b and 7 tokens/s for deepseek-r1:32b. Neither ROCm nor Vulkan will engage the GPUs. I have also set HIP_VISIBLE_DEVICES=0.1 and OLLAMA_NUM_GPU=2, but the GPUs still will not run.

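Two things are worth checking when a model silently lands on the CPU. First, the device-list syntax: `HIP_VISIBLE_DEVICES` takes a comma-separated list, so the `0.1` in the comment above looks like a typo for `0,1` and would not select both GPUs. (As far as I know, `OLLAMA_NUM_GPU` is not a documented Ollama environment variable; the per-request `num_gpu` option is the usual knob.) Second, the actual CPU/GPU split Ollama reports. A minimal diagnostic sequence, assuming a standard Windows install:

```powershell
$env:HIP_VISIBLE_DEVICES = "0,1"   # comma-separated device indices
$env:OLLAMA_DEBUG = "1"            # verbose discovery logging in server.log
ollama serve

# In a second terminal, after sending a prompt to the model:
ollama ps   # the PROCESSOR column shows how much of the model sits on CPU vs GPU
```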

@slojosic-amd commented on GitHub (Mar 23, 2026):

FYI: https://github.com/ollama/ollama/pull/14979


@xiaoxihooo-source commented on GitHub (Mar 29, 2026):

> Latest update: none of the deepseek-r1 models can use the GPU; everything runs on the CPU. Output speed is only 2 tokens/s for deepseek-r1:70b and 7 tokens/s for deepseek-r1:32b. Neither ROCm nor Vulkan will engage the GPUs. I have also set HIP_VISIBLE_DEVICES=0.1 and OLLAMA_NUM_GPU=2, but the GPUs still will not run.

The Ollama team is moving too slowly on this; I've already switched to LM Studio. It isn't perfect either, but at least it's usable.


@aibin8910 commented on GitHub (Mar 31, 2026):

I've switched to LM Studio as well. Memory usage often balloons, but at least it runs.

Reference: github-starred/ollama#71564