[GH-ISSUE #14686] ROCm backend fails to initialize on AMD Radeon AI PRO R9700 (RDNA4, gfx1201) in Windows 11 #9504

GiteaMirror · 2026-04-12T22:25:39-05:00

GiteaMirror commented

2026-04-12 22:25:39 -05:00

Originally created by @aibin8910 on GitHub (Mar 7, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14686

Description

I am using Ollama v0.17.7 on Windows 11 with an AMD Radeon AI PRO R9700 (RDNA4 architecture, gfx1201). The ROCm backend fails to initialize, falling back to CPU, even though the HIP SDK 7.1 is installed and the Vulkan backend works perfectly.

Environment

OS: Windows 11 Pro (10.0.26200)
Ollama version: 0.17.7 (installed in C:\Ollama)
GPU: AMD Radeon AI PRO R9700 (2x, 32GB GDDR6 each) – RDNA4, gfx1201
Driver: AMD Software Pro Edition 26.2.2 (driver date 2026/2/17)
HIP SDK: 7.1.0 installed (AMD HIP SDK components)
ROCm files used: ollama-windows-amd64-rocm.zip (extracted to C:\Ollama\lib\ollama\)

Expected Behavior

Ollama should initialize the ROCm backend and utilize the GPU(s) for inference.

Actual Behavior

Ollama detects the GPU (description="AMD Radeon AI PRO R9700" compute=gfx1201) but then logs "filtering device which didn't fully initialize" and falls back to CPU. The inference compute log shows only CPU.

Steps Already Taken

Installed latest AMD driver and HIP SDK 7.1.
Set HIP_VISIBLE_DEVICES=1 and OLLAMA_DEBUG=1.
Replaced ollama.exe with the one from ollama-windows-amd64.zip and copied ROCm libraries from ollama-windows-amd64-rocm.zip to C:\Ollama\lib\ollama\.
Tried HSA_OVERRIDE_GFX_VERSION=12.0.1, 12.0.0, 11.0.0 – all result in the same filtering.
Vulkan backend (OLLAMA_VULKAN=1) works fine, utilizing both GPUs (see logs below).

Relevant Logs (from ollama serve with OLLAMA_DEBUG=1)

time=2026-03-07T09:49:00.849+08:00 level=DEBUG source=runner.go:153 msg="filtering device which didn't fully initialize" id=0 libdir=C:\Ollama\lib\ollama\rocm pci_id=0000:07:00.0 library=ROCm
time=2026-03-07T09:49:00.850+08:00 level=INFO source=types.go:60 msg="inference compute" id=cpu library=cpu ...
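The two log lines above use a key=value layout, so the dropped devices can be pulled out mechanically when the debug output gets long. The following is a minimal Python sketch, not part of Ollama. The helper names `parse_log_line` and `summarize` are hypothetical, and the parsing assumes every field is either a bare token or a double-quoted string, as in the excerpt.

```python
import re

# One key=value pair; the value is either a double-quoted string or a bare token.
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S*)')

FILTER_MSG = "filtering device which didn't fully initialize"

# The two lines quoted in this report, copied verbatim.
SAMPLE = [
    r"""time=2026-03-07T09:49:00.849+08:00 level=DEBUG source=runner.go:153 msg="filtering device which didn't fully initialize" id=0 libdir=C:\Ollama\lib\ollama\rocm pci_id=0000:07:00.0 library=ROCm""",
    r"""time=2026-03-07T09:49:00.850+08:00 level=INFO source=types.go:60 msg="inference compute" id=cpu library=cpu ...""",
]


def parse_log_line(line):
    """Return the key=value fields of one log line as a dict."""
    fields = {}
    for key, value in PAIR.findall(line):
        # Strip the surrounding quotes from quoted values.
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]
        fields[key] = value
    return fields


def summarize(lines):
    """Collect (library, pci_id) of filtered devices and the libraries in use."""
    filtered, active = [], []
    for line in lines:
        fields = parse_log_line(line)
        if fields.get("msg") == FILTER_MSG:
            filtered.append((fields.get("library"), fields.get("pci_id")))
        elif fields.get("msg") == "inference compute":
            active.append(fields.get("library"))
    return filtered, active
```

Calling `summarize(SAMPLE)` on the excerpt reports one filtered ROCm device and `cpu` as the only inference compute entry, which matches the fallback described in this report.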

The Vulkan backend successfully enumerates both GPUs:

ggml_vulkan: Found 3 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon AI PRO R9700 (AMD proprietary driver) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat
...

Analysis

The ROCm libraries bundled with Ollama (version 6.x, as indicated by amdhip64_6.dll) do not include precompiled kernels for gfx1201 (RDNA4). The TensileLibrary_* files from the ROCm 6.2 build path (as seen in the user's attached file list) contain kernels only up to gfx1151. Thus, even with HSA_OVERRIDE_GFX_VERSION, the HIP runtime cannot find suitable kernels for this architecture, leading to initialization failure.
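One way to check this claim on a given install is to list which gfx targets appear in the file names of a rocBLAS library directory. The sketch below is illustrative only. It assumes the architecture is embedded in each file name as a `gfx...` token (as the `TensileLibrary_*` names suggest), and the function names `gfx_targets` and `has_target` are hypothetical.

```python
import re
from pathlib import Path

# Architecture token such as gfx1151 or gfx90a inside a file name.
GFX = re.compile(r"gfx[0-9a-f]+")


def gfx_targets(library_dir):
    """Return the sorted set of gfx tokens found in file names under library_dir."""
    targets = set()
    for path in Path(library_dir).iterdir():
        if path.is_file():
            targets.update(GFX.findall(path.name.lower()))
    return sorted(targets)


def has_target(library_dir, target):
    """True if at least one file name in library_dir mentions the given target."""
    return target in gfx_targets(library_dir)
```

If `has_target(some_dir, "gfx1201")` is false for the directory the runtime actually loads, that would be consistent with the analysis above. The directory to pass in depends on the install and is not specified here.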

Request

Please update the Windows ROCm support in Ollama to include gfx1201 (RDNA4). This could be achieved by:

Bundling a newer version of ROCm (e.g., 7.x) that adds support for RDNA4.
Adding the necessary kernel builds for gfx1201 to the existing ROCm 6.x package.

Thank you for your work on Ollama! I'm happy to provide any additional information or testing if needed.

GiteaMirror added the feature request label 2026-04-12 22:25:39 -05:00

GiteaMirror commented

2026-04-12 22:25:40 -05:00

@trtr6842-git commented on GitHub (Mar 7, 2026):

This might be a little old:
https://github.com/ollama/ollama/issues/10430#issuecomment-3707294717

But since then I've added an R9700, and it works just fine:

C:\Users\ttyle>ollama -v
ollama version is 0.13.5

C:\Users\ttyle>ollama ps
NAME    ID    SIZE    PROCESSOR    CONTEXT    UNTIL

C:\Users\ttyle>ollama ps
NAME                 ID              SIZE     PROCESSOR    CONTEXT    UNTIL
qwen2.5-coder:14b    9ec8897f747e    17 GB    100% GPU     262144     4 minutes from now

C:\Users\ttyle>

Did you add the ROCBLAS_TENSILE_LIBPATH environment variable pointing to ...\AMD\ROCm\6.4\bin\rocblas\library?
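For anyone scripting this, the variable can be derived from the HIP SDK root instead of being typed by hand. This is a rough Python sketch under two assumptions: that the HIP SDK installer has set `HIP_PATH` to the SDK root, and that the Tensile files sit under `bin\rocblas\library` below it, as in the path mentioned above. The function name `rocblas_env` is hypothetical.

```python
import ntpath  # Windows path rules, regardless of the platform running this


def rocblas_env(base_env, hip_path=None):
    """Return a copy of base_env with ROCBLAS_TENSILE_LIBPATH added.

    hip_path is the HIP SDK root; if omitted, HIP_PATH from base_env is used.
    """
    env = dict(base_env)
    hip_path = hip_path or env.get("HIP_PATH")
    if not hip_path:
        raise ValueError("HIP_PATH is not set; pass the HIP SDK root explicitly")
    # Assumed layout: <SDK root>\bin\rocblas\library
    env["ROCBLAS_TENSILE_LIBPATH"] = ntpath.join(hip_path, "bin", "rocblas", "library")
    return env
```

The resulting dict can be passed as `env=` to `subprocess.run(["ollama", "serve"], ...)`, so the setting applies to that one server process without changing the system-wide environment.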

GiteaMirror commented

2026-04-12 22:25:41 -05:00

@Jasdfgh commented on GitHub (Mar 11, 2026):

based on what trtr6842-git mentioned: might be worth trying a clean ollama install (without the manual rocm zip replacement) + setting ROCBLAS_TENSILE_LIBPATH to your HIP SDK 7.1's rocblas library path. the manual file extraction could be what's conflicting.

your dual R9700 setup might hit a known issue with dual AMD GPUs on Windows (https://github.com/ollama/ollama/pull/10676). if things still don't work after the above, testing with HIP_VISIBLE_DEVICES=0 could help isolate that.

GiteaMirror commented

2026-04-12 22:25:41 -05:00

@aibin8910 commented on GitHub (Mar 11, 2026):

After adding ROCBLAS_TENSILE_LIBPATH, a new issue arose: when Ollama loaded the qwen3.5-35B model, it returned a 500 error. After switching to Vulkan, everything worked normally. HIP_VISIBLE_DEVICES is currently set to 0,1.

GiteaMirror commented

2026-04-12 22:25:42 -05:00

@Jasdfgh commented on GitHub (Mar 15, 2026):

nice, so ROCBLAS_TENSILE_LIBPATH got ROCm initializing — that's progress. the 500 on qwen3.5-35B specifically is interesting. could you share the ollama server log around the time of the 500? would help narrow down if it's a HIP kernel error (qwen 3.5 has a known dispatch overhead issue on ROCm, tracked at https://github.com/ggml-org/llama.cpp/issues/18823) or something else like OOM on the dual GPU split.

GiteaMirror commented

2026-04-12 22:25:42 -05:00

@aibin8910 commented on GitHub (Mar 16, 2026):

time=2026-03-16T20:43:35.626+08:00 level=INFO source=routes.go:1727 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:0 OLLAMA_DEBUG:DEBUG OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:D:\\ollama\\models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NO_CLOUD:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES:]"
time=2026-03-16T20:43:35.633+08:00 level=INFO source=routes.go:1729 msg="Ollama cloud disabled: false"
time=2026-03-16T20:43:35.634+08:00 level=INFO source=images.go:477 msg="total blobs: 17"
time=2026-03-16T20:43:35.634+08:00 level=INFO source=images.go:484 msg="total unused blobs removed: 0"
time=2026-03-16T20:43:35.635+08:00 level=INFO source=routes.go:1782 msg="Listening on [::]:11434 (version 0.18.0)"
time=2026-03-16T20:43:35.635+08:00 level=DEBUG source=sched.go:145 msg="starting llm scheduler"
time=2026-03-16T20:43:35.636+08:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-03-16T20:43:35.648+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\CustomApp\\Ollama\\ollama.exe runner --ollama-engine --port 62132"
time=2026-03-16T20:43:35.648+08:00 level=DEBUG source=server.go:431 msg=subprocess OLLAMA_VULKAN=0 OLLAMA_HOST=0.0.0.0 HIP_PATH_64="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_71="C:\\Program Files\\AMD\\ROCm\\7.1\\" PATH="C:\\CustomApp\\Ollama\\lib\\ollama;C:\\CustomApp\\Ollama\\lib\\ollama\\cuda_v13;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files\\AMD\\ROCm\\6.4\\\\bin;C:\\Users\\aibin\\.venv\\Scripts;C:\\Program Files\\Git\\cmd;C:\\Program Files\\nodejs\\;C:\\Users\\aibin\\.local\\bin;C:\\Users\\aibin\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\aibin\\AppData\\Local\\Programs\\Ollama;C:\\Users\\aibin\\AppData\\Roaming\\npm;C:\\Users\\aibin\\AppData\\Local\\Programs\\Microsoft VS Code\\bin;C:\\CustomApp\\Ollama;" HIP_PATH="C:\\Program Files\\AMD\\ROCm\\6.4\\" OLLAMA_DEBUG=1 OLLAMA_MODELS=D:\ollama\models OLLAMA_NO_CLOUD=0 OLLAMA_LIBRARY_PATH=C:\CustomApp\Ollama\lib\ollama;C:\CustomApp\Ollama\lib\ollama\cuda_v13
time=2026-03-16T20:43:35.762+08:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=124.3364ms OLLAMA_LIBRARY_PATH="[C:\\CustomApp\\Ollama\\lib\\ollama C:\\CustomApp\\Ollama\\lib\\ollama\\cuda_v13]" extra_envs=map[]
time=2026-03-16T20:43:35.763+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\CustomApp\\Ollama\\ollama.exe runner --ollama-engine --port 62140"
time=2026-03-16T20:43:35.763+08:00 level=DEBUG source=server.go:431 msg=subprocess OLLAMA_VULKAN=0 OLLAMA_HOST=0.0.0.0 HIP_PATH_64="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_71="C:\\Program Files\\AMD\\ROCm\\7.1\\" PATH="C:\\CustomApp\\Ollama\\lib\\ollama;C:\\CustomApp\\Ollama\\lib\\ollama\\rocm;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files\\AMD\\ROCm\\6.4\\\\bin;C:\\Users\\aibin\\.venv\\Scripts;C:\\Program Files\\Git\\cmd;C:\\Program Files\\nodejs\\;C:\\Users\\aibin\\.local\\bin;C:\\Users\\aibin\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\aibin\\AppData\\Local\\Programs\\Ollama;C:\\Users\\aibin\\AppData\\Roaming\\npm;C:\\Users\\aibin\\AppData\\Local\\Programs\\Microsoft VS Code\\bin;C:\\CustomApp\\Ollama;" HIP_PATH="C:\\Program Files\\AMD\\ROCm\\6.4\\" OLLAMA_DEBUG=1 OLLAMA_MODELS=D:\ollama\models OLLAMA_NO_CLOUD=0 OLLAMA_LIBRARY_PATH=C:\CustomApp\Ollama\lib\ollama;C:\CustomApp\Ollama\lib\ollama\rocm
time=2026-03-16T20:43:37.754+08:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=1.9924009s OLLAMA_LIBRARY_PATH="[C:\\CustomApp\\Ollama\\lib\\ollama C:\\CustomApp\\Ollama\\lib\\ollama\\rocm]" extra_envs=map[]
time=2026-03-16T20:43:37.754+08:00 level=INFO source=runner.go:106 msg="experimental Vulkan support disabled.  To enable, set OLLAMA_VULKAN=1"
time=2026-03-16T20:43:37.755+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\CustomApp\\Ollama\\ollama.exe runner --ollama-engine --port 55694"
time=2026-03-16T20:43:37.755+08:00 level=DEBUG source=server.go:431 msg=subprocess OLLAMA_VULKAN=0 OLLAMA_HOST=0.0.0.0 HIP_PATH_64="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_71="C:\\Program Files\\AMD\\ROCm\\7.1\\" PATH="C:\\CustomApp\\Ollama\\lib\\ollama;C:\\CustomApp\\Ollama\\lib\\ollama\\cuda_v12;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files\\AMD\\ROCm\\6.4\\\\bin;C:\\Users\\aibin\\.venv\\Scripts;C:\\Program Files\\Git\\cmd;C:\\Program Files\\nodejs\\;C:\\Users\\aibin\\.local\\bin;C:\\Users\\aibin\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\aibin\\AppData\\Local\\Programs\\Ollama;C:\\Users\\aibin\\AppData\\Roaming\\npm;C:\\Users\\aibin\\AppData\\Local\\Programs\\Microsoft VS Code\\bin;C:\\CustomApp\\Ollama;" HIP_PATH="C:\\Program Files\\AMD\\ROCm\\6.4\\" OLLAMA_DEBUG=1 OLLAMA_MODELS=D:\ollama\models OLLAMA_NO_CLOUD=0 OLLAMA_LIBRARY_PATH=C:\CustomApp\Ollama\lib\ollama;C:\CustomApp\Ollama\lib\ollama\cuda_v12
time=2026-03-16T20:43:37.823+08:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=68.3443ms OLLAMA_LIBRARY_PATH="[C:\\CustomApp\\Ollama\\lib\\ollama C:\\CustomApp\\Ollama\\lib\\ollama\\cuda_v12]" extra_envs=map[]
time=2026-03-16T20:43:37.823+08:00 level=DEBUG source=runner.go:124 msg="evaluating which, if any, devices to filter out" initial_count=3
time=2026-03-16T20:43:37.823+08:00 level=DEBUG source=runner.go:146 msg="verifying if device is supported" library=C:\CustomApp\Ollama\lib\ollama\rocm description="AMD Radeon(TM) Graphics" compute=gfx1036 id=0 pci_id=0000:7e:00.0
time=2026-03-16T20:43:37.823+08:00 level=DEBUG source=runner.go:146 msg="verifying if device is supported" library=C:\CustomApp\Ollama\lib\ollama\rocm description="AMD Radeon AI PRO R9700" compute=gfx1201 id=1 pci_id=0000:03:00.0
time=2026-03-16T20:43:37.823+08:00 level=DEBUG source=runner.go:146 msg="verifying if device is supported" library=C:\CustomApp\Ollama\lib\ollama\rocm description="AMD Radeon AI PRO R9700" compute=gfx1201 id=2 pci_id=0000:07:00.0
time=2026-03-16T20:43:37.824+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\CustomApp\\Ollama\\ollama.exe runner --ollama-engine --port 55701"
time=2026-03-16T20:43:37.824+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\CustomApp\\Ollama\\ollama.exe runner --ollama-engine --port 55699"
time=2026-03-16T20:43:37.824+08:00 level=DEBUG source=server.go:431 msg=subprocess OLLAMA_VULKAN=0 OLLAMA_HOST=0.0.0.0 HIP_PATH_64="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_71="C:\\Program Files\\AMD\\ROCm\\7.1\\" PATH="C:\\CustomApp\\Ollama\\lib\\ollama;C:\\CustomApp\\Ollama\\lib\\ollama\\rocm;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files\\AMD\\ROCm\\6.4\\\\bin;C:\\Users\\aibin\\.venv\\Scripts;C:\\Program Files\\Git\\cmd;C:\\Program Files\\nodejs\\;C:\\Users\\aibin\\.local\\bin;C:\\Users\\aibin\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\aibin\\AppData\\Local\\Programs\\Ollama;C:\\Users\\aibin\\AppData\\Roaming\\npm;C:\\Users\\aibin\\AppData\\Local\\Programs\\Microsoft VS Code\\bin;C:\\CustomApp\\Ollama;" HIP_PATH="C:\\Program Files\\AMD\\ROCm\\6.4\\" OLLAMA_DEBUG=1 OLLAMA_MODELS=D:\ollama\models OLLAMA_NO_CLOUD=0 OLLAMA_LIBRARY_PATH=C:\CustomApp\Ollama\lib\ollama;C:\CustomApp\Ollama\lib\ollama\rocm HIP_VISIBLE_DEVICES=2 GGML_CUDA_INIT=1
time=2026-03-16T20:43:37.824+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\CustomApp\\Ollama\\ollama.exe runner --ollama-engine --port 55700"
time=2026-03-16T20:43:37.824+08:00 level=DEBUG source=server.go:431 msg=subprocess OLLAMA_VULKAN=0 OLLAMA_HOST=0.0.0.0 HIP_PATH_64="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_71="C:\\Program Files\\AMD\\ROCm\\7.1\\" PATH="C:\\CustomApp\\Ollama\\lib\\ollama;C:\\CustomApp\\Ollama\\lib\\ollama\\rocm;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files\\AMD\\ROCm\\6.4\\\\bin;C:\\Users\\aibin\\.venv\\Scripts;C:\\Program Files\\Git\\cmd;C:\\Program Files\\nodejs\\;C:\\Users\\aibin\\.local\\bin;C:\\Users\\aibin\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\aibin\\AppData\\Local\\Programs\\Ollama;C:\\Users\\aibin\\AppData\\Roaming\\npm;C:\\Users\\aibin\\AppData\\Local\\Programs\\Microsoft VS Code\\bin;C:\\CustomApp\\Ollama;" HIP_PATH="C:\\Program Files\\AMD\\ROCm\\6.4\\" OLLAMA_DEBUG=1 OLLAMA_MODELS=D:\ollama\models OLLAMA_NO_CLOUD=0 OLLAMA_LIBRARY_PATH=C:\CustomApp\Ollama\lib\ollama;C:\CustomApp\Ollama\lib\ollama\rocm HIP_VISIBLE_DEVICES=1 GGML_CUDA_INIT=1
time=2026-03-16T20:43:37.824+08:00 level=DEBUG source=server.go:431 msg=subprocess OLLAMA_VULKAN=0 OLLAMA_HOST=0.0.0.0 HIP_PATH_64="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_71="C:\\Program Files\\AMD\\ROCm\\7.1\\" PATH="C:\\CustomApp\\Ollama\\lib\\ollama;C:\\CustomApp\\Ollama\\lib\\ollama\\rocm;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files\\AMD\\ROCm\\6.4\\\\bin;C:\\Users\\aibin\\.venv\\Scripts;C:\\Program Files\\Git\\cmd;C:\\Program Files\\nodejs\\;C:\\Users\\aibin\\.local\\bin;C:\\Users\\aibin\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\aibin\\AppData\\Local\\Programs\\Ollama;C:\\Users\\aibin\\AppData\\Roaming\\npm;C:\\Users\\aibin\\AppData\\Local\\Programs\\Microsoft VS Code\\bin;C:\\CustomApp\\Ollama;" HIP_PATH="C:\\Program Files\\AMD\\ROCm\\6.4\\" OLLAMA_DEBUG=1 OLLAMA_MODELS=D:\ollama\models OLLAMA_NO_CLOUD=0 OLLAMA_LIBRARY_PATH=C:\CustomApp\Ollama\lib\ollama;C:\CustomApp\Ollama\lib\ollama\rocm HIP_VISIBLE_DEVICES=0 GGML_CUDA_INIT=1
time=2026-03-16T20:43:38.067+08:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=243.8174ms OLLAMA_LIBRARY_PATH="[C:\\CustomApp\\Ollama\\lib\\ollama C:\\CustomApp\\Ollama\\lib\\ollama\\rocm]" extra_envs="map[GGML_CUDA_INIT:1 HIP_VISIBLE_DEVICES:0]"
time=2026-03-16T20:43:38.067+08:00 level=DEBUG source=runner.go:153 msg="filtering device which didn't fully initialize" id=0 libdir=C:\CustomApp\Ollama\lib\ollama\rocm pci_id=0000:7e:00.0 library=ROCm
time=2026-03-16T20:43:38.459+08:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=636.0776ms OLLAMA_LIBRARY_PATH="[C:\\CustomApp\\Ollama\\lib\\ollama C:\\CustomApp\\Ollama\\lib\\ollama\\rocm]" extra_envs="map[GGML_CUDA_INIT:1 HIP_VISIBLE_DEVICES:1]"
time=2026-03-16T20:43:38.475+08:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=651.603ms OLLAMA_LIBRARY_PATH="[C:\\CustomApp\\Ollama\\lib\\ollama C:\\CustomApp\\Ollama\\lib\\ollama\\rocm]" extra_envs="map[GGML_CUDA_INIT:1 HIP_VISIBLE_DEVICES:2]"
time=2026-03-16T20:43:38.475+08:00 level=DEBUG source=runner.go:193 msg="adjusting filtering IDs" FilterID=1 new_ID=0
time=2026-03-16T20:43:38.475+08:00 level=DEBUG source=runner.go:193 msg="adjusting filtering IDs" FilterID=2 new_ID=1
time=2026-03-16T20:43:38.475+08:00 level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=2.8395903s
time=2026-03-16T20:43:38.475+08:00 level=INFO source=types.go:42 msg="inference compute" id=0 filter_id=1 library=ROCm compute=gfx1201 name=ROCm1 description="AMD Radeon AI PRO R9700" libdirs=ollama,rocm driver=60551.38 pci_id=0000:03:00.0 type=discrete total="31.9 GiB" available="31.9 GiB"
time=2026-03-16T20:43:38.475+08:00 level=INFO source=types.go:42 msg="inference compute" id=1 filter_id=2 library=ROCm compute=gfx1201 name=ROCm2 description="AMD Radeon AI PRO R9700" libdirs=ollama,rocm driver=60551.38 pci_id=0000:07:00.0 type=discrete total="31.9 GiB" available="31.9 GiB"
time=2026-03-16T20:43:38.475+08:00 level=INFO source=routes.go:1832 msg="vram-based default context" total_vram="63.7 GiB" default_num_ctx=262144
[GIN] 2026/03/16 - 20:43:38 | 200 |       530.7µs |       127.0.0.1 | GET      "/api/version"
[GIN] 2026/03/16 - 20:43:38 | 200 |            0s |       127.0.0.1 | GET      "/api/version"
[GIN] 2026/03/16 - 20:43:38 | 200 |            0s |       127.0.0.1 | GET      "/api/version"
[GIN] 2026/03/16 - 20:43:38 | 200 |      2.1113ms |       127.0.0.1 | GET      "/api/tags"
time=2026-03-16T20:43:38.572+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
[GIN] 2026/03/16 - 20:43:38 | 200 |    105.8152ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2026/03/16 - 20:43:39 | 200 |    823.5583ms |       127.0.0.1 | POST     "/api/me"
[GIN] 2026/03/16 - 20:43:39 | 200 |    823.5583ms |       127.0.0.1 | POST     "/api/me"
[GIN] 2026/03/16 - 20:43:43 | 200 |       1.023ms |       127.0.0.1 | GET      "/api/tags"
time=2026-03-16T20:43:43.778+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
[GIN] 2026/03/16 - 20:43:43 | 200 |    103.1114ms |       127.0.0.1 | POST     "/api/show"
time=2026-03-16T20:43:43.872+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
[GIN] 2026/03/16 - 20:43:43 | 200 |     91.9583ms |       127.0.0.1 | POST     "/api/show"
time=2026-03-16T20:43:43.963+08:00 level=DEBUG source=runner.go:264 msg="refreshing free memory"
time=2026-03-16T20:43:43.963+08:00 level=DEBUG source=runner.go:328 msg="unable to refresh all GPUs with existing runners, performing bootstrap discovery"
time=2026-03-16T20:43:43.967+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\CustomApp\\Ollama\\ollama.exe runner --ollama-engine --port 59552"
time=2026-03-16T20:43:43.967+08:00 level=DEBUG source=server.go:431 msg=subprocess OLLAMA_VULKAN=0 OLLAMA_HOST=0.0.0.0 HIP_PATH_64="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_71="C:\\Program Files\\AMD\\ROCm\\7.1\\" PATH="C:\\CustomApp\\Ollama\\lib\\ollama;C:\\CustomApp\\Ollama\\lib\\ollama\\rocm;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files\\AMD\\ROCm\\6.4\\\\bin;C:\\Users\\aibin\\.venv\\Scripts;C:\\Program Files\\Git\\cmd;C:\\Program Files\\nodejs\\;C:\\Users\\aibin\\.local\\bin;C:\\Users\\aibin\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\aibin\\AppData\\Local\\Programs\\Ollama;C:\\Users\\aibin\\AppData\\Roaming\\npm;C:\\Users\\aibin\\AppData\\Local\\Programs\\Microsoft VS Code\\bin;C:\\CustomApp\\Ollama;" HIP_PATH="C:\\Program Files\\AMD\\ROCm\\6.4\\" OLLAMA_DEBUG=1 OLLAMA_MODELS=D:\ollama\models OLLAMA_NO_CLOUD=0 OLLAMA_LIBRARY_PATH=C:\CustomApp\Ollama\lib\ollama;C:\CustomApp\Ollama\lib\ollama\rocm HIP_VISIBLE_DEVICES=1,2
time=2026-03-16T20:43:45.757+08:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=1.7939987s OLLAMA_LIBRARY_PATH="[C:\\CustomApp\\Ollama\\lib\\ollama C:\\CustomApp\\Ollama\\lib\\ollama\\rocm]" extra_envs=map[HIP_VISIBLE_DEVICES:1,2]
time=2026-03-16T20:43:45.757+08:00 level=DEBUG source=runner.go:40 msg="overall device VRAM discovery took" duration=1.7939987s
time=2026-03-16T20:43:45.757+08:00 level=INFO source=cpu_windows.go:148 msg=packages count=1
time=2026-03-16T20:43:45.757+08:00 level=INFO source=cpu_windows.go:195 msg="" package=0 cores=16 efficiency=0 threads=32
time=2026-03-16T20:43:45.757+08:00 level=DEBUG source=sched.go:220 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=6 gpu_count=2
time=2026-03-16T20:43:45.772+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-16T20:43:45.774+08:00 level=DEBUG source=sched.go:256 msg="loading first model" model=D:\ollama\models\blobs\sha256-900dde62fb7ebe8a5a25e35d5b7633f403f226a310965fed51d50f5238ba145a
time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.pooling_type default=0
time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.attention.head_count_kv default=0
time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.rope.scaling.type default=""
time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.rope.type default=""
time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.rope.scaling.factor default=1
time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.rope.scaling.original_context_length default=0
time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.attention.scale default=0
time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.norm_top_k_prob default=true
time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.mrope_interleaved default=false
time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.vision.attention.layer_norm_epsilon default=9.999999974752427e-07
time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.vision.rope.freq_base default=10000
time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.vision.num_positional_embeddings default=2304
time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=false
time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2026-03-16T20:43:45.818+08:00 level=INFO source=server.go:246 msg="enabling flash attention"
time=2026-03-16T20:43:45.818+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\CustomApp\\Ollama\\ollama.exe runner --ollama-engine --model D:\\ollama\\models\\blobs\\sha256-900dde62fb7ebe8a5a25e35d5b7633f403f226a310965fed51d50f5238ba145a --port 59558"
time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=server.go:431 msg=subprocess OLLAMA_VULKAN=0 OLLAMA_HOST=0.0.0.0 HIP_PATH_64="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_71="C:\\Program Files\\AMD\\ROCm\\7.1\\" PATH="C:\\CustomApp\\Ollama\\lib\\ollama;C:\\CustomApp\\Ollama\\lib\\ollama\\rocm;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files\\AMD\\ROCm\\6.4\\\\bin;C:\\Users\\aibin\\.venv\\Scripts;C:\\Program Files\\Git\\cmd;C:\\Program Files\\nodejs\\;C:\\Users\\aibin\\.local\\bin;C:\\Users\\aibin\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\aibin\\AppData\\Local\\Programs\\Ollama;C:\\Users\\aibin\\AppData\\Roaming\\npm;C:\\Users\\aibin\\AppData\\Local\\Programs\\Microsoft VS Code\\bin;C:\\CustomApp\\Ollama;" HIP_PATH="C:\\Program Files\\AMD\\ROCm\\6.4\\" OLLAMA_DEBUG=1 OLLAMA_MODELS=D:\ollama\models OLLAMA_NO_CLOUD=0 OLLAMA_LIBRARY_PATH=C:\CustomApp\Ollama\lib\ollama;C:\CustomApp\Ollama\lib\ollama\rocm HIP_VISIBLE_DEVICES=1,2
time=2026-03-16T20:43:45.827+08:00 level=INFO source=sched.go:489 msg="system memory" total="125.6 GiB" free="111.7 GiB" free_swap="109.2 GiB"
time=2026-03-16T20:43:45.827+08:00 level=INFO source=sched.go:496 msg="gpu memory" id=0 library=ROCm available="31.4 GiB" free="31.9 GiB" minimum="457.0 MiB" overhead="0 B"
time=2026-03-16T20:43:45.827+08:00 level=INFO source=sched.go:496 msg="gpu memory" id=1 library=ROCm available="31.4 GiB" free="31.9 GiB" minimum="457.0 MiB" overhead="0 B"
time=2026-03-16T20:43:45.827+08:00 level=INFO source=server.go:757 msg="loading model" "model layers"=41 requested=-1
time=2026-03-16T20:43:45.857+08:00 level=INFO source=runner.go:1411 msg="starting ollama engine"
time=2026-03-16T20:43:45.865+08:00 level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:59558"
time=2026-03-16T20:43:45.870+08:00 level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:41[ID:0 Layers:41(0..40)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-03-16T20:43:45.893+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-16T20:43:45.896+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.name default=""
time=2026-03-16T20:43:45.896+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.description default=""
time=2026-03-16T20:43:45.896+08:00 level=INFO source=ggml.go:136 msg="" architecture=qwen35moe file_type=Q4_K_M name="" description="" num_tensors=1959 num_key_values=57
time=2026-03-16T20:43:45.896+08:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=C:\CustomApp\Ollama\lib\ollama
load_backend: loaded CPU backend from C:\CustomApp\Ollama\lib\ollama\ggml-cpu-icelake.dll
time=2026-03-16T20:43:45.908+08:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=C:\CustomApp\Ollama\lib\ollama\rocm
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 ROCm devices:
  Device 0: AMD Radeon AI PRO R9700, gfx1201 (0x1201), VMM: no, Wave Size: 32, ID: 0
  Device 1: AMD Radeon AI PRO R9700, gfx1201 (0x1201), VMM: no, Wave Size: 32, ID: 1
load_backend: loaded ROCm backend from C:\CustomApp\Ollama\lib\ollama\rocm\ggml-hip.dll
time=2026-03-16T20:43:45.942+08:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 ROCm.0.NO_VMM=1 ROCm.0.NO_PEER_COPY=1 ROCm.0.PEER_MAX_BATCH_SIZE=128 ROCm.1.NO_VMM=1 ROCm.1.NO_PEER_COPY=1 ROCm.1.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang)
time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.pooling_type default=0
time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.attention.head_count_kv default=0
time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.rope.scaling.type default=""
time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.rope.type default=""
time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.rope.scaling.factor default=1
time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.rope.scaling.original_context_length default=0
time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.attention.scale default=0
time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.norm_top_k_prob default=true
time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.mrope_interleaved default=false
time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.vision.attention.layer_norm_epsilon default=9.999999974752427e-07
time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.vision.rope.freq_base default=10000
time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.vision.num_positional_embeddings default=2304
time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=false
time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2026-03-16T20:43:46.212+08:00 level=DEBUG source=ggml.go:852 msg="compute graph" nodes=1258 splits=1
rocBLAS error from hip error code: 'hipErrorInvalidDeviceFunction':98
ggml_cuda_compute_forward: SOLVE_TRI failed
ROCm error: invalid device function
  current device: 0, in function ggml_cuda_compute_forward at C:/a/ollama/ollama/ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:2882
  err
C:/a/ollama/ollama/ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:94: ROCm error
time=2026-03-16T20:43:47.974+08:00 level=ERROR source=server.go:1205 msg="do load request" error="Post \"http://127.0.0.1:59558/load\": read tcp 127.0.0.1:59563->127.0.0.1:59558: wsarecv: An existing connection was forcibly closed by the remote host."
time=2026-03-16T20:43:47.974+08:00 level=ERROR source=server.go:1205 msg="do load request" error="Post \"http://127.0.0.1:59558/load\": dial tcp 127.0.0.1:59558: connectex: No connection could be made because the target machine actively refused it."
time=2026-03-16T20:43:47.974+08:00 level=INFO source=sched.go:516 msg="Load failed" model=D:\ollama\models\blobs\sha256-900dde62fb7ebe8a5a25e35d5b7633f403f226a310965fed51d50f5238ba145a error="model failed to load, this may be due to resource limitations or an internal error, check ollama server logs for details"
time=2026-03-16T20:43:47.974+08:00 level=DEBUG source=server.go:1830 msg="stopping llama server" pid=4728
[GIN] 2026/03/16 - 20:43:47 | 500 |    4.0893504s |       127.0.0.1 | POST     "/api/chat"
time=2026-03-16T20:43:48.022+08:00 level=ERROR source=server.go:303 msg="llama runner terminated" error="exit status 1"
[GIN] 2026/03/16 - 20:44:13 | 200 |      1.0248ms |       127.0.0.1 | GET      "/api/tags"
[GIN] 2026/03/16 - 20:44:43 | 200 |      1.0593ms |       127.0.0.1 | GET      "/api/tags"

The above is the detailed server.log for the 500 error.
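For anyone triaging a similar log, here is a minimal sketch in Python that pulls the failure lines out of a saved server.log. It is not part of Ollama. The function name `find_failures` and the marker list are assumptions picked from the log above, not an official set of error strings, so the markers may need adjusting for other failures.

```python
import re

# Substrings treated as failure markers. These are assumptions taken from
# the log in this issue, not an official or complete list.
FAILURE_MARKERS = (
    "level=ERROR",
    "rocBLAS error",
    "ROCm error",
    "failed",
)

# Ollama's structured log lines start with time=<timestamp>; raw backend
# output (ggml / rocBLAS) has no such prefix.
TIMESTAMP = re.compile(r"^time=(\S+)")


def find_failures(log_text):
    """Return (line_number, timestamp_or_None, line) for each matching line."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if any(marker in line for marker in FAILURE_MARKERS):
            match = TIMESTAMP.match(line)
            stamp = match.group(1) if match else None
            hits.append((number, stamp, line.strip()))
    return hits


# A short excerpt copied from the log above, used as sample input.
SAMPLE = """\
time=2026-03-16T20:43:46.212+08:00 level=DEBUG source=ggml.go:852 msg="compute graph" nodes=1258 splits=1
rocBLAS error from hip error code: 'hipErrorInvalidDeviceFunction':98
ggml_cuda_compute_forward: SOLVE_TRI failed
ROCm error: invalid device function
time=2026-03-16T20:43:48.022+08:00 level=ERROR source=server.go:303 msg="llama runner terminated" error="exit status 1"
"""

if __name__ == "__main__":
    for number, stamp, line in find_failures(SAMPLE):
        # Truncate long lines so the summary stays readable.
        print(number, stamp or "-", line[:80])
```

To run it against a real file, read the log with `open(path, encoding="utf-8").read()` and pass the text to `find_failures`. Lines without a `time=` prefix are reported with no timestamp, since they come straight from the backend rather than from Ollama's logger.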

@aibin8910 commented on GitHub (Mar 16, 2026): ``` time=2026-03-16T20:43:35.626+08:00 level=INFO source=routes.go:1727 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:0 OLLAMA_DEBUG:DEBUG OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:D:\\ollama\\models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NO_CLOUD:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES:]" time=2026-03-16T20:43:35.633+08:00 level=INFO source=routes.go:1729 msg="Ollama cloud disabled: false" time=2026-03-16T20:43:35.634+08:00 level=INFO source=images.go:477 msg="total blobs: 17" time=2026-03-16T20:43:35.634+08:00 level=INFO source=images.go:484 msg="total unused blobs removed: 0" time=2026-03-16T20:43:35.635+08:00 level=INFO source=routes.go:1782 msg="Listening on [::]:11434 (version 0.18.0)" time=2026-03-16T20:43:35.635+08:00 level=DEBUG source=sched.go:145 msg="starting llm scheduler" time=2026-03-16T20:43:35.636+08:00 level=INFO source=runner.go:67 msg="discovering available GPUs..." 
time=2026-03-16T20:43:35.648+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\CustomApp\\Ollama\\ollama.exe runner --ollama-engine --port 62132" time=2026-03-16T20:43:35.648+08:00 level=DEBUG source=server.go:431 msg=subprocess OLLAMA_VULKAN=0 OLLAMA_HOST=0.0.0.0 HIP_PATH_64="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_71="C:\\Program Files\\AMD\\ROCm\\7.1\\" PATH="C:\\CustomApp\\Ollama\\lib\\ollama;C:\\CustomApp\\Ollama\\lib\\ollama\\cuda_v13;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files\\AMD\\ROCm\\6.4\\\\bin;C:\\Users\\aibin\\.venv\\Scripts;C:\\Program Files\\Git\\cmd;C:\\Program Files\\nodejs\\;C:\\Users\\aibin\\.local\\bin;C:\\Users\\aibin\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\aibin\\AppData\\Local\\Programs\\Ollama;C:\\Users\\aibin\\AppData\\Roaming\\npm;C:\\Users\\aibin\\AppData\\Local\\Programs\\Microsoft VS Code\\bin;C:\\CustomApp\\Ollama;" HIP_PATH="C:\\Program Files\\AMD\\ROCm\\6.4\\" OLLAMA_DEBUG=1 OLLAMA_MODELS=D:\ollama\models OLLAMA_NO_CLOUD=0 OLLAMA_LIBRARY_PATH=C:\CustomApp\Ollama\lib\ollama;C:\CustomApp\Ollama\lib\ollama\cuda_v13 time=2026-03-16T20:43:35.762+08:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=124.3364ms OLLAMA_LIBRARY_PATH="[C:\\CustomApp\\Ollama\\lib\\ollama C:\\CustomApp\\Ollama\\lib\\ollama\\cuda_v13]" extra_envs=map[] time=2026-03-16T20:43:35.763+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\CustomApp\\Ollama\\ollama.exe runner --ollama-engine --port 62140" time=2026-03-16T20:43:35.763+08:00 level=DEBUG source=server.go:431 msg=subprocess OLLAMA_VULKAN=0 OLLAMA_HOST=0.0.0.0 HIP_PATH_64="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_71="C:\\Program Files\\AMD\\ROCm\\7.1\\" 
PATH="C:\\CustomApp\\Ollama\\lib\\ollama;C:\\CustomApp\\Ollama\\lib\\ollama\\rocm;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files\\AMD\\ROCm\\6.4\\\\bin;C:\\Users\\aibin\\.venv\\Scripts;C:\\Program Files\\Git\\cmd;C:\\Program Files\\nodejs\\;C:\\Users\\aibin\\.local\\bin;C:\\Users\\aibin\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\aibin\\AppData\\Local\\Programs\\Ollama;C:\\Users\\aibin\\AppData\\Roaming\\npm;C:\\Users\\aibin\\AppData\\Local\\Programs\\Microsoft VS Code\\bin;C:\\CustomApp\\Ollama;" HIP_PATH="C:\\Program Files\\AMD\\ROCm\\6.4\\" OLLAMA_DEBUG=1 OLLAMA_MODELS=D:\ollama\models OLLAMA_NO_CLOUD=0 OLLAMA_LIBRARY_PATH=C:\CustomApp\Ollama\lib\ollama;C:\CustomApp\Ollama\lib\ollama\rocm time=2026-03-16T20:43:37.754+08:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=1.9924009s OLLAMA_LIBRARY_PATH="[C:\\CustomApp\\Ollama\\lib\\ollama C:\\CustomApp\\Ollama\\lib\\ollama\\rocm]" extra_envs=map[] time=2026-03-16T20:43:37.754+08:00 level=INFO source=runner.go:106 msg="experimental Vulkan support disabled. 
To enable, set OLLAMA_VULKAN=1" time=2026-03-16T20:43:37.755+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\CustomApp\\Ollama\\ollama.exe runner --ollama-engine --port 55694" time=2026-03-16T20:43:37.755+08:00 level=DEBUG source=server.go:431 msg=subprocess OLLAMA_VULKAN=0 OLLAMA_HOST=0.0.0.0 HIP_PATH_64="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_71="C:\\Program Files\\AMD\\ROCm\\7.1\\" PATH="C:\\CustomApp\\Ollama\\lib\\ollama;C:\\CustomApp\\Ollama\\lib\\ollama\\cuda_v12;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files\\AMD\\ROCm\\6.4\\\\bin;C:\\Users\\aibin\\.venv\\Scripts;C:\\Program Files\\Git\\cmd;C:\\Program Files\\nodejs\\;C:\\Users\\aibin\\.local\\bin;C:\\Users\\aibin\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\aibin\\AppData\\Local\\Programs\\Ollama;C:\\Users\\aibin\\AppData\\Roaming\\npm;C:\\Users\\aibin\\AppData\\Local\\Programs\\Microsoft VS Code\\bin;C:\\CustomApp\\Ollama;" HIP_PATH="C:\\Program Files\\AMD\\ROCm\\6.4\\" OLLAMA_DEBUG=1 OLLAMA_MODELS=D:\ollama\models OLLAMA_NO_CLOUD=0 OLLAMA_LIBRARY_PATH=C:\CustomApp\Ollama\lib\ollama;C:\CustomApp\Ollama\lib\ollama\cuda_v12 time=2026-03-16T20:43:37.823+08:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=68.3443ms OLLAMA_LIBRARY_PATH="[C:\\CustomApp\\Ollama\\lib\\ollama C:\\CustomApp\\Ollama\\lib\\ollama\\cuda_v12]" extra_envs=map[] time=2026-03-16T20:43:37.823+08:00 level=DEBUG source=runner.go:124 msg="evaluating which, if any, devices to filter out" initial_count=3 time=2026-03-16T20:43:37.823+08:00 level=DEBUG source=runner.go:146 msg="verifying if device is supported" library=C:\CustomApp\Ollama\lib\ollama\rocm description="AMD Radeon(TM) Graphics" compute=gfx1036 id=0 pci_id=0000:7e:00.0 time=2026-03-16T20:43:37.823+08:00 level=DEBUG source=runner.go:146 msg="verifying if device is supported" 
library=C:\CustomApp\Ollama\lib\ollama\rocm description="AMD Radeon AI PRO R9700" compute=gfx1201 id=1 pci_id=0000:03:00.0 time=2026-03-16T20:43:37.823+08:00 level=DEBUG source=runner.go:146 msg="verifying if device is supported" library=C:\CustomApp\Ollama\lib\ollama\rocm description="AMD Radeon AI PRO R9700" compute=gfx1201 id=2 pci_id=0000:07:00.0 time=2026-03-16T20:43:37.824+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\CustomApp\\Ollama\\ollama.exe runner --ollama-engine --port 55701" time=2026-03-16T20:43:37.824+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\CustomApp\\Ollama\\ollama.exe runner --ollama-engine --port 55699" time=2026-03-16T20:43:37.824+08:00 level=DEBUG source=server.go:431 msg=subprocess OLLAMA_VULKAN=0 OLLAMA_HOST=0.0.0.0 HIP_PATH_64="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_71="C:\\Program Files\\AMD\\ROCm\\7.1\\" PATH="C:\\CustomApp\\Ollama\\lib\\ollama;C:\\CustomApp\\Ollama\\lib\\ollama\\rocm;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files\\AMD\\ROCm\\6.4\\\\bin;C:\\Users\\aibin\\.venv\\Scripts;C:\\Program Files\\Git\\cmd;C:\\Program Files\\nodejs\\;C:\\Users\\aibin\\.local\\bin;C:\\Users\\aibin\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\aibin\\AppData\\Local\\Programs\\Ollama;C:\\Users\\aibin\\AppData\\Roaming\\npm;C:\\Users\\aibin\\AppData\\Local\\Programs\\Microsoft VS Code\\bin;C:\\CustomApp\\Ollama;" HIP_PATH="C:\\Program Files\\AMD\\ROCm\\6.4\\" OLLAMA_DEBUG=1 OLLAMA_MODELS=D:\ollama\models OLLAMA_NO_CLOUD=0 OLLAMA_LIBRARY_PATH=C:\CustomApp\Ollama\lib\ollama;C:\CustomApp\Ollama\lib\ollama\rocm HIP_VISIBLE_DEVICES=2 GGML_CUDA_INIT=1 time=2026-03-16T20:43:37.824+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\CustomApp\\Ollama\\ollama.exe runner --ollama-engine --port 55700" time=2026-03-16T20:43:37.824+08:00 level=DEBUG source=server.go:431 
msg=subprocess OLLAMA_VULKAN=0 OLLAMA_HOST=0.0.0.0 HIP_PATH_64="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_71="C:\\Program Files\\AMD\\ROCm\\7.1\\" PATH="C:\\CustomApp\\Ollama\\lib\\ollama;C:\\CustomApp\\Ollama\\lib\\ollama\\rocm;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files\\AMD\\ROCm\\6.4\\\\bin;C:\\Users\\aibin\\.venv\\Scripts;C:\\Program Files\\Git\\cmd;C:\\Program Files\\nodejs\\;C:\\Users\\aibin\\.local\\bin;C:\\Users\\aibin\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\aibin\\AppData\\Local\\Programs\\Ollama;C:\\Users\\aibin\\AppData\\Roaming\\npm;C:\\Users\\aibin\\AppData\\Local\\Programs\\Microsoft VS Code\\bin;C:\\CustomApp\\Ollama;" HIP_PATH="C:\\Program Files\\AMD\\ROCm\\6.4\\" OLLAMA_DEBUG=1 OLLAMA_MODELS=D:\ollama\models OLLAMA_NO_CLOUD=0 OLLAMA_LIBRARY_PATH=C:\CustomApp\Ollama\lib\ollama;C:\CustomApp\Ollama\lib\ollama\rocm HIP_VISIBLE_DEVICES=1 GGML_CUDA_INIT=1 time=2026-03-16T20:43:37.824+08:00 level=DEBUG source=server.go:431 msg=subprocess OLLAMA_VULKAN=0 OLLAMA_HOST=0.0.0.0 HIP_PATH_64="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_71="C:\\Program Files\\AMD\\ROCm\\7.1\\" PATH="C:\\CustomApp\\Ollama\\lib\\ollama;C:\\CustomApp\\Ollama\\lib\\ollama\\rocm;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files\\AMD\\ROCm\\6.4\\\\bin;C:\\Users\\aibin\\.venv\\Scripts;C:\\Program Files\\Git\\cmd;C:\\Program Files\\nodejs\\;C:\\Users\\aibin\\.local\\bin;C:\\Users\\aibin\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\aibin\\AppData\\Local\\Programs\\Ollama;C:\\Users\\aibin\\AppData\\Roaming\\npm;C:\\Users\\aibin\\AppData\\Local\\Programs\\Microsoft VS Code\\bin;C:\\CustomApp\\Ollama;" HIP_PATH="C:\\Program Files\\AMD\\ROCm\\6.4\\" OLLAMA_DEBUG=1 OLLAMA_MODELS=D:\ollama\models OLLAMA_NO_CLOUD=0 
OLLAMA_LIBRARY_PATH=C:\CustomApp\Ollama\lib\ollama;C:\CustomApp\Ollama\lib\ollama\rocm HIP_VISIBLE_DEVICES=0 GGML_CUDA_INIT=1 time=2026-03-16T20:43:38.067+08:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=243.8174ms OLLAMA_LIBRARY_PATH="[C:\\CustomApp\\Ollama\\lib\\ollama C:\\CustomApp\\Ollama\\lib\\ollama\\rocm]" extra_envs="map[GGML_CUDA_INIT:1 HIP_VISIBLE_DEVICES:0]" time=2026-03-16T20:43:38.067+08:00 level=DEBUG source=runner.go:153 msg="filtering device which didn't fully initialize" id=0 libdir=C:\CustomApp\Ollama\lib\ollama\rocm pci_id=0000:7e:00.0 library=ROCm time=2026-03-16T20:43:38.459+08:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=636.0776ms OLLAMA_LIBRARY_PATH="[C:\\CustomApp\\Ollama\\lib\\ollama C:\\CustomApp\\Ollama\\lib\\ollama\\rocm]" extra_envs="map[GGML_CUDA_INIT:1 HIP_VISIBLE_DEVICES:1]" time=2026-03-16T20:43:38.475+08:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=651.603ms OLLAMA_LIBRARY_PATH="[C:\\CustomApp\\Ollama\\lib\\ollama C:\\CustomApp\\Ollama\\lib\\ollama\\rocm]" extra_envs="map[GGML_CUDA_INIT:1 HIP_VISIBLE_DEVICES:2]" time=2026-03-16T20:43:38.475+08:00 level=DEBUG source=runner.go:193 msg="adjusting filtering IDs" FilterID=1 new_ID=0 time=2026-03-16T20:43:38.475+08:00 level=DEBUG source=runner.go:193 msg="adjusting filtering IDs" FilterID=2 new_ID=1 time=2026-03-16T20:43:38.475+08:00 level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=2.8395903s time=2026-03-16T20:43:38.475+08:00 level=INFO source=types.go:42 msg="inference compute" id=0 filter_id=1 library=ROCm compute=gfx1201 name=ROCm1 description="AMD Radeon AI PRO R9700" libdirs=ollama,rocm driver=60551.38 pci_id=0000:03:00.0 type=discrete total="31.9 GiB" available="31.9 GiB" time=2026-03-16T20:43:38.475+08:00 level=INFO source=types.go:42 msg="inference compute" id=1 filter_id=2 library=ROCm compute=gfx1201 name=ROCm2 description="AMD Radeon AI PRO R9700" 
libdirs=ollama,rocm driver=60551.38 pci_id=0000:07:00.0 type=discrete total="31.9 GiB" available="31.9 GiB" time=2026-03-16T20:43:38.475+08:00 level=INFO source=routes.go:1832 msg="vram-based default context" total_vram="63.7 GiB" default_num_ctx=262144 [GIN] 2026/03/16 - 20:43:38 | 200 | 530.7µs | 127.0.0.1 | GET "/api/version" [GIN] 2026/03/16 - 20:43:38 | 200 | 0s | 127.0.0.1 | GET "/api/version" [GIN] 2026/03/16 - 20:43:38 | 200 | 0s | 127.0.0.1 | GET "/api/version" [GIN] 2026/03/16 - 20:43:38 | 200 | 2.1113ms | 127.0.0.1 | GET "/api/tags" time=2026-03-16T20:43:38.572+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32 [GIN] 2026/03/16 - 20:43:38 | 200 | 105.8152ms | 127.0.0.1 | POST "/api/show" [GIN] 2026/03/16 - 20:43:39 | 200 | 823.5583ms | 127.0.0.1 | POST "/api/me" [GIN] 2026/03/16 - 20:43:39 | 200 | 823.5583ms | 127.0.0.1 | POST "/api/me" [GIN] 2026/03/16 - 20:43:43 | 200 | 1.023ms | 127.0.0.1 | GET "/api/tags" time=2026-03-16T20:43:43.778+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32 [GIN] 2026/03/16 - 20:43:43 | 200 | 103.1114ms | 127.0.0.1 | POST "/api/show" time=2026-03-16T20:43:43.872+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32 [GIN] 2026/03/16 - 20:43:43 | 200 | 91.9583ms | 127.0.0.1 | POST "/api/show" time=2026-03-16T20:43:43.963+08:00 level=DEBUG source=runner.go:264 msg="refreshing free memory" time=2026-03-16T20:43:43.963+08:00 level=DEBUG source=runner.go:328 msg="unable to refresh all GPUs with existing runners, performing bootstrap discovery" time=2026-03-16T20:43:43.967+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\CustomApp\\Ollama\\ollama.exe runner --ollama-engine --port 59552" time=2026-03-16T20:43:43.967+08:00 level=DEBUG source=server.go:431 msg=subprocess OLLAMA_VULKAN=0 OLLAMA_HOST=0.0.0.0 HIP_PATH_64="C:\\Program Files\\AMD\\ROCm\\6.4\\" 
HIP_PATH_71="C:\\Program Files\\AMD\\ROCm\\7.1\\" PATH="C:\\CustomApp\\Ollama\\lib\\ollama;C:\\CustomApp\\Ollama\\lib\\ollama\\rocm;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files\\AMD\\ROCm\\6.4\\\\bin;C:\\Users\\aibin\\.venv\\Scripts;C:\\Program Files\\Git\\cmd;C:\\Program Files\\nodejs\\;C:\\Users\\aibin\\.local\\bin;C:\\Users\\aibin\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\aibin\\AppData\\Local\\Programs\\Ollama;C:\\Users\\aibin\\AppData\\Roaming\\npm;C:\\Users\\aibin\\AppData\\Local\\Programs\\Microsoft VS Code\\bin;C:\\CustomApp\\Ollama;" HIP_PATH="C:\\Program Files\\AMD\\ROCm\\6.4\\" OLLAMA_DEBUG=1 OLLAMA_MODELS=D:\ollama\models OLLAMA_NO_CLOUD=0 OLLAMA_LIBRARY_PATH=C:\CustomApp\Ollama\lib\ollama;C:\CustomApp\Ollama\lib\ollama\rocm HIP_VISIBLE_DEVICES=1,2 time=2026-03-16T20:43:45.757+08:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=1.7939987s OLLAMA_LIBRARY_PATH="[C:\\CustomApp\\Ollama\\lib\\ollama C:\\CustomApp\\Ollama\\lib\\ollama\\rocm]" extra_envs=map[HIP_VISIBLE_DEVICES:1,2] time=2026-03-16T20:43:45.757+08:00 level=DEBUG source=runner.go:40 msg="overall device VRAM discovery took" duration=1.7939987s time=2026-03-16T20:43:45.757+08:00 level=INFO source=cpu_windows.go:148 msg=packages count=1 time=2026-03-16T20:43:45.757+08:00 level=INFO source=cpu_windows.go:195 msg="" package=0 cores=16 efficiency=0 threads=32 time=2026-03-16T20:43:45.757+08:00 level=DEBUG source=sched.go:220 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=6 gpu_count=2 time=2026-03-16T20:43:45.772+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32 time=2026-03-16T20:43:45.774+08:00 level=DEBUG source=sched.go:256 msg="loading first model" model=D:\ollama\models\blobs\sha256-900dde62fb7ebe8a5a25e35d5b7633f403f226a310965fed51d50f5238ba145a time=2026-03-16T20:43:45.818+08:00 
level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32 time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.pooling_type default=0 time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.attention.head_count_kv default=0 time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.rope.scaling.type default="" time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.rope.type default="" time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.rope.scaling.factor default=1 time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.rope.scaling.original_context_length default=0 time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.attention.scale default=0 time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.norm_top_k_prob default=true time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.mrope_interleaved default=false time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.vision.attention.layer_norm_epsilon default=9.999999974752427e-07 time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.vision.rope.freq_base default=10000 time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.vision.num_positional_embeddings default=2304 time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=false 
time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0 time=2026-03-16T20:43:45.818+08:00 level=INFO source=server.go:246 msg="enabling flash attention" time=2026-03-16T20:43:45.818+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\CustomApp\\Ollama\\ollama.exe runner --ollama-engine --model D:\\ollama\\models\\blobs\\sha256-900dde62fb7ebe8a5a25e35d5b7633f403f226a310965fed51d50f5238ba145a --port 59558" time=2026-03-16T20:43:45.818+08:00 level=DEBUG source=server.go:431 msg=subprocess OLLAMA_VULKAN=0 OLLAMA_HOST=0.0.0.0 HIP_PATH_64="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_71="C:\\Program Files\\AMD\\ROCm\\7.1\\" PATH="C:\\CustomApp\\Ollama\\lib\\ollama;C:\\CustomApp\\Ollama\\lib\\ollama\\rocm;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files\\AMD\\ROCm\\6.4\\\\bin;C:\\Users\\aibin\\.venv\\Scripts;C:\\Program Files\\Git\\cmd;C:\\Program Files\\nodejs\\;C:\\Users\\aibin\\.local\\bin;C:\\Users\\aibin\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\aibin\\AppData\\Local\\Programs\\Ollama;C:\\Users\\aibin\\AppData\\Roaming\\npm;C:\\Users\\aibin\\AppData\\Local\\Programs\\Microsoft VS Code\\bin;C:\\CustomApp\\Ollama;" HIP_PATH="C:\\Program Files\\AMD\\ROCm\\6.4\\" OLLAMA_DEBUG=1 OLLAMA_MODELS=D:\ollama\models OLLAMA_NO_CLOUD=0 OLLAMA_LIBRARY_PATH=C:\CustomApp\Ollama\lib\ollama;C:\CustomApp\Ollama\lib\ollama\rocm HIP_VISIBLE_DEVICES=1,2 time=2026-03-16T20:43:45.827+08:00 level=INFO source=sched.go:489 msg="system memory" total="125.6 GiB" free="111.7 GiB" free_swap="109.2 GiB" time=2026-03-16T20:43:45.827+08:00 level=INFO source=sched.go:496 msg="gpu memory" id=0 library=ROCm available="31.4 GiB" free="31.9 GiB" minimum="457.0 MiB" overhead="0 B" time=2026-03-16T20:43:45.827+08:00 level=INFO source=sched.go:496 msg="gpu memory" id=1 library=ROCm 
available="31.4 GiB" free="31.9 GiB" minimum="457.0 MiB" overhead="0 B" time=2026-03-16T20:43:45.827+08:00 level=INFO source=server.go:757 msg="loading model" "model layers"=41 requested=-1 time=2026-03-16T20:43:45.857+08:00 level=INFO source=runner.go:1411 msg="starting ollama engine" time=2026-03-16T20:43:45.865+08:00 level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:59558" time=2026-03-16T20:43:45.870+08:00 level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:41[ID:0 Layers:41(0..40)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" time=2026-03-16T20:43:45.893+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32 time=2026-03-16T20:43:45.896+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.name default="" time=2026-03-16T20:43:45.896+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.description default="" time=2026-03-16T20:43:45.896+08:00 level=INFO source=ggml.go:136 msg="" architecture=qwen35moe file_type=Q4_K_M name="" description="" num_tensors=1959 num_key_values=57 time=2026-03-16T20:43:45.896+08:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=C:\CustomApp\Ollama\lib\ollama load_backend: loaded CPU backend from C:\CustomApp\Ollama\lib\ollama\ggml-cpu-icelake.dll time=2026-03-16T20:43:45.908+08:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=C:\CustomApp\Ollama\lib\ollama\rocm ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 2 ROCm devices: Device 0: AMD Radeon AI PRO R9700, gfx1201 (0x1201), VMM: no, Wave Size: 32, ID: 0 Device 1: AMD Radeon AI PRO R9700, gfx1201 (0x1201), VMM: no, Wave Size: 32, ID: 1 load_backend: loaded ROCm backend from C:\CustomApp\Ollama\lib\ollama\rocm\ggml-hip.dll 
time=2026-03-16T20:43:45.942+08:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 ROCm.0.NO_VMM=1 ROCm.0.NO_PEER_COPY=1 ROCm.0.PEER_MAX_BATCH_SIZE=128 ROCm.1.NO_VMM=1 ROCm.1.NO_PEER_COPY=1 ROCm.1.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang)
time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.pooling_type default=0
time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.attention.head_count_kv default=0
time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.rope.scaling.type default=""
time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.rope.type default=""
time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.rope.scaling.factor default=1
time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.rope.scaling.original_context_length default=0
time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.attention.scale default=0
time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.norm_top_k_prob default=true
time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.mrope_interleaved default=false
time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.vision.attention.layer_norm_epsilon default=9.999999974752427e-07
time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.vision.rope.freq_base default=10000
time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35moe.vision.num_positional_embeddings default=2304
time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=false
time=2026-03-16T20:43:45.949+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2026-03-16T20:43:46.212+08:00 level=DEBUG source=ggml.go:852 msg="compute graph" nodes=1258 splits=1
rocBLAS error from hip error code: 'hipErrorInvalidDeviceFunction':98
ggml_cuda_compute_forward: SOLVE_TRI failed
ROCm error: invalid device function
  current device: 0, in function ggml_cuda_compute_forward at C:/a/ollama/ollama/ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:2882
  err
C:/a/ollama/ollama/ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:94: ROCm error
time=2026-03-16T20:43:47.974+08:00 level=ERROR source=server.go:1205 msg="do load request" error="Post \"http://127.0.0.1:59558/load\": read tcp 127.0.0.1:59563->127.0.0.1:59558: wsarecv: An existing connection was forcibly closed by the remote host."
time=2026-03-16T20:43:47.974+08:00 level=ERROR source=server.go:1205 msg="do load request" error="Post \"http://127.0.0.1:59558/load\": dial tcp 127.0.0.1:59558: connectex: No connection could be made because the target machine actively refused it."
time=2026-03-16T20:43:47.974+08:00 level=INFO source=sched.go:516 msg="Load failed" model=D:\ollama\models\blobs\sha256-900dde62fb7ebe8a5a25e35d5b7633f403f226a310965fed51d50f5238ba145a error="model failed to load, this may be due to resource limitations or an internal error, check ollama server logs for details"
time=2026-03-16T20:43:47.974+08:00 level=DEBUG source=server.go:1830 msg="stopping llama server" pid=4728
[GIN] 2026/03/16 - 20:43:47 | 500 | 4.0893504s | 127.0.0.1 | POST "/api/chat"
time=2026-03-16T20:43:48.022+08:00 level=ERROR source=server.go:303 msg="llama runner terminated" error="exit status 1"
[GIN] 2026/03/16 - 20:44:13 | 200 | 1.0248ms | 127.0.0.1 | GET "/api/tags"
[GIN] 2026/03/16 - 20:44:43 | 200 | 1.0593ms | 127.0.0.1 | GET "/api/tags"
```

Above is the detailed server.log for the 500 error.

<img width="996" height="782" alt="Image" src="https://github.com/user-attachments/assets/cbda7542-d2d5-4f49-9ce0-3126b1d00d97" />
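
For triaging a long server.log like this one, a small script can pull out just the ERROR entries. This is only a sketch: it assumes every structured entry starts with `time=<timestamp>` and carries `level=` and `msg=` fields, as the entries in this log do. It splits on the `time=` prefix, so it also works when the log has been pasted as a single long line. The function names are made up for illustration.

```python
import re

# Each structured entry starts with "time=<date>T..."; split right before that prefix.
ENTRY_START = re.compile(r"(?=time=\d{4}-\d{2}-\d{2}T)")
LEVEL = re.compile(r"\blevel=(\w+)")
# msg is either a quoted string (with backslash escapes) or a bare word.
MSG = re.compile(r'\bmsg=(?:"((?:[^"\\]|\\.)*)"|(\S+))')


def split_entries(blob):
    """Split a log blob into one string per structured entry."""
    return [part.strip() for part in ENTRY_START.split(blob) if part.strip()]


def errors(blob):
    """Return (level, msg) pairs for every entry logged at level=ERROR."""
    found = []
    for entry in split_entries(blob):
        level = LEVEL.search(entry)
        if level is None or level.group(1) != "ERROR":
            continue
        msg = MSG.search(entry)
        text = (msg.group(1) or msg.group(2) or "") if msg else ""
        found.append((level.group(1), text))
    return found
```

Native output that has no `time=` prefix (the `rocBLAS error ...` and `ggml_cuda_compute_forward: SOLVE_TRI failed` lines, for example) stays attached to the entry before it, so filtering by level alone will not surface those lines.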

GiteaMirror commented

2026-04-12 22:25:43 -05:00

@aibin8910 commented on GitHub (Mar 16, 2026):

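I have now set ROCBLAS_TENSILE_LIBPATH, disabled Vulkan, and switched the model to glm-4.7-flash, and it runs. However, none of the qwen3.5 series models run this way; I can only run the qwen3.5 series with Vulkan.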
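
For anyone else setting ROCBLAS_TENSILE_LIBPATH, one sanity check is whether the directory it points to contains any files for the GPU's architecture. The sketch below assumes the rocBLAS kernel files carry the architecture name (such as `gfx1201`) in their file names; that is an assumption about the bundle layout, not something confirmed in this thread, and `tensile_files_for` is a hypothetical helper name.

```python
import os
from pathlib import Path
from typing import List, Optional


def tensile_files_for(arch: str, libpath: Optional[str] = None) -> List[str]:
    """List files in the rocBLAS library directory whose name mentions `arch`.

    `libpath` defaults to the ROCBLAS_TENSILE_LIBPATH environment variable.
    Assumption: kernel files embed the architecture (e.g. "gfx1201") in their names.
    """
    libpath = libpath or os.environ.get("ROCBLAS_TENSILE_LIBPATH")
    if not libpath:
        raise ValueError("ROCBLAS_TENSILE_LIBPATH is not set and no path was given")
    root = Path(libpath)
    if not root.is_dir():
        raise FileNotFoundError(f"not a directory: {root}")
    # Only look at regular files directly inside the directory.
    return sorted(p.name for p in root.iterdir() if p.is_file() and arch in p.name)
```

An empty list for `tensile_files_for("gfx1201")` would suggest the path points at a library build without gfx1201 kernels, though a non-empty list does not prove every operation is covered.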

GiteaMirror commented

2026-04-12 22:25:44 -05:00

@Jasdfgh commented on GitHub (Mar 18, 2026):

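The Qwen3.5 series uses the DeltaNet architecture, which calls rocBLAS's triangular solve, and on gfx1201 there does not seem to be a correct kernel for that operation yet.
GLM-4.7 is a standard transformer and does not take this path, so it works.

Several people with the same GPU have confirmed the same problem in #14423. For now I think Vulkan is the most reliable workaround.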
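
For readers unfamiliar with the operation named in the `SOLVE_TRI failed` log line: a triangular solve finds x in L·x = b where L is lower triangular, working row by row (forward substitution). The snippet below is a plain-Python sketch of the math only; it is not how ggml or rocBLAS implement it, and the function name is made up for illustration.

```python
def solve_lower_triangular(L, b):
    """Solve L @ x = b for x by forward substitution, with L lower triangular."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        if L[i][i] == 0:
            raise ZeroDivisionError(f"zero on the diagonal at row {i}")
        # Subtract the contribution of the already-solved unknowns x[0..i-1].
        known = sum(L[i][j] * x[j] for j in range(i))
        x[i] = (b[i] - known) / L[i][i]
    return x
```

Each row depends on the rows above it, which is why GPU libraries ship dedicated kernels for this operation instead of reusing a plain matrix multiply.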

GiteaMirror commented

2026-04-12 22:25:44 -05:00

@aibin8910 commented on GitHub (Mar 20, 2026):

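Latest finding: none of the deepseek-r1 models can use the GPU; they only run on the CPU. Token output speed is only 2 tokens/s for deepseek-r1:70b and 7 tokens/s for deepseek-r1:32b. Neither ROCm nor Vulkan gets the GPU used. I also set HIP_VISIBLE_DEVICES=0.1;OLLAMA_NUM_GPU=2, and the GPU still is not used.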
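
The `0.1` quoted above may just be a typo in the comment, but it may be worth double-checking: HIP_VISIBLE_DEVICES takes a comma-separated list of device indices, so selecting both cards would normally be written `0,1`. A small sketch of that format check follows; `parse_visible_devices` is a hypothetical helper, not part of Ollama or HIP.

```python
def parse_visible_devices(value):
    """Parse a comma-separated list of GPU indices such as "0,1".

    Raises ValueError for any token that is not a non-negative integer.
    """
    indices = []
    for token in value.split(","):
        token = token.strip()
        if not token.isdigit():
            raise ValueError(f"not a device index: {token!r}")
        indices.append(int(token))
    return indices
```

With this check, `"0,1"` parses to two indices while `"0.1"` is rejected as a single malformed token.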

GiteaMirror commented

2026-04-12 22:25:45 -05:00

@slojosic-amd commented on GitHub (Mar 23, 2026):

FYI: https://github.com/ollama/ollama/pull/14979

GiteaMirror commented

2026-04-12 22:25:45 -05:00

@xiaoxihooo-source commented on GitHub (Mar 29, 2026):

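> Latest finding: none of the deepseek-r1 models can use the GPU; they only run on the CPU. Token output speed is only 2 tokens/s for deepseek-r1:70b and 7 tokens/s for deepseek-r1:32b. Neither ROCm nor Vulkan gets the GPU used. I also set HIP_VISIBLE_DEVICES=0.1;OLLAMA_NUM_GPU=2, and the GPU still is not used.

Ollama is moving too slowly. I have switched to LM Studio; it is not perfect either, but at least it works.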

GiteaMirror commented

2026-04-12 22:25:46 -05:00

@aibin8910 commented on GitHub (Mar 31, 2026):

I have switched to LM Studio as well. Its memory usage often balloons, but at least it is usable.

GiteaMirror referenced this issue

2026-04-22 12:46:55 -05:00

[GH-ISSUE #9504] Request modular updates to reduce download size (decouple CUDA libraries from core updates) #31952

GiteaMirror referenced this issue

2026-04-29 00:34:06 -05:00

[GH-ISSUE #9504] Request modular updates to reduce download size (decouple CUDA libraries from core updates) #52704

GiteaMirror referenced this issue

2026-05-04 12:59:49 -05:00

[GH-ISSUE #9504] Request modular updates to reduce download size (decouple CUDA libraries from core updates) #68248

GiteaMirror referenced this issue

2026-05-05 03:36:22 -05:00

[GH-ISSUE #15889] Cloud-only binaries #72184

GiteaMirror referenced this issue

2026-05-09 19:18:08 -05:00

[GH-ISSUE #9504] Request modular updates to reduce download size (decouple CUDA libraries from core updates) #83877

GiteaMirror referenced this issue

2026-05-10 06:24:54 -05:00

[GH-ISSUE #15889] Cloud-only binaries #87812

Reference: github-starred/ollama#9504