[GH-ISSUE #14731] GPU and driver are present, but the GPU still can't be used #35286

Open
opened 2026-04-22 19:40:58 -05:00 by GiteaMirror · 17 comments
Owner

Originally created by @kittyzero520 on GitHub (Mar 9, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14731

What is the issue?

The GPU is not being used.

Relevant log output

(base) root@kittyzero:~# ollama ps
NAME              ID              SIZE      PROCESSOR    CONTEXT    UNTIL              
qwen3.5:latest    6488c96fa5fa    8.5 GB    100% CPU     4096       3 minutes from now  

(base) root@kittyzero:~# rocminfo | grep gfx
  Name:                    gfx1100                            
      Name:                    amdgcn-amd-amdhsa--gfx1100         
      Name:                    amdgcn-amd-amdhsa--gfx11-generic   
(base) root@kittyzero:~# dpkg -l|grep rocm
ii  rocm                                             7.2.0.70200-43~24.04                     amd64        Radeon Open Compute (ROCm) software stack meta package
ii  rocm-cmake                                       0.14.0.70200-43~24.04                    amd64        rocm-cmake built using CMake
ii  rocm-core                                        7.2.0.70200-43~24.04                     amd64        ROCm Runtime software stack
ii  rocm-dbgapi                                      0.77.4.70200-43~24.04                    amd64        Library to provide AMD GPU debugger API
ii  rocm-debug-agent                                 2.1.0.70200-43~24.04                     amd64        Radeon Open Compute Debug Agent (ROCdebug-agent)
ii  rocm-developer-tools                             7.2.0.70200-43~24.04                     amd64        Radeon Open Compute (ROCm) Runtime software stack
ii  rocm-device-libs                                 1.0.0.70200-43~24.04                     amd64        Radeon Open Compute - device libraries
ii  rocm-gdb                                         16.3.70200-43~24.04                      amd64        ROCgdb
ii  rocm-hip                                         7.2.0.70200-43~24.04                     amd64        Radeon Open Compute (ROCm) Runtime software stack
ii  rocm-llvm                                        22.0.0.26014.70200-43~24.04              amd64        ROCm core compiler
ii  rocm-opencl                                      2.0.0.70200-43~24.04                     amd64        clr built using CMake
ii  rocm-opencl-dev                                  2.0.0.70200-43~24.04                     amd64        clr built using CMake
ii  rocm-opencl-sdk                                  7.2.0.70200-43~24.04                     amd64        Radeon Open Compute (ROCm) Runtime software stack
ii  rocm-openmp                                      7.2.0.70200-43~24.04                     amd64        Radeon Open Compute (ROCm) OpenMP Software development Kit.
ii  rocm-smi-lib                                     7.8.0.70200-43~24.04                     amd64        AMD System Management libraries
ii  rocminfo

OS

Linux

GPU

AMD

CPU

Intel

Ollama version

v0.17.7

GiteaMirror added the bug label 2026-04-22 19:40:58 -05:00

@kittyzero520 commented on GitHub (Mar 9, 2026):

Image: https://github.com/user-attachments/assets/ab301be4-6f19-4452-be1c-48efed8f84b3

@kittyzero520 commented on GitHub (Mar 9, 2026):

Image: https://github.com/user-attachments/assets/253be2d8-384a-4047-85a2-49d9a3e88ca3

@rick-github commented on GitHub (Mar 9, 2026):

What GPU card?


@kittyzero520 commented on GitHub (Mar 9, 2026):

> What GPU card?

amd 7900xtx 24G

(base) root@kittyzero:~# rocminfo | grep gfx
Name: gfx1100
Name: amdgcn-amd-amdhsa--gfx1100
Name: amdgcn-amd-amdhsa--gfx11-generic


@rick-github commented on GitHub (Mar 9, 2026):

Try the [Vulkan](https://docs.ollama.com/gpu#vulkan-gpu-support) accelerator.

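A minimal sketch of what trying the Vulkan workaround could look like, based on the `OLLAMA_VULKAN=1` hint that appears in the runner logs later in this thread; the manual-start workflow (no systemd) and the log path are assumptions:

```shell
# Stop any running server, then restart with experimental Vulkan enabled.
# OLLAMA_VULKAN=1 comes straight from the runner log hint; everything else
# here assumes ollama was started manually rather than as a systemd service.
pkill ollama
OLLAMA_VULKAN=1 ollama serve > /tmp/ollama.log 2>&1 &

# Give the server a moment, then check whether the model lands on the GPU
# (the PROCESSOR column should no longer read "100% CPU").
sleep 5
ollama ps
```

If Ollama runs under systemd instead, the variable would need to go into the unit environment (e.g. via `systemctl edit ollama`) rather than the shell.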

@kittyzero520 commented on GitHub (Mar 9, 2026):

> What GPU card?

time=2026-03-09T23:06:45.482+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 33345"
time=2026-03-09T23:06:45.549+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 44173"
time=2026-03-09T23:06:49.208+08:00 level=INFO source=runner.go:106 msg="experimental Vulkan support disabled. To enable, set OLLAMA_VULKAN=1"
time=2026-03-09T23:06:49.208+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 36395"
time=2026-03-09T23:06:49.251+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 42499"
time=2026-03-09T23:07:19.252+08:00 level=INFO source=runner.go:464 msg="failure during GPU discovery" OLLAMA_LIBRARY_PATH="[/usr/local/lib/ollama /usr/local/lib/ollama/rocm]" extra_envs="map[GGML_CUDA_INIT:1 ROCR_VISIBLE_DEVICES:GPU-6c19dc557b10e676]" error="failed to finish discovery before timeout"
time=2026-03-09T23:07:19.252+08:00 level=INFO source=types.go:60 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="15.5 GiB" available="14.1 GiB"
time=2026-03-09T23:07:19.252+08:00 level=INFO source=routes.go:1763 msg="vram-based default context" total_vram="0 B" default_num_ctx=4096
[GIN] 2026/03/09 - 23:09:16 | 200 | 3.24016ms | 127.0.0.1 | HEAD "/"
[GIN] 2026/03/09 - 23:09:16 | 200 | 789.821µs | 127.0.0.1 | GET "/api/tags"
[GIN] 2026/03/09 - 23:09:26 | 200 | 28.565µs | 127.0.0.1 | HEAD "/"
[GIN] 2026/03/09 - 23:09:26 | 200 | 394.170705ms | 127.0.0.1 | POST "/api/show"
[GIN] 2026/03/09 - 23:09:27 | 200 | 437.20574ms | 127.0.0.1 | POST "/api/show"
time=2026-03-09T23:09:28.159+08:00 level=INFO source=server.go:246 msg="enabling flash attention"
time=2026-03-09T23:09:28.159+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --model /root/.ollama/models/blobs/sha256-dec52a44569a2a25341c4e4d3fee25846eed4f6f0b936278e3a3c900bb99d37c --port 46523"
time=2026-03-09T23:09:28.159+08:00 level=INFO source=sched.go:489 msg="system memory" total="15.5 GiB" free="14.1 GiB" free_swap="14.9 GiB"
time=2026-03-09T23:09:28.159+08:00 level=INFO source=server.go:757 msg="loading model" "model layers"=33 requested=-1
time=2026-03-09T23:09:28.176+08:00 level=INFO source=runner.go:1429 msg="starting ollama engine"
time=2026-03-09T23:09:28.176+08:00 level=INFO source=runner.go:1464 msg="Server listening on 127.0.0.1:46523"
time=2026-03-09T23:09:28.181+08:00 level=INFO source=runner.go:1302 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:6 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-03-09T23:09:28.277+08:00 level=INFO source=ggml.go:136 msg="" architecture=qwen35 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=52
load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-sandybridge.so
time=2026-03-09T23:09:28.286+08:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
time=2026-03-09T23:09:28.980+08:00 level=INFO source=runner.go:1302 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:6 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-03-09T23:09:30.368+08:00 level=INFO source=runner.go:1302 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:6 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-03-09T23:09:30.368+08:00 level=INFO source=ggml.go:482 msg="offloading 0 repeating layers to GPU"
time=2026-03-09T23:09:30.368+08:00 level=INFO source=ggml.go:486 msg="offloading output layer to CPU"
time=2026-03-09T23:09:30.368+08:00 level=INFO source=ggml.go:494 msg="offloaded 0/33 layers to GPU"
time=2026-03-09T23:09:30.368+08:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="6.1 GiB"
time=2026-03-09T23:09:30.368+08:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="1.4 GiB"
time=2026-03-09T23:09:30.368+08:00 level=INFO source=device.go:267 msg="compute graph" device=CPU size="433.7 MiB"
time=2026-03-09T23:09:30.368+08:00 level=INFO source=device.go:272 msg="total memory" size="7.9 GiB"
time=2026-03-09T23:09:30.368+08:00 level=INFO source=sched.go:565 msg="loaded runners" count=1
time=2026-03-09T23:09:30.371+08:00 level=INFO source=server.go:1350 msg="waiting for llama runner to start responding"
time=2026-03-09T23:09:30.371+08:00 level=INFO source=server.go:1384 msg="waiting for server to become available" status="llm server loading model"
time=2026-03-09T23:09:46.201+08:00 level=INFO source=server.go:1388 msg="llama runner started in 18.04 seconds"
[GIN] 2026/03/09 - 23:09:46 | 200 | 18.98180016s | 127.0.0.1 | POST "/api/generate"


@kittyzero520 commented on GitHub (Mar 10, 2026):

> Try the [Vulkan](https://docs.ollama.com/gpu#vulkan-gpu-support) accelerator.

The thing is, I've never enabled that before, and it used to work fine. Why did it suddenly stop working?


@kittyzero520 commented on GitHub (Mar 11, 2026):

(base) root@kittyzero:~# export LD_LIBRARY_PATH=/opt/rocm/lib:/opt/rocm-7.2.0/lib:$LD_LIBRARY_PATH
(base) root@kittyzero:~# export HSA_OVERRIDE_GFX_VERSION=11.0.0
(base) root@kittyzero:~# export HIP_VISIBLE_DEVICES=0
(base) root@kittyzero:~# pkill -9 ollama
(base) root@kittyzero:~# ollama serve 2>&1 &
[1] 25047
(base) root@kittyzero:~# sleep 5
time=2026-03-11T13:14:02.047+08:00 level=INFO source=routes.go:1658 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES:0 HSA_OVERRIDE_GFX_VERSION:11.0.0 HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:0 OLLAMA_DEBUG:INFO OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NO_CLOUD:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2026-03-11T13:14:02.047+08:00 level=INFO source=routes.go:1660 msg="Ollama cloud disabled: false"
time=2026-03-11T13:14:02.048+08:00 level=INFO source=images.go:477 msg="total blobs: 7"
time=2026-03-11T13:14:02.048+08:00 level=INFO source=images.go:484 msg="total unused blobs removed: 0"
time=2026-03-11T13:14:02.049+08:00 level=INFO source=routes.go:1713 msg="Listening on 127.0.0.1:11434 (version 0.17.7)"
time=2026-03-11T13:14:02.049+08:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-03-11T13:14:02.049+08:00 level=WARN source=runner.go:485 msg="user overrode visible devices" HIP_VISIBLE_DEVICES=0
time=2026-03-11T13:14:02.049+08:00 level=WARN source=runner.go:485 msg="user overrode visible devices" HSA_OVERRIDE_GFX_VERSION=11.0.0
time=2026-03-11T13:14:02.049+08:00 level=WARN source=runner.go:489 msg="if GPUs are not correctly discovered, unset and try again"
time=2026-03-11T13:14:02.050+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 35559"
time=2026-03-11T13:14:02.081+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 45679"
time=2026-03-11T13:14:02.112+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 34845"
time=2026-03-11T13:14:04.348+08:00 level=INFO source=runner.go:106 msg="experimental Vulkan support disabled. To enable, set OLLAMA_VULKAN=1"
time=2026-03-11T13:14:04.348+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 33835"


@Jasdfgh commented on GitHub (Mar 12, 2026):

The 7900 XTX (gfx1100) is natively supported on Ollama's Linux support list (https://docs.ollama.com/gpu), listed under LLVM target gfx1100 next to the Radeon RX cards, so a GPU discovery timeout is not normal.

A few of your environment variables look off:

  • HSA_OVERRIDE_GFX_VERSION is for GPUs that are not on the support list, making them masquerade as a close architecture. gfx1100 is already on the list, so no override is needed; setting it may break the match.
  • HIP_VISIBLE_DEVICES: you only have one GPU, so you probably don't need it. Also, per the Ollama docs (https://docs.ollama.com/gpu), AMD GPU selection appears to use ROCR_VISIBLE_DEVICES, not HIP_VISIBLE_DEVICES.
  • And the Ollama log itself is warning that these overrides may interfere with detection.

I'd suggest unsetting all of these env vars and restarting. Also, the 0.17.8 RC just came out with updated Linux ROCm v7 support; install instructions are here: https://github.com/ollama/ollama/blob/main/docs/linux.mdx#installing-specific-versions

It would also help to say when this stopped working, and whether the driver or kernel was updated in between.

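A minimal sketch of the suggested reset, assuming ollama was started from a shell rather than as a systemd service (the log path is an assumption):

```shell
# Clear the overrides called out in the advice above, then restart ollama
# and re-check GPU discovery.
unset HSA_OVERRIDE_GFX_VERSION HIP_VISIBLE_DEVICES ROCR_VISIBLE_DEVICES
pkill ollama
ollama serve > /tmp/ollama.log 2>&1 &

# After startup, the "user overrode visible devices" warnings should be
# gone and the "inference compute" line should list the GPU, not cpu.
sleep 5
grep -E "overrode|inference compute" /tmp/ollama.log
ollama ps
```

If the server runs under systemd instead, the variables would need removing from the unit's Environment= settings (e.g. via `systemctl edit ollama`).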

@kittyzero520 commented on GitHub (Mar 13, 2026):

> The 7900 XTX (gfx1100) is natively supported on Ollama's Linux support list (https://docs.ollama.com/gpu), listed under LLVM target gfx1100 next to the Radeon RX cards, so a GPU discovery timeout is not normal.
>
> A few of your environment variables look off:
>
>   • HSA_OVERRIDE_GFX_VERSION is for GPUs that are not on the support list, making them masquerade as a close architecture. gfx1100 is already on the list, so no override is needed; setting it may break the match.
>   • HIP_VISIBLE_DEVICES: you only have one GPU, so you probably don't need it. Also, per the Ollama docs (https://docs.ollama.com/gpu), AMD GPU selection appears to use ROCR_VISIBLE_DEVICES, not HIP_VISIBLE_DEVICES.
>   • And the Ollama log itself is warning that these overrides may interfere with detection.
>
> I'd suggest unsetting all of these env vars and restarting. Also, the 0.17.8 RC just came out with updated Linux ROCm v7 support; install instructions are here: https://github.com/ollama/ollama/blob/main/docs/linux.mdx#installing-specific-versions
>
> It would also help to say when this stopped working, and whether the driver or kernel was updated in between.

I found something: if I add the parameters below, Ollama can use the GPU. I'll also try your suggested approach; thanks for your support!

Image: https://github.com/user-attachments/assets/926ce6bb-b2a7-4499-af6e-0c041f8dd9fa


It stopped working after upgrading to the latest 0.17.


@Jasdfgh commented on GitHub (Mar 15, 2026):

Got it: those two commands switched you over to Vulkan as a workaround. That works, but it's not the ROCm path; the ROCm GPU discovery timeout is still there. Which Ollama version were you on when ROCm last worked? Let's check whether this was introduced in 0.17.


@kittyzero520 commented on GitHub (Mar 16, 2026):

> Got it: those two commands switched you over to Vulkan as a workaround. That works, but it's not the ROCm path; the ROCm GPU discovery timeout is still there. Which Ollama version were you on when ROCm last worked? Let's check whether this was introduced in 0.17.

1. I don't remember the previous version.
2. Now on 0.18 it can load the qwen3.5 9b model, but the 27B immediately drops the SSH connection and the server becomes unreachable.

Image: https://github.com/user-attachments/assets/ea08432d-e024-422c-8804-e0897018e02e
Image: https://github.com/user-attachments/assets/73fee515-b800-40e4-9f56-3c56cc5fd576

@kittyzero520 commented on GitHub (Mar 16, 2026):

> Got it: those two commands switched you over to Vulkan as a workaround. That works, but it's not the ROCm path; the ROCm GPU discovery timeout is still there. Which Ollama version were you on when ROCm last worked? Let's check whether this was introduced in 0.17.

1. I don't remember the previous version.
2. Now on 0.18 it can load the qwen3.5 9b model, but the 27B immediately drops the SSH connection and the server becomes unreachable, and I saw the 16G of RAM suddenly fill up.

Image: https://github.com/user-attachments/assets/ea08432d-e024-422c-8804-e0897018e02e
Image: https://github.com/user-attachments/assets/73fee515-b800-40e4-9f56-3c56cc5fd576

@Jasdfgh commented on GitHub (Mar 18, 2026):

From ollama ps, the GPU is being used (100% GPU), and 9B and 8B both work fine. For the 27B, I think it's running out of VRAM: the model itself is 24GB, and together with the KV cache for a 32768 context that exceeds your 24G of VRAM and spills into system memory; your system has only 15.5G of RAM, so it OOMs outright.

Try lowering the context, e.g. OLLAMA_CONTEXT_LENGTH=4096 ollama run qwen3.5:27b, or a smaller quantization?

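The back-of-the-envelope memory reasoning above can be sketched with shell arithmetic. All architecture numbers below are illustrative assumptions, not the actual qwen3.5:27b values; the point is only that an f16 KV cache at a 32768-token context adds several GiB on top of the weights:

```shell
# Rough f16 KV-cache size estimate for a hypothetical 27B model.
layers=62        # transformer layers (assumed)
kv_heads=8       # KV attention heads (assumed, GQA)
head_dim=128     # per-head dimension (assumed)
ctx=32768        # context length from the report above
bytes_per=2      # bytes per f16 cache entry

# Leading 2x accounts for keys plus values.
kv_bytes=$((2 * layers * kv_heads * head_dim * ctx * bytes_per))
echo "estimated KV cache: $((kv_bytes / 1024 / 1024)) MiB"
```

With these numbers the cache alone is roughly 7936 MiB (~7.75 GiB); dropping ctx to 4096 cuts it by 8x, which is why lowering the context can keep the whole model inside 24G of VRAM.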

@kittyzero520 commented on GitHub (Mar 18, 2026):

> From ollama ps, the GPU is being used (100% GPU), and 9B and 8B both work fine. For the 27B, I think it's running out of VRAM: the model itself is 24GB, and together with the KV cache for a 32768 context that exceeds your 24G of VRAM and spills into system memory; your system has only 15.5G of RAM, so it OOMs outright.
>
> Try lowering the context, e.g. OLLAMA_CONTEXT_LENGTH=4096 ollama run qwen3.5:27b, or a smaller quantization?

I'll give it a try.


@Jasdfgh commented on GitHub (Mar 30, 2026):

Any update? Did lowering the context size or switching to a smaller quantization help?

Reference: github-starred/ollama#35286