[GH-ISSUE #14731] GPU and driver are present, but the GPU still can't be used #35286

Open
opened 2026-04-22 19:40:58 -05:00 by GiteaMirror · 17 comments
Owner

Originally created by @kittyzero520 on GitHub (Mar 9, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14731

What is the issue?

The GPU is not being used.

Relevant log output

(base) root@kittyzero:~# ollama ps
NAME              ID              SIZE      PROCESSOR    CONTEXT    UNTIL              
qwen3.5:latest    6488c96fa5fa    8.5 GB    100% CPU     4096       3 minutes from now  

(base) root@kittyzero:~# rocminfo | grep gfx
  Name:                    gfx1100                            
      Name:                    amdgcn-amd-amdhsa--gfx1100         
      Name:                    amdgcn-amd-amdhsa--gfx11-generic   
(base) root@kittyzero:~# dpkg -l|grep rocm
ii  rocm                                             7.2.0.70200-43~24.04                     amd64        Radeon Open Compute (ROCm) software stack meta package
ii  rocm-cmake                                       0.14.0.70200-43~24.04                    amd64        rocm-cmake built using CMake
ii  rocm-core                                        7.2.0.70200-43~24.04                     amd64        ROCm Runtime software stack
ii  rocm-dbgapi                                      0.77.4.70200-43~24.04                    amd64        Library to provide AMD GPU debugger API
ii  rocm-debug-agent                                 2.1.0.70200-43~24.04                     amd64        Radeon Open Compute Debug Agent (ROCdebug-agent)
ii  rocm-developer-tools                             7.2.0.70200-43~24.04                     amd64        Radeon Open Compute (ROCm) Runtime software stack
ii  rocm-device-libs                                 1.0.0.70200-43~24.04                     amd64        Radeon Open Compute - device libraries
ii  rocm-gdb                                         16.3.70200-43~24.04                      amd64        ROCgdb
ii  rocm-hip                                         7.2.0.70200-43~24.04                     amd64        Radeon Open Compute (ROCm) Runtime software stack
ii  rocm-llvm                                        22.0.0.26014.70200-43~24.04              amd64        ROCm core compiler
ii  rocm-opencl                                      2.0.0.70200-43~24.04                     amd64        clr built using CMake
ii  rocm-opencl-dev                                  2.0.0.70200-43~24.04                     amd64        clr built using CMake
ii  rocm-opencl-sdk                                  7.2.0.70200-43~24.04                     amd64        Radeon Open Compute (ROCm) Runtime software stack
ii  rocm-openmp                                      7.2.0.70200-43~24.04                     amd64        Radeon Open Compute (ROCm) OpenMP Software development Kit.
ii  rocm-smi-lib                                     7.8.0.70200-43~24.04                     amd64        AMD System Management libraries
ii  rocminfo

OS

Linux

GPU

AMD

CPU

Intel

Ollama version

v0.17.7

GiteaMirror added the bug label 2026-04-22 19:40:58 -05:00

@kittyzero520 commented on GitHub (Mar 9, 2026):

Image: https://github.com/user-attachments/assets/ab301be4-6f19-4452-be1c-48efed8f84b3

@kittyzero520 commented on GitHub (Mar 9, 2026):

Image: https://github.com/user-attachments/assets/253be2d8-384a-4047-85a2-49d9a3e88ca3

@rick-github commented on GitHub (Mar 9, 2026):

What GPU card?


@kittyzero520 commented on GitHub (Mar 9, 2026):

> What GPU card?

amd 7900xtx 24G

(base) root@kittyzero:~# rocminfo | grep gfx
Name: gfx1100
Name: amdgcn-amd-amdhsa--gfx1100
Name: amdgcn-amd-amdhsa--gfx11-generic


@rick-github commented on GitHub (Mar 9, 2026):

Try the [Vulkan](https://docs.ollama.com/gpu#vulkan-gpu-support) accelerator.

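A minimal sketch of what trying the Vulkan workaround could look like, based on the `OLLAMA_VULKAN=1` hint that appears in the runner logs later in this thread; the manual-start workflow (no systemd) and the log path are assumptions:

```shell
# Stop any running server, then restart with experimental Vulkan enabled.
# OLLAMA_VULKAN=1 comes straight from the runner log hint; everything else
# here assumes ollama was started manually rather than as a systemd service.
pkill ollama
OLLAMA_VULKAN=1 ollama serve > /tmp/ollama.log 2>&1 &

# Give the server a moment, then check whether the model lands on the GPU
# (the PROCESSOR column should no longer read "100% CPU").
sleep 5
ollama ps
```

If Ollama runs under systemd instead, the variable would need to go into the unit environment (e.g. via `systemctl edit ollama`) rather than the shell.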

@kittyzero520 commented on GitHub (Mar 9, 2026):

> What GPU card?

time=2026-03-09T23:06:45.482+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 33345"
time=2026-03-09T23:06:45.549+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 44173"
time=2026-03-09T23:06:49.208+08:00 level=INFO source=runner.go:106 msg="experimental Vulkan support disabled. To enable, set OLLAMA_VULKAN=1"
time=2026-03-09T23:06:49.208+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 36395"
time=2026-03-09T23:06:49.251+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 42499"
time=2026-03-09T23:07:19.252+08:00 level=INFO source=runner.go:464 msg="failure during GPU discovery" OLLAMA_LIBRARY_PATH="[/usr/local/lib/ollama /usr/local/lib/ollama/rocm]" extra_envs="map[GGML_CUDA_INIT:1 ROCR_VISIBLE_DEVICES:GPU-6c19dc557b10e676]" error="failed to finish discovery before timeout"
time=2026-03-09T23:07:19.252+08:00 level=INFO source=types.go:60 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="15.5 GiB" available="14.1 GiB"
time=2026-03-09T23:07:19.252+08:00 level=INFO source=routes.go:1763 msg="vram-based default context" total_vram="0 B" default_num_ctx=4096
[GIN] 2026/03/09 - 23:09:16 | 200 | 3.24016ms | 127.0.0.1 | HEAD "/"
[GIN] 2026/03/09 - 23:09:16 | 200 | 789.821µs | 127.0.0.1 | GET "/api/tags"
[GIN] 2026/03/09 - 23:09:26 | 200 | 28.565µs | 127.0.0.1 | HEAD "/"
[GIN] 2026/03/09 - 23:09:26 | 200 | 394.170705ms | 127.0.0.1 | POST "/api/show"
[GIN] 2026/03/09 - 23:09:27 | 200 | 437.20574ms | 127.0.0.1 | POST "/api/show"
time=2026-03-09T23:09:28.159+08:00 level=INFO source=server.go:246 msg="enabling flash attention"
time=2026-03-09T23:09:28.159+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --model /root/.ollama/models/blobs/sha256-dec52a44569a2a25341c4e4d3fee25846eed4f6f0b936278e3a3c900bb99d37c --port 46523"
time=2026-03-09T23:09:28.159+08:00 level=INFO source=sched.go:489 msg="system memory" total="15.5 GiB" free="14.1 GiB" free_swap="14.9 GiB"
time=2026-03-09T23:09:28.159+08:00 level=INFO source=server.go:757 msg="loading model" "model layers"=33 requested=-1
time=2026-03-09T23:09:28.176+08:00 level=INFO source=runner.go:1429 msg="starting ollama engine"
time=2026-03-09T23:09:28.176+08:00 level=INFO source=runner.go:1464 msg="Server listening on 127.0.0.1:46523"
time=2026-03-09T23:09:28.181+08:00 level=INFO source=runner.go:1302 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:6 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-03-09T23:09:28.277+08:00 level=INFO source=ggml.go:136 msg="" architecture=qwen35 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=52
load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-sandybridge.so
time=2026-03-09T23:09:28.286+08:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
time=2026-03-09T23:09:28.980+08:00 level=INFO source=runner.go:1302 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:6 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-03-09T23:09:30.368+08:00 level=INFO source=runner.go:1302 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:6 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-03-09T23:09:30.368+08:00 level=INFO source=ggml.go:482 msg="offloading 0 repeating layers to GPU"
time=2026-03-09T23:09:30.368+08:00 level=INFO source=ggml.go:486 msg="offloading output layer to CPU"
time=2026-03-09T23:09:30.368+08:00 level=INFO source=ggml.go:494 msg="offloaded 0/33 layers to GPU"
time=2026-03-09T23:09:30.368+08:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="6.1 GiB"
time=2026-03-09T23:09:30.368+08:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="1.4 GiB"
time=2026-03-09T23:09:30.368+08:00 level=INFO source=device.go:267 msg="compute graph" device=CPU size="433.7 MiB"
time=2026-03-09T23:09:30.368+08:00 level=INFO source=device.go:272 msg="total memory" size="7.9 GiB"
time=2026-03-09T23:09:30.368+08:00 level=INFO source=sched.go:565 msg="loaded runners" count=1
time=2026-03-09T23:09:30.371+08:00 level=INFO source=server.go:1350 msg="waiting for llama runner to start responding"
time=2026-03-09T23:09:30.371+08:00 level=INFO source=server.go:1384 msg="waiting for server to become available" status="llm server loading model"
time=2026-03-09T23:09:46.201+08:00 level=INFO source=server.go:1388 msg="llama runner started in 18.04 seconds"
[GIN] 2026/03/09 - 23:09:46 | 200 | 18.98180016s | 127.0.0.1 | POST "/api/generate"


@kittyzero520 commented on GitHub (Mar 10, 2026):

> Try the [Vulkan](https://docs.ollama.com/gpu#vulkan-gpu-support) accelerator.

The thing is, I've never enabled that before, and it used to work fine. Why did it suddenly stop working?


@kittyzero520 commented on GitHub (Mar 11, 2026):

(base) root@kittyzero:~# export LD_LIBRARY_PATH=/opt/rocm/lib:/opt/rocm-7.2.0/lib:$LD_LIBRARY_PATH
(base) root@kittyzero:~# export HSA_OVERRIDE_GFX_VERSION=11.0.0
(base) root@kittyzero:~# export HIP_VISIBLE_DEVICES=0
(base) root@kittyzero:~# pkill -9 ollama
(base) root@kittyzero:~# ollama serve 2>&1 &
[1] 25047
(base) root@kittyzero:~# sleep 5
time=2026-03-11T13:14:02.047+08:00 level=INFO source=routes.go:1658 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES:0 HSA_OVERRIDE_GFX_VERSION:11.0.0 HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:0 OLLAMA_DEBUG:INFO OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NO_CLOUD:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2026-03-11T13:14:02.047+08:00 level=INFO source=routes.go:1660 msg="Ollama cloud disabled: false"
time=2026-03-11T13:14:02.048+08:00 level=INFO source=images.go:477 msg="total blobs: 7"
time=2026-03-11T13:14:02.048+08:00 level=INFO source=images.go:484 msg="total unused blobs removed: 0"
time=2026-03-11T13:14:02.049+08:00 level=INFO source=routes.go:1713 msg="Listening on 127.0.0.1:11434 (version 0.17.7)"
time=2026-03-11T13:14:02.049+08:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-03-11T13:14:02.049+08:00 level=WARN source=runner.go:485 msg="user overrode visible devices" HIP_VISIBLE_DEVICES=0
time=2026-03-11T13:14:02.049+08:00 level=WARN source=runner.go:485 msg="user overrode visible devices" HSA_OVERRIDE_GFX_VERSION=11.0.0
time=2026-03-11T13:14:02.049+08:00 level=WARN source=runner.go:489 msg="if GPUs are not correctly discovered, unset and try again"
time=2026-03-11T13:14:02.050+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 35559"
time=2026-03-11T13:14:02.081+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 45679"
time=2026-03-11T13:14:02.112+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 34845"
time=2026-03-11T13:14:04.348+08:00 level=INFO source=runner.go:106 msg="experimental Vulkan support disabled. To enable, set OLLAMA_VULKAN=1"
time=2026-03-11T13:14:04.348+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 33835"


@Jasdfgh commented on GitHub (Mar 12, 2026):

The 7900 XTX (gfx1100) is natively supported on Ollama's Linux support list (https://docs.ollama.com/gpu), listed under LLVM target gfx1100 next to the Radeon RX cards, so a GPU discovery timeout is not normal.

A few of your environment variables look off:

  • HSA_OVERRIDE_GFX_VERSION is for GPUs that are not on the support list, making them masquerade as a close architecture. gfx1100 is already on the list, so no override is needed; setting it may break the match.
  • HIP_VISIBLE_DEVICES: you only have one GPU, so you probably don't need it. Also, per the Ollama docs (https://docs.ollama.com/gpu), AMD GPU selection appears to use ROCR_VISIBLE_DEVICES, not HIP_VISIBLE_DEVICES.
  • And the Ollama log itself is warning that these overrides may interfere with detection.

I'd suggest unsetting all of these env vars and restarting. Also, the 0.17.8 RC just came out with updated Linux ROCm v7 support; install instructions are here: https://github.com/ollama/ollama/blob/main/docs/linux.mdx#installing-specific-versions

It would also help to say when this stopped working, and whether the driver or kernel was updated in between.

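A minimal sketch of the suggested reset, assuming ollama was started from a shell rather than as a systemd service (the log path is an assumption):

```shell
# Clear the overrides called out in the advice above, then restart ollama
# and re-check GPU discovery.
unset HSA_OVERRIDE_GFX_VERSION HIP_VISIBLE_DEVICES ROCR_VISIBLE_DEVICES
pkill ollama
ollama serve > /tmp/ollama.log 2>&1 &

# After startup, the "user overrode visible devices" warnings should be
# gone and the "inference compute" line should list the GPU, not cpu.
sleep 5
grep -E "overrode|inference compute" /tmp/ollama.log
ollama ps
```

If the server runs under systemd instead, the variables would need removing from the unit's Environment= settings (e.g. via `systemctl edit ollama`).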

@kittyzero520 commented on GitHub (Mar 13, 2026):

> The 7900 XTX (gfx1100) is natively supported on Ollama's Linux support list (https://docs.ollama.com/gpu), listed under LLVM target gfx1100 next to the Radeon RX cards, so a GPU discovery timeout is not normal.
>
> A few of your environment variables look off:
>
>   • HSA_OVERRIDE_GFX_VERSION is for GPUs that are not on the support list, making them masquerade as a close architecture. gfx1100 is already on the list, so no override is needed; setting it may break the match.
>   • HIP_VISIBLE_DEVICES: you only have one GPU, so you probably don't need it. Also, per the Ollama docs (https://docs.ollama.com/gpu), AMD GPU selection appears to use ROCR_VISIBLE_DEVICES, not HIP_VISIBLE_DEVICES.
>   • And the Ollama log itself is warning that these overrides may interfere with detection.
>
> I'd suggest unsetting all of these env vars and restarting. Also, the 0.17.8 RC just came out with updated Linux ROCm v7 support; install instructions are here: https://github.com/ollama/ollama/blob/main/docs/linux.mdx#installing-specific-versions
>
> It would also help to say when this stopped working, and whether the driver or kernel was updated in between.

I found something: if I add the parameters below, Ollama can use the GPU. I'll also try your suggested approach; thanks for your support!

Image: https://github.com/user-attachments/assets/926ce6bb-b2a7-4499-af6e-0c041f8dd9fa


It stopped working after upgrading to the latest 0.17.


@Jasdfgh commented on GitHub (Mar 15, 2026):

Got it: those two commands switched you over to Vulkan as a workaround. That works, but it's not the ROCm path; the ROCm GPU discovery timeout is still there. Which Ollama version were you on when ROCm last worked? Let's check whether this was introduced in 0.17.


@kittyzero520 commented on GitHub (Mar 16, 2026):

> Got it: those two commands switched you over to Vulkan as a workaround. That works, but it's not the ROCm path; the ROCm GPU discovery timeout is still there. Which Ollama version were you on when ROCm last worked? Let's check whether this was introduced in 0.17.

1. I don't remember the previous version.
2. Now on 0.18 it can load the qwen3.5 9b model, but the 27B immediately drops the SSH connection and the server becomes unreachable.

Image: https://github.com/user-attachments/assets/ea08432d-e024-422c-8804-e0897018e02e
Image: https://github.com/user-attachments/assets/73fee515-b800-40e4-9f56-3c56cc5fd576

@kittyzero520 commented on GitHub (Mar 16, 2026):

> Got it: those two commands switched you over to Vulkan as a workaround. That works, but it's not the ROCm path; the ROCm GPU discovery timeout is still there. Which Ollama version were you on when ROCm last worked? Let's check whether this was introduced in 0.17.

1. I don't remember the previous version.
2. Now on 0.18 it can load the qwen3.5 9b model, but the 27B immediately drops the SSH connection and the server becomes unreachable, and I saw the 16G of RAM suddenly fill up.

Image: https://github.com/user-attachments/assets/ea08432d-e024-422c-8804-e0897018e02e
Image: https://github.com/user-attachments/assets/73fee515-b800-40e4-9f56-3c56cc5fd576

@Jasdfgh commented on GitHub (Mar 18, 2026):

From ollama ps, the GPU is being used (100% GPU), and 9B and 8B both work fine. For the 27B, I think it's running out of VRAM: the model itself is 24GB, and together with the KV cache for a 32768 context that exceeds your 24G of VRAM and spills into system memory; your system has only 15.5G of RAM, so it OOMs outright.

Try lowering the context, e.g. OLLAMA_CONTEXT_LENGTH=4096 ollama run qwen3.5:27b, or a smaller quantization?

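The back-of-the-envelope memory reasoning above can be sketched with shell arithmetic. All architecture numbers below are illustrative assumptions, not the actual qwen3.5:27b values; the point is only that an f16 KV cache at a 32768-token context adds several GiB on top of the weights:

```shell
# Rough f16 KV-cache size estimate for a hypothetical 27B model.
layers=62        # transformer layers (assumed)
kv_heads=8       # KV attention heads (assumed, GQA)
head_dim=128     # per-head dimension (assumed)
ctx=32768        # context length from the report above
bytes_per=2      # bytes per f16 cache entry

# Leading 2x accounts for keys plus values.
kv_bytes=$((2 * layers * kv_heads * head_dim * ctx * bytes_per))
echo "estimated KV cache: $((kv_bytes / 1024 / 1024)) MiB"
```

With these numbers the cache alone is roughly 7936 MiB (~7.75 GiB); dropping ctx to 4096 cuts it by 8x, which is why lowering the context can keep the whole model inside 24G of VRAM.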

@kittyzero520 commented on GitHub (Mar 18, 2026):

> From ollama ps, the GPU is being used (100% GPU), and 9B and 8B both work fine. For the 27B, I think it's running out of VRAM: the model itself is 24GB, and together with the KV cache for a 32768 context that exceeds your 24G of VRAM and spills into system memory; your system has only 15.5G of RAM, so it OOMs outright.
>
> Try lowering the context, e.g. OLLAMA_CONTEXT_LENGTH=4096 ollama run qwen3.5:27b, or a smaller quantization?

I'll give it a try.


@Jasdfgh commented on GitHub (Mar 30, 2026):

Any update? Did lowering the context size or switching to a smaller quantization help?

Reference: github-starred/ollama#35286