[GH-ISSUE #10516] No response from Ollama although GPU is at 100% #53431

Open
opened 2026-04-29 03:08:36 -05:00 by GiteaMirror · 3 comments

Originally created by @slashgob on GitHub (May 1, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10516

What is the issue?

I'm trying to run a small model (llama3.2, 2 GB). It seems to load fine, but I get no response from Ollama. btop shows the GPU at 100% usage, yet the chat message never gets a reply; the GPU just stays at 100%.

OS: Nobara (a Fedora spin)
CPU: AMD Ryzen 9 5950X
RAM: 32 GiB
GPU: AMD Radeon RX 6900 XT

Installed the latest Ollama using the curl install script.

ROCm is installed:

─❯ dnf list --installed | grep 'rocm'
hipcc.x86_64 18-10.rocm6.2.1.fc41 nobara
hsakmt.x86_64 1.0.6-46.rocm6.2.1.fc41 nobara
hsakmt-devel.x86_64 1.0.6-46.rocm6.2.1.fc41 nobara
python3-torch-rocm-gfx9.x86_64 2.4.0-10.fc41 nobara
python3-torchaudio-rocm-gfx9.x86_64 2.4.1-2.fc41 nobara
rocm-clinfo.x86_64 6.2.1-5.fc41 nobara
rocm-cmake.noarch 6.2.0-1.fc41 nobara
rocm-comgr.x86_64 18-10.rocm6.2.1.fc41 nobara
rocm-comgr-devel.x86_64 18-10.rocm6.2.1.fc41 nobara
rocm-core.x86_64 6.2.0-1.fc41 nobara
rocm-device-libs.x86_64 18-10.rocm6.2.1.fc41 nobara
rocm-hip.x86_64 6.2.1-5.fc41 nobara
rocm-hip-devel.x86_64 6.2.1-5.fc41 nobara
rocm-meta.x86_64 6.2.1-7.copr.fc41 nobara-updates
rocm-opencl.x86_64 6.2.1-5.fc41 nobara
rocm-opencl-devel.x86_64 6.2.1-5.fc41 nobara
rocm-rpm-macros.x86_64 6.2-1.fc41 nobara
rocm-rpm-macros-modules.x86_64 6.2-1.fc41 nobara
rocm-runtime.x86_64 6.2.1-2.fc41 nobara
rocm-runtime-devel.x86_64 6.2.1-2.fc41 nobara
rocm-smi.x86_64 6.2.1-1.fc41 nobara
rocminfo.x86_64 6.2.1-1.fc41 nobara

Relevant log output

May 01 10:07:17 nobara-pc ollama[10218]: print_info: LF token         = 198 'Ċ'
May 01 10:07:17 nobara-pc ollama[10218]: print_info: EOG token        = 128008 '<|eom_id|>'
May 01 10:07:17 nobara-pc ollama[10218]: print_info: EOG token        = 128009 '<|eot_id|>'
May 01 10:07:17 nobara-pc ollama[10218]: print_info: max token length = 256
May 01 10:07:17 nobara-pc ollama[10218]: load_tensors: loading model tensors, this can take a while... (mmap = true)
May 01 10:07:17 nobara-pc ollama[10218]: load_tensors: offloading 28 repeating layers to GPU
May 01 10:07:17 nobara-pc ollama[10218]: load_tensors: offloading output layer to GPU
May 01 10:07:17 nobara-pc ollama[10218]: load_tensors: offloaded 29/29 layers to GPU
May 01 10:07:17 nobara-pc ollama[10218]: load_tensors:        ROCm0 model buffer size =  1918.35 MiB
May 01 10:07:17 nobara-pc ollama[10218]: load_tensors:   CPU_Mapped model buffer size =   308.23 MiB
May 01 10:07:53 nobara-pc ollama[10218]: llama_context: constructing llama_context
May 01 10:07:53 nobara-pc ollama[10218]: llama_context: n_seq_max     = 4
May 01 10:07:53 nobara-pc ollama[10218]: llama_context: n_ctx         = 8192
May 01 10:07:53 nobara-pc ollama[10218]: llama_context: n_ctx_per_seq = 2048
May 01 10:07:53 nobara-pc ollama[10218]: llama_context: n_batch       = 2048
May 01 10:07:53 nobara-pc ollama[10218]: llama_context: n_ubatch      = 512
May 01 10:07:53 nobara-pc ollama[10218]: llama_context: causal_attn   = 1
May 01 10:07:53 nobara-pc ollama[10218]: llama_context: flash_attn    = 0
May 01 10:07:53 nobara-pc ollama[10218]: llama_context: freq_base     = 500000.0
May 01 10:07:53 nobara-pc ollama[10218]: llama_context: freq_scale    = 1
May 01 10:07:53 nobara-pc ollama[10218]: llama_context: n_ctx_per_seq (2048) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
May 01 10:07:53 nobara-pc ollama[10218]: llama_context:  ROCm_Host  output buffer size =     2.00 MiB
May 01 10:07:53 nobara-pc ollama[10218]: init: kv_size = 8192, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 28, can_shift = 1
May 01 10:07:53 nobara-pc ollama[10218]: init:      ROCm0 KV buffer size =   896.00 MiB
May 01 10:07:53 nobara-pc ollama[10218]: llama_context: KV self size  =  896.00 MiB, K (f16):  448.00 MiB, V (f16):  448.00 MiB
May 01 10:07:53 nobara-pc ollama[10218]: llama_context:      ROCm0 compute buffer size =   424.00 MiB
May 01 10:07:53 nobara-pc ollama[10218]: llama_context:  ROCm_Host compute buffer size =    22.01 MiB
May 01 10:07:53 nobara-pc ollama[10218]: llama_context: graph nodes  = 958
May 01 10:07:53 nobara-pc ollama[10218]: llama_context: graph splits = 2
May 01 10:07:54 nobara-pc ollama[10218]: time=2025-05-01T10:07:54.048+01:00 level=INFO source=server.go:619 msg="llama runner started in 37.87 seconds"
May 01 10:07:54 nobara-pc ollama[10218]: [GIN] 2025/05/01 - 10:07:54 | 200 | 38.161397178s |       127.0.0.1 | POST     "/api/generate"
May 01 10:08:41 nobara-pc ollama[10218]: time=2025-05-01T10:08:41.548+01:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
May 01 10:27:36 nobara-pc ollama[10218]: [GIN] 2025/05/01 - 10:27:36 | 200 |        18m54s |       127.0.0.1 | POST     "/api/chat"
May 01 10:27:42 nobara-pc ollama[10218]: time=2025-05-01T10:27:42.764+01:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
May 01 10:27:45 nobara-pc ollama[10218]: [GIN] 2025/05/01 - 10:27:45 | 200 |  2.613102748s |       127.0.0.1 | POST     "/api/chat"
May 01 10:28:03 nobara-pc ollama[10218]: [GIN] 2025/05/01 - 10:28:03 | 200 |       23.79µs |       127.0.0.1 | HEAD     "/"
May 01 10:28:03 nobara-pc ollama[10218]: [GIN] 2025/05/01 - 10:28:03 | 200 |      70.242µs |       127.0.0.1 | GET      "/api/ps"
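As a quick sanity check (not part of the original thread), the buffer sizes in the log can be reproduced by hand. This sketch assumes the 2 GB llama3.2 tag is the 3B model, with 8 KV heads of dimension 128 (so 1024 KV values per token per layer); that figure is not printed in the excerpt. Under that assumption the numbers match the log exactly, and the ROCm0 buffers total roughly 3.2 GiB, well below the 6900 XT's 16 GB, so running out of VRAM seems an unlikely explanation for the hang.

```shell
#!/usr/bin/env bash
# Recompute a few figures from the llama_context / init log lines above.

# n_ctx is split evenly across n_seq_max parallel sequences.
ctx_per_seq() {
  local n_ctx=$1 n_seq_max=$2
  echo $(( n_ctx / n_seq_max ))
}

# Size in MiB of ONE of the K or V caches:
#   n_layer * kv_size * n_embd_kv * bytes_per_element
# n_embd_kv = 8 KV heads * 128 head dim = 1024 is an ASSUMPTION for
# Llama 3.2 3B; it does not appear in the log excerpt.
kv_tensor_mib() {
  local n_layer=$1 kv_size=$2 n_embd_kv=$3 bytes_per_elem=$4
  echo $(( n_layer * kv_size * n_embd_kv * bytes_per_elem / 1024 / 1024 ))
}

# Sum of the ROCm0 buffers reported in the log (model + KV + compute), in MiB.
# awk is used because bash arithmetic is integer-only.
rocm0_total_mib() {
  awk -v model="$1" -v kv="$2" -v compute="$3" \
    'BEGIN { printf "%.2f\n", model + kv + compute }'
}

ctx_per_seq 8192 4             # n_ctx=8192, n_seq_max=4
kv_tensor_mib 28 8192 1024 2   # n_layer=28, kv_size=8192, f16 = 2 bytes
rocm0_total_mib 1918.35 896 424
```

The first two results correspond to `n_ctx_per_seq = 2048` and the per-tensor `K (f16): 448.00 MiB` / `V (f16): 448.00 MiB` lines; doubling the latter gives the 896 MiB KV buffer.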

OS

Linux

GPU

AMD

CPU

AMD

Ollama version

0.6.6

GiteaMirror added the bug, amd, linux labels 2026-04-29 03:08:42 -05:00

@mmbossoni commented on GitHub (Jun 1, 2025):

This is a problem with the Nobara kernel's handheld patch.


@dhiltgen commented on GitHub (Jul 5, 2025):

If I had to guess, I'd say there's a bug or problem inside ROCm, perhaps in the driver, possibly causing the model to get stuck in a loop.

You can try setting AMD_LOG_LEVEL=3 along with OLLAMA_DEBUG=1 to get more verbose logging. dmesg might also have logs from the amdgpu driver.
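A minimal sketch of that debugging workflow, assuming Ollama was installed with the curl script (which registers a systemd service named `ollama`) and is listening on its default port, 11434. The `amdgpu_lines` helper is a hypothetical name used only for illustration; it is just a case-insensitive `grep`.

```shell
#!/usr/bin/env bash
# Sketch of the debugging steps suggested above. ASSUMPTION: the curl install
# script created a systemd service named "ollama". Stop it so a foreground
# server can bind the default port 11434, then rerun with verbose HIP/ROCm
# (AMD_LOG_LEVEL=3) and Ollama (OLLAMA_DEBUG=1) logging, keeping a copy:
#
#   sudo systemctl stop ollama
#   AMD_LOG_LEVEL=3 OLLAMA_DEBUG=1 ollama serve 2>&1 | tee ollama-debug.log
#
# Reproduce the hang from a second terminal (e.g. `ollama run llama3.2`),
# then look for messages from the amdgpu kernel driver.

# Hypothetical helper: keep only amdgpu lines from whatever is piped in
# (dmesg output or a saved log file).
amdgpu_lines() {
  grep -i 'amdgpu'
}

# Usage:
#   sudo dmesg | amdgpu_lines
```

On systemd distributions such as Fedora, `journalctl -k | amdgpu_lines` should show the same kernel messages.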


@mmbossoni commented on GitHub (Jul 6, 2025):

@dhiltgen, you can see the logs at https://github.com/ollama/ollama/issues/10919

The problem apparently lies with the Nobara kernel patch:
https://github.com/Nobara-Project/rpm-sources/blob/18217d5f97bdf6b371ac5b4bccf724ad07373cd6/baseos/kernel/6.14/0001-handheld.patch#L357-L358

That said, the "doorbell" change apparently isn't in 6.15. I'll retest on 6.15.4, which seems to have dropped this change.


Reference: github-starred/ollama#53431