[GH-ISSUE #11697] Memory access fault with gpt-oss:20b #33501

Closed
opened 2026-04-22 16:15:48 -05:00 by GiteaMirror · 5 comments
Owner

Originally created by @Expro on GitHub (Aug 5, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11697

What is the issue?

Prompting gpt-oss:20b on 0.11.1 results in error:

Memory access fault by GPU node-1 (Agent handle: 0x7f3b0c66f890) on address 0x7f360519e000. Reason: Page not present or supervisor privilege.

Relevant log output

time=2025-08-05T20:35:49.174Z level=INFO source=ggml.go:367 msg="offloading 24 repeating layers to GPU"
time=2025-08-05T20:35:49.174Z level=INFO source=ggml.go:373 msg="offloading output layer to GPU"
time=2025-08-05T20:35:49.174Z level=INFO source=ggml.go:378 msg="offloaded 25/25 layers to GPU"
time=2025-08-05T20:35:49.174Z level=INFO source=ggml.go:381 msg="model weights" buffer=ROCm0 size="11.7 GiB"
time=2025-08-05T20:35:49.174Z level=INFO source=ggml.go:381 msg="model weights" buffer=CPU size="1.1 GiB"
time=2025-08-05T20:35:49.188Z level=INFO source=ggml.go:672 msg="compute graph" backend=ROCm0 buffer_type=ROCm0 size="2.1 GiB"
time=2025-08-05T20:35:49.188Z level=INFO source=ggml.go:672 msg="compute graph" backend=CPU buffer_type=CPU size="5.6 MiB"
time=2025-08-05T20:36:11.960Z level=INFO source=server.go:637 msg="llama runner started in 25.11 seconds"
Memory access fault by GPU node-1 (Agent handle: 0x7fedb466f890) on address 0x7fec0d423000. Reason: Page not present or supervisor privilege.
time=2025-08-05T20:36:34.898Z level=ERROR source=server.go:807 msg="post predict" error="Post \"http://127.0.0.1:33281/completion\": EOF"
[GIN] 2025/08/05 - 20:36:34 | 200 | 48.888797725s |      10.244.2.0 | POST     "/api/chat"
time=2025-08-05T20:36:35.320Z level=WARN source=server.go:517 msg="llama runner process no longer running" sys=134 string="signal: aborted (core dumped)"

OS

Docker

GPU

AMD

CPU

AMD

Ollama version

0.11.1

GiteaMirror added the bug label 2026-04-22 16:15:48 -05:00

@dtori commented on GitHub (Aug 5, 2025):

Same
$ ollama run gpt-oss:20b

>>> Hello
Error: model runner has unexpectedly stopped, this may be due to resource limitations or an internal error, check ollama server logs for details

(ubuntu linux, rtx3090)


@ayylmaonade commented on GitHub (Aug 5, 2025):

+1
Having the same issue here.
**I found a temporary workaround**: you have to disable flash attention. Just serve Ollama like this: `OLLAMA_FLASH_ATTENTION=0 ollama serve` until they solve it. 7900 XTX, Arch Linux, Kernel 6.15.9.
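For readers running Ollama in Docker (as the original reporter does), the same workaround can be sketched by passing the variable into the container. The image name, port, and flags below are illustrative assumptions, not taken from the reporter's setup:

```shell
# Workaround from this thread: disable flash attention when starting the server.
OLLAMA_FLASH_ATTENTION=0 ollama serve

# Docker equivalent (image/port are assumptions; adjust to your deployment):
# docker run -d -e OLLAMA_FLASH_ATTENTION=0 -p 11434:11434 ollama/ollama
```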


@AlKhrulev commented on GitHub (Aug 5, 2025):

+1 Same problem as @dtori on M4 Pro 48 Gb with Ollama 0.11.0 running as a brew service


@pdevine commented on GitHub (Aug 6, 2025):

This should be fixed in 0.11.2.
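A hedged sketch of picking up the fix, assuming the standard Linux install script and that `ollama -v` reports the installed version:

```shell
# Re-run the official install script to upgrade to the latest release,
# then confirm the server is on 0.11.2 or later.
curl -fsSL https://ollama.com/install.sh | sh
ollama -v
```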


@AlKhrulev commented on GitHub (Aug 6, 2025):

> This should be fixed in 0.11.2.

yup, can confirm that it is working now. Thanks!

Reference: github-starred/ollama#33501