[GH-ISSUE #10229] Severe Performance Drop and Desktop UI Stuttering on RX 7600 XT with High VRAM Usage (ROCm on Ubuntu) #53223

Open
opened 2026-04-29 02:24:06 -05:00 by GiteaMirror · 2 comments
Originally created by @tedliosu on GitHub (Apr 11, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10229

What is the issue?

Hi team, I’ve encountered a consistent issue when running Ollama on my AMD RX 7600 XT system using the ROCm amdgpu driver on Ubuntu.

When the total combined VRAM usage (including model layers and system processes like the desktop compositor) exceeds ~90%, the system begins to exhibit significant display stuttering and occasional graphical artifacts. At the same time, Ollama’s inference throughput drops sharply — for example, from ~1.8 tokens/sec down to as low as 0.4 tokens/sec.

This behavior does not occur on my RTX A4000 system under similar load and workflow, so it seems specific to how the ROCm amdgpu driver handles high VRAM pressure rather than being a model-related issue. As noted in [this Reddit thread](https://www.reddit.com/r/linux_gaming/comments/1ccwxgy/comment/l18484p/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button), even approaching full VRAM usage on amdgpu systems can lead to serious system performance degradation.

I’m not suggesting Ollama implement AMD-specific logic, but it may help to offer a user-configurable global VRAM margin, or a way to cap GPU layer offloading more conservatively, especially on systems that don’t gracefully handle high VRAM pressure.
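As a stopgap that already exists, offloading can be capped per model with the `num_gpu` parameter, either interactively with `/set parameter num_gpu 28` inside an `ollama run` session, or persistently via a Modelfile. This is only a sketch: `llama3.3` and the layer count `28` are placeholder values to tune per model and GPU.

```
# Hypothetical Modelfile capping GPU offload to 28 layers; tune per model/GPU.
FROM llama3.3
PARAMETER num_gpu 28
```

The capped variant could then be built with something like `ollama create llama3.3-capped -f Modelfile` (the name is illustrative).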

I’d be happy to provide additional system specs, logs, or even a video recording of the issue being reproduced if that would help. (I’d likely have to record externally using a smartphone, since a tool like OBS probably wouldn’t capture the stuttering and artifacts accurately.) Just wanted to raise this in case others running ROCm on Ubuntu are encountering similar problems.

Relevant log output


OS

Linux

GPU

AMD

CPU

AMD

Ollama version

0.6.4

GiteaMirror added the bug label 2026-04-29 02:24:06 -05:00

@rick-github commented on GitHub (Apr 11, 2025):

Does setting [`OLLAMA_GPU_OVERHEAD`](https://github.com/ollama/ollama/blob/ef65174df23fb2efb499a18d7071348cc0ec58da/envconfig/config.go#L244) help?


@tedliosu commented on GitHub (Apr 11, 2025):

> Does setting `OLLAMA_GPU_OVERHEAD` help?

Hi @rick-github — thank you for the suggestion! Setting the `OLLAMA_GPU_OVERHEAD` environment variable via a systemd override worked excellently on my end.

After adding:

```ini
[Service]
Environment="OLLAMA_GPU_OVERHEAD=1889785611"
```

to `/etc/systemd/system/ollama.service.d/override.conf` and restarting the service, I’m now seeing significantly improved behavior on my RX 7600 XT system running Ubuntu with the ROCm `amdgpu` driver.

- Total system VRAM usage during `llama3.3` inference now stays comfortably under 14,000 MB (including the desktop compositor)
- The stuttering and display artifacting issues are gone
- Inference speed remains at the previously tuned max (~1.7 tokens/sec) **without needing to manually adjust `num_gpu` via the shell**
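For anyone reproducing this, the drop-in can be sketched as follows. The temp directory below is only for illustration; on a real system the file goes to the path from the comment above, followed by the `systemctl` commands. The quick `awk` check also shows what the byte value amounts to in GiB of reserved headroom.

```shell
# Write the drop-in (shown here to a temp dir; on a real system use
# /etc/systemd/system/ollama.service.d/override.conf).
dropin="$(mktemp -d)/override.conf"
cat > "$dropin" <<'EOF'
[Service]
Environment="OLLAMA_GPU_OVERHEAD=1889785611"
EOF
grep -q 'OLLAMA_GPU_OVERHEAD=1889785611' "$dropin" && echo "drop-in written"

# The value is in bytes: 1889785611 bytes is about 1.76 GiB of reserved headroom.
awk 'BEGIN { printf "%.2f GiB reserved\n", 1889785611 / (1024 ^ 3) }'

# Then apply it for real:
#   sudo systemctl daemon-reload
#   sudo systemctl restart ollama
```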

Given how well this worked, I think it would be very helpful if the `OLLAMA_GPU_OVERHEAD` variable were mentioned in the [official Linux documentation](https://github.com/ollama/ollama/blob/main/docs/linux.md) or at least in a README section for advanced configuration. It’s a particularly useful option on platforms (like AMD GPUs on Linux) where exceeding ~90% total system VRAM usage can lead to severe instability due to how the drivers handle memory pressure.

Thanks again for the quick pointer — this fix made a big difference in my local deployment. 👍

Feel free to close the issue unless there’s any interest in tracking better documentation or surfacing this setting more visibly for future users.

Reference: github-starred/ollama#53223