[GH-ISSUE #2767] Fully unload GPU memory on NVIDIA non-VMM GPUs when idle #48180

Closed
opened 2026-04-28 07:03:27 -05:00 by GiteaMirror · 2 comments

Originally created by @dhiltgen on GitHub (Feb 26, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2767

Originally assigned to: @dhiltgen on GitHub.

The fix for #1848 works for VMM GPUs, but residual memory allocations still remain on non-VMM GPUs.
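
For anyone digging into this: whether a device counts as "VMM" here can be probed with the CUDA driver API. The sketch below is illustrative only (it is not ollama's actual detection code); on devices that report 0, the CUDA backend presumably falls back to plain `cudaMalloc`-style pools, which is where the residual allocations would come from.

```c
// Minimal sketch: query CUDA virtual memory management (VMM) support
// for device 0 via the driver API. Build with: nvcc vmm_check.c -lcuda
#include <stdio.h>
#include <cuda.h>

int main(void) {
    CUdevice dev;
    int vmm = 0;

    if (cuInit(0) != CUDA_SUCCESS || cuDeviceGet(&dev, 0) != CUDA_SUCCESS) {
        fprintf(stderr, "no CUDA device available\n");
        return 1;
    }
    // 1 means cuMemCreate/cuMemAddressReserve (the VMM allocation path
    // that #1848's fix can fully release) are usable on this device.
    cuDeviceGetAttribute(&vmm,
        CU_DEVICE_ATTRIBUTE_VIRTUAL_MEMORY_MANAGEMENT_SUPPORTED, dev);
    printf("device 0 VMM support: %d\n", vmm);
    return 0;
}
```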

GiteaMirror added the bug label 2026-04-28 07:03:27 -05:00

@dhiltgen commented on GitHub (Mar 20, 2024):

This should be resolved by #3218


@oldgithubman commented on GitHub (Mar 30, 2024):

Not fixed for me. Before updating, ollama didn't use any (significant, at least) memory on startup. Now the instance mapped to my 1080 Ti (11 GiB) is using 136 MiB, and the instances mapped to my 1070 Tis (8 GiB) are using 100 MiB each. This is before loading any models. Not too cool.
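
One way to reproduce this kind of per-GPU measurement is via NVML (the library `nvidia-smi` is built on). A small sketch, not part of ollama:

```c
// Print used memory per GPU, as nvidia-smi reports it.
// Build with: gcc nvml_used.c -lnvidia-ml
#include <stdio.h>
#include <nvml.h>

int main(void) {
    unsigned int count = 0;

    if (nvmlInit_v2() != NVML_SUCCESS) {
        fprintf(stderr, "NVML init failed\n");
        return 1;
    }
    nvmlDeviceGetCount_v2(&count);
    for (unsigned int i = 0; i < count; i++) {
        nvmlDevice_t dev;
        nvmlMemory_t mem;
        if (nvmlDeviceGetHandleByIndex_v2(i, &dev) == NVML_SUCCESS &&
            nvmlDeviceGetMemoryInfo(dev, &mem) == NVML_SUCCESS) {
            // mem.used covers all processes plus driver overhead,
            // i.e. the same number reported above per GPU.
            printf("GPU %u: %llu MiB used\n", i,
                   (unsigned long long)(mem.used / (1024 * 1024)));
        }
    }
    nvmlShutdown();
    return 0;
}
```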


Reference: github-starred/ollama#48180