[GH-ISSUE #4895] Add "use_mmap" to environment variable #65128

Open
opened 2026-05-03 19:48:24 -05:00 by GiteaMirror · 2 comments

Originally created by @sisi399 on GitHub (Jun 7, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4895

I recently discovered the potential benefits of the --no-mmap option for certain system configurations, such as PCs or laptops with only 8GB of system RAM and a GPU with 6GB or more of VRAM, enough to hold an entire model.

With mmap enabled, running 8B models on such a system is nearly impossible: RAM usage spikes to 99% and stays there, often freezing the PC completely and requiring a hard reset.

Disabling mmap lets me load 8B models while still keeping half of the RAM free for other tasks. The only drawback is a slightly longer initial model load time (around 5-10 seconds in my case), which I believe is a worthwhile trade-off. My quick benchmarks even suggest that --no-mmap might be slightly faster at generating tokens.

Now, here's the rationale for adding an environment variable.

Many frontends/UIs use Ollama, but a significant portion of them lack a toggle for "no mmap". An environment variable would let me disable mmap globally and ensure that every frontend loads models with mmap disabled.
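To make the proposal concrete, here is a minimal sketch of how a server could resolve the mmap setting for each load: an explicit per-request `use_mmap` option wins, then a server-wide environment variable, then a built-in default. The variable name `OLLAMA_USE_MMAP`, the helper functions, and the default of `True` are all hypothetical assumptions for illustration, not existing Ollama settings or code.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable name; Ollama does not necessarily read this.
ENV_VAR = "OLLAMA_USE_MMAP"

_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def parse_bool(value: Optional[str]) -> Optional[bool]:
    """Parse a boolean-like string; return None if unset or unrecognized."""
    if value is None:
        return None
    v = value.strip().lower()
    if v in _TRUE:
        return True
    if v in _FALSE:
        return False
    return None


def resolve_use_mmap(
    request_options: Mapping[str, object],
    env: Mapping[str, str] = os.environ,
    default: bool = True,  # assumed default: mmap on unless told otherwise
) -> bool:
    """Decide whether to mmap model weights for one request.

    Precedence: explicit per-request option > server-wide env var > default.
    """
    value = request_options.get("use_mmap")
    if isinstance(value, bool):
        # A frontend that does expose the toggle still gets its choice.
        return value
    from_env = parse_bool(env.get(ENV_VAR))
    if from_env is not None:
        # Frontends without a toggle inherit the global setting.
        return from_env
    return default
```

With `OLLAMA_USE_MMAP=false` set on the server, a request that omits `use_mmap` would load without mmap, while a request that explicitly sends `"use_mmap": true` would still get mmap. Letting the per-request value win keeps existing frontends' behavior intact and only fills the gap for those that lack a toggle.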

GiteaMirror added the feature request label 2026-05-03 19:48:24 -05:00

@UmutAlihan commented on GitHub (Sep 6, 2024):

It seems the "mmap" setting is only available through the Ollama pip package, which is on the client side. It would be very useful if it could also be passed as an argument to the server command, such as:

"ollama serve --use-mmap False" or something similar.
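For comparison, the client-side route sets mmap per request. As far as Ollama's API documentation shows, the REST endpoint `/api/generate` accepts a `use_mmap` key inside the request's `options` object, which is what client libraries forward. The sketch below only builds such a request with the standard library; the model name `llama3:8b` is an example, and `http://localhost:11434` assumes a default local server.

```python
import json
import urllib.request

# Assumes a default local Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(
    model: str, prompt: str, use_mmap: bool = False
) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/generate with use_mmap set per request."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        # Per-request option; every client must remember to send it.
        "options": {"use_mmap": use_mmap},
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Against a running server, `urllib.request.urlopen(build_generate_request("llama3:8b", "Hello"))` would send it. Because the flag must ride along on every request, a frontend that never sends it cannot benefit, which is exactly the gap a server-side setting would close.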


@juangon commented on GitHub (Oct 26, 2024):

It would be great to have this option available globally on the server side.

Reference: github-starred/ollama#65128