[GH-ISSUE #7757] Memory usage higher than LM Studio for similar model #30713

Open
opened 2026-04-22 10:36:53 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @dalisoft on GitHub (Nov 20, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7757

What is the issue?

Memory usage differs between the two services, even though both appear to use `llama.cpp` under the hood (I'm not certain, but I believe they do). I am not using the MLX backend because it is slower on my machine.

lm-studio (GGUF model)

https://github.com/user-attachments/assets/f283a317-c65d-44f9-ba43-37d49f0cb5ec

ollama

https://github.com/user-attachments/assets/81fa67f7-19ae-46dc-aec9-2ed5a3d59b2f

Env

```bash
OLLAMA_NOHISTORY=1
OLLAMA_FLASH_ATTENTION=1
OLLAMA_KEEP_ALIVE=5m
OLLAMA_MAX_QUEUE=512
OLLAMA_MAX_LOADED_MODELS=1
OLLAMA_HOST=127.0.0.1
OLLAMA_NUM_PARALLEL=1
```
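The recordings above show the gap only qualitatively. One way to put a number on it is to sum the resident set size (RSS) of each server's processes as reported by `ps`. The sketch below assumes the macOS `ps -ax -o rss=,comm=` output format (RSS in KB followed by the command path) and that the server process name contains `ollama`; both are assumptions, not something stated in this report.

```python
# Sketch: quantify server memory by summing RSS (in KB) over processes
# whose command name contains a given substring.
import subprocess


def total_rss_kb(ps_output: str, name: str) -> int:
    """Sum RSS (KB) over ps output lines whose command contains `name`.

    Each line is expected to look like "<rss_kb> <command path>".
    """
    total = 0
    for line in ps_output.splitlines():
        parts = line.split(None, 1)  # split into RSS column and command column
        if len(parts) == 2 and name.lower() in parts[1].lower():
            total += int(parts[0])
    return total


def measure(name: str = "ollama") -> int:
    """Sample current RSS for matching processes (macOS `ps` flags assumed)."""
    out = subprocess.run(
        ["ps", "-ax", "-o", "rss=,comm="],
        capture_output=True, text=True,
    ).stdout
    return total_rss_kb(out, name)
```

Running `measure("ollama")` and `measure("lm-studio")` back to back, with the same GGUF model loaded in each, would give a like-for-like comparison instead of eyeballing Activity Monitor.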

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.4.x

GiteaMirror added the bug label 2026-04-22 10:36:53 -05:00

Reference: github-starred/ollama#30713