[GH-ISSUE #12091] Inadequate memory usage of mistral-small3.2:latest #33793

Closed
opened 2026-04-22 16:48:40 -05:00 by GiteaMirror · 11 comments
Owner

Originally created by @andyceo on GitHub (Aug 26, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12091

What is the issue?

ollama run mistral-small3.2:latest

then

ollama ps
mistral-small3.2:latest 5a408ab55df5 26 GB 40%/60% CPU/GPU 4096 Forever

For this context size, the model should use 16-17 GB...

Details:
ollama version is 0.11.7
flash attention enabled, context quantization q8_0
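For scale, the KV cache itself is small at this context length, so most of the footprint has to come from the weights plus the vision graph. A back-of-envelope sketch (the layer and head counts below are illustrative assumptions for a 24B-class model, not the model's verified config):

```python
# Rough KV cache size estimate. The config numbers used below
# (40 layers, 8 KV heads, head_dim 128) are illustrative assumptions.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem):
    # K and V each hold ctx_len * n_kv_heads * head_dim values per layer.
    return int(2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem)

f16 = kv_cache_bytes(40, 8, 128, 4096, 2.0)      # f16: 2 bytes per element
q8 = kv_cache_bytes(40, 8, 128, 4096, 34 / 32)   # q8_0: 34 bytes per 32 elements

print(f"f16 KV cache:  {f16 / 2**30:.2f} GiB")
print(f"q8_0 KV cache: {q8 / 2**30:.2f} GiB")
```

Even at f16 this comes out well under 1 GiB at a 4096-token context, which points at the weights and any extra compute graphs, not the KV cache, as the source of the 26 GB figure.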

Relevant log output


OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.11.7

GiteaMirror added the bug label 2026-04-22 16:48:40 -05:00

@rick-github commented on GitHub (Aug 26, 2025):

The projector graph (for the vision model) is nearly 9GB, resulting in a large memory footprint.

<!-- gh-comment-id:3225901526 -->

@andyceo commented on GitHub (Aug 26, 2025):

Not sure I understand....

ollama show --modelfile mistral-small3.2:latest

gives only one FROM:

FROM /usr/share/ollama/.ollama/models/blobs/sha256-41a5b0c36a28a3a0480ce2e4007d3a21e3298be70e2b9a103960581412997dca

There are no other imports for graphs.

Besides, a projector graph also exists for https://huggingface.co/unsloth/Mistral-Small-3.2-24B-Instruct-2506-GGUF, and it uses less memory (though recent ollama versions do not work with it: https://github.com/unslothai/unsloth/issues/3218#issue-3357077792).

I want to investigate what happened recently related to vision.

<!-- gh-comment-id:3225948710 -->

@rick-github commented on GitHub (Aug 26, 2025):

The mistral3 architecture models use a fused GGUF file in ollama; both text and vision weights are in the one file. You can see this with `ollama show -v mistral-small3.2:latest`.
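To illustrate what "fused" means here, a small sketch that partitions a tensor-name list into text and vision groups. The `v.` / `mm.` prefixes follow common GGUF multimodal naming conventions and are an assumption, not verified against the actual mistral-small3.2 blob:

```python
# Sketch: split a fused multimodal GGUF's tensor names into text-model
# and vision/projector groups. The prefixes are assumed conventions.

VISION_PREFIXES = ("v.", "mm.")  # vision tower / multimodal projector

def split_tensors(names):
    vision = [n for n in names if n.startswith(VISION_PREFIXES)]
    text = [n for n in names if not n.startswith(VISION_PREFIXES)]
    return text, vision

# Hypothetical tensor names, in the style a verbose model dump prints them.
names = [
    "token_embd.weight",
    "blk.0.attn_q.weight",
    "v.blk.0.attn_q.weight",
    "mm.1.weight",
]
text, vision = split_tensors(names)
print("text tensors:  ", text)
print("vision tensors:", vision)
```

In a fused file both groups live in one blob (hence one `FROM` line in the Modelfile), whereas the unsloth release ships the vision weights as a separate mmproj GGUF.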

<!-- gh-comment-id:3226008903 -->

@iosub commented on GitHub (Aug 27, 2025):

> The mistral3 architecture models use a fused GGUF file in ollama; both text and vision weights are in the one file. You can see this with `ollama show -v mistral-small3.2:latest`.

Hi, I have the same issue https://github.com/unslothai/unsloth/issues/3218#issuecomment-3229354239

What do you need to check the issue? Logs from versions 0.11.13 and 0.11.17?

Thank you

<!-- gh-comment-id:3229368980 -->

@rick-github commented on GitHub (Aug 28, 2025):

> Hi, I have the same issue https://github.com/unslothai/unsloth/issues/3218#issuecomment-3229354239

The mistral3 architecture models use a fused GGUF file in ollama; both text and vision weights are in the one file. The unsloth model comes in two GGUF files, and the vision model is not supported.

<!-- gh-comment-id:3235243942 -->

@iosub commented on GitHub (Aug 30, 2025):

> > Hi, I have the same issue [unslothai/unsloth#3218 (comment)](https://github.com/unslothai/unsloth/issues/3218#issuecomment-3229354239)
>
> The mistral3 architecture models use a fused GGUF file in ollama; both text and vision weights are in the one file. The unsloth model comes in two GGUF files and the vision model is not supported.

But it was working before 0.11.14

<!-- gh-comment-id:3238752540 -->

@rick-github commented on GitHub (Aug 30, 2025):

You're right. The architecture for hf.co/unsloth/Mistral-Small-3.2-24B-Instruct-2506-GGUF:UD-Q4_K_XL is llama, not mistral3, so the new engine doesn't come into play. The switch to the new memory system in 0.11.5 appears to have affected the operation of the projector in some models (e.g., unsloth/mistral and minicpm-v, but not llava, granite3.2-vision, or llama3.2-vision).

<!-- gh-comment-id:3238877768 -->

@iosub commented on GitHub (Aug 30, 2025):

> You're right. The architecture for hf.co/unsloth/Mistral-Small-3.2-24B-Instruct-2506-GGUF:UD-Q4_K_XL is llama, not mistral3, so the new engine doesn't come into play. The switch to the new memory system in 0.11.5 appears to have affected the operation of the projector in some models (e.g., unsloth/mistral and minicpm-v, but not llava, granite3.2-vision, or llama3.2-vision).

So it's a known bug? Anything I can help with, logs etc.?

<!-- gh-comment-id:3238896229 -->

@andyceo commented on GitHub (Sep 1, 2025):

One more case: https://github.com/ollama/ollama/issues/12139

<!-- gh-comment-id:3241607598 -->

@rick-github commented on GitHub (Sep 13, 2025):

https://github.com/ollama/ollama/pull/12168 has been merged, which fixes the problem with hf.co/unsloth/Mistral-Small-3.2-24B-Instruct-2506-GGUF:UD-Q4_K_XL not seeing images; it should be available in 0.11.11.

<!-- gh-comment-id:3287213316 -->

@andyceo commented on GitHub (Sep 16, 2025):

I tested and can confirm that the latest Ollama 0.11.11 can read and describe images using hf.co/unsloth/Devstral-Small-2507-GGUF:UD-Q4_K_XL, so I am marking this issue as closed.

mistral-small3.2:latest works and reads images too; memory usage is still high:

NAME                       ID              SIZE     PROCESSOR          CONTEXT    UNTIL
mistral-small3.2:latest    5a408ab55df5    25 GB    45%/55% CPU/GPU    4096       Forever

For this context size normal VRAM usage is around 17 GB, I suppose, but as mentioned earlier, this behavior is normal for mistral-small3.2 due to its architecture.

Unsloth devstral (which I mentioned earlier) has a much lower memory footprint and has vision support (the name of the model is different because of my local changes):

NAME                       ID              SIZE     PROCESSOR    CONTEXT    UNTIL
unsloth/devstral:latest    fb052cda940a    27 GB    100% GPU     45396      Forever

Many thanks to everyone involved and to the authors of the patch! Nice work!

<!-- gh-comment-id:3299610787 -->

Reference: github-starred/ollama#33793