[GH-ISSUE #11445] Q3_K_S or IQ3_XXS or SMALLER mistralai/Mistral-Small-3.2-24B-Instruct-2506 #7558

Closed
opened 2026-04-12 19:39:23 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @mirage335 on GitHub (Jul 16, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11445

If newer versions of Mistral offer improvements (and since older Mistral versions with vision support may no longer be installable at Q3_K_S quantization, because Ollama apparently no longer performs such small quantizations itself), it would be helpful to be able to pull one of these quantized models with vision support.

Preferably, it would be possible to pull these directly from local GGUF, mmproj, etc. files via Modelfile specifications.
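A sketch of what the requested Modelfile workflow might look like. The second `FROM` line for the vision projector is the feature being asked for here, not confirmed Ollama syntax, and the filenames are placeholders:

```
# Modelfile — hypothetical sketch; filenames are placeholders
FROM ./mistralai_Mistral-Small-3.2-24B-Instruct-2506-Q3_K_S.gguf
# Requested capability: attach the vision projector from a separate GGUF
FROM ./mmproj-mistralai_Mistral-Small-3.2-24B-Instruct-2506-f16.gguf
```

This would then be registered locally with something like `ollama create mistral-small-vision -f Modelfile`.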

So far, Mistral has been the only vision LLM I have tried that has a chance of adequately summarizing screenshots (e.g. from the VSCode IDE) well enough to build an AI assistant around.

Many, if not all, of the best laptops (e.g. the Lenovo P1 Gen 6) have only 16 GB of VRAM on their RTX 4090 card, so this is an important use case for employing these laptops as developer workstations, vision sensors, etc., forwarding text from a vision LLM instead of whole images.
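A rough back-of-envelope calculation shows why quantizations at or below Q3_K_S matter for the 16 GB case. Assuming roughly 3.5 bits per weight for Q3_K_S (an approximate figure; IQ3_XXS is closer to 3.1), a 24B-parameter model needs about 10 GiB for the weights alone, leaving headroom for the KV cache and the vision projector:

```python
# Back-of-envelope VRAM estimate for a 24B model at ~3.5 bits/weight
# (approximate Q3_K_S density — an assumption, not an exact file size).
# KV cache and the mmproj vision projector add overhead on top of this.
params = 24e9
bits_per_weight = 3.5  # approximate for Q3_K_S; IQ3_XXS is closer to 3.1
weight_bytes = params * bits_per_weight / 8
weight_gib = weight_bytes / 2**30
print(f"~{weight_gib:.1f} GiB for weights alone")  # ~9.8 GiB
```

At Q4 densities (~4.5 bits/weight) the same model would already be around 12.6 GiB of weights, which is why the smaller K- and IQ-quants are the practical target here.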

https://huggingface.co/mistralai/Mistral-Small-3.2-24B-Instruct-2506
https://huggingface.co/bartowski/mistralai_Mistral-Small-3.2-24B-Instruct-2506-GGUF
https://huggingface.co/unsloth/Mistral-Small-3.2-24B-Instruct-2506-GGUF

GiteaMirror added the model label 2026-04-12 19:39:23 -05:00
Author
Owner

@mirage335 commented on GitHub (Jan 10, 2026):

Qwen-3-VL has been doing ok for these use cases, so this may be less important now.


Reference: github-starred/ollama#7558