[GH-ISSUE #12313] [Feature Request] Q6_K for Qwen2.5-VL-32B to fill 32 GB VRAM gap (single-file GGUF preferred) #70239

Open
opened 2026-05-04 20:45:31 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @xl2014 on GitHub (Sep 17, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12313

  1. 32 GB VRAM Utilization Gap

    Q4_K_M ≈ 21 GB → leaves 30%+ of VRAM idle, wasting capacity
    Q8_0 ≈ 35 GB → exceeds a single 32 GB GPU, forcing CPU offload or smaller batches
    Q6_K lands at 28–30 GB, filling 32 GB VRAM almost exactly while keeping quality close to Q8_0
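The size figures above can be sanity-checked with rough bits-per-weight arithmetic. This is an illustrative sketch, not a measurement: the bits-per-weight averages below are approximate figures for llama.cpp quant types (Q8_0 and Q6_K are exact per their block layouts; Q4_K_M is an approximate average), and the 32B parameter count is rounded.

```python
# Back-of-envelope check of the quantized weight sizes claimed above.
# Runtime VRAM use is higher than the weights alone: the vision encoder,
# KV cache, and activations add several GB, which is how a ~26 GB Q6_K
# file ends up in the 28-30 GB range on a 32 GB GPU.

PARAMS = 32e9       # ~32 billion parameters (approximate)
BPW_Q4KM = 4.85     # approx. average bits per weight for Q4_K_M
BPW_Q6K = 6.5625    # exact average for Q6_K (210 bytes per 256 weights)
BPW_Q8 = 8.5        # exact average for Q8_0 (8-bit values + one scale per 32)

def weights_gb(params: float, bpw: float) -> float:
    """Size of the quantized weights alone, in decimal gigabytes."""
    return params * bpw / 8 / 1e9

print(f"Q4_K_M weights ~ {weights_gb(PARAMS, BPW_Q4KM):.1f} GB")
print(f"Q6_K   weights ~ {weights_gb(PARAMS, BPW_Q6K):.1f} GB")
print(f"Q8_0   weights ~ {weights_gb(PARAMS, BPW_Q8):.1f} GB")
```

The computed Q6_K weight size (~26 GB) plus vision encoder and KV cache is consistent with the 28–30 GB runtime figure cited in this request.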

  2. Missing Official Option
    The Ollama library ships only q4_K_M and q8_0 tags; with no Q6_K middle ground, 32 GB owners are stuck between "too empty" and "too large".

  3. Implementation Request

    Tag name: qwen2.5-vl:32b-q6_k
    Bundle the upstream language model and vision encoder into a single GGUF (like llava:34b-v1.6-q6_k) for one-command install
    Users with 32 GB GPUs could then ollama pull and immediately run a vision model that uses almost all available VRAM, with no manual merge needed.
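Until an official tag exists, a manual workaround along these lines is possible with llama.cpp's quantize tool. This is a sketch, not a tested recipe: all paths and filenames are placeholders, the quantize binary name varies by llama.cpp build, and the vision encoder (mmproj) is not bundled here, which is exactly the manual-merge friction this request asks Ollama to eliminate.

```shell
# Quantize an F16 GGUF of the language model down to Q6_K using
# llama.cpp's quantize tool (named llama-quantize in recent builds).
# Input/output filenames below are placeholders.
./llama-quantize qwen2.5-vl-32b-f16.gguf qwen2.5-vl-32b-q6_k.gguf Q6_K

# Register the result with Ollama via a minimal Modelfile.
# Note: the vision projector is not included in this sketch.
cat > Modelfile <<'EOF'
FROM ./qwen2.5-vl-32b-q6_k.gguf
EOF
ollama create qwen2.5-vl:32b-q6_k -f Modelfile
```

With an official single-file tag as requested, all of the above would collapse to a single ollama pull command.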

Summary
Adding Q6_K closes the 32 GB VRAM gap and delivers a seamless single-file experience in one shot.

GiteaMirror added the feature request label 2026-05-04 20:45:31 -05:00

Reference: github-starred/ollama#70239