[GH-ISSUE #14978] Intel Arc 770: Q4_K_M and other quantization formats produce gibberish or hang #56142

Open
opened 2026-04-29 10:18:55 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @elnfh898978 on GitHub (Mar 20, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14978

What is the issue?

Summary

With Vulkan GPU acceleration (Ollama served through Alpaca), an Intel Arc 770 (16GB) runs some quantization formats perfectly but produces gibberish, hangs, or repeats output with others.
Working Models (Q4_0 and Q8_0 quantizations)

llama3.2:1b
hermes3:8b
Wizard Vicuna Uncensored 13b
samantha mistral 7b
llama2 uncensored 7b

Non-Working Models (Q4_K_M quantization)

llama3.2:3b — produces dense stream of gibberish
llama3.1:8b — produces dense stream of gibberish
Gemma3:12b — hangs, produces no output
Gemma3N:7b — repeats same sentence over and over

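For context, here is a minimal reproduction sketch under the assumption that the default library tags resolve to the quantizations listed above; the exact tags and prompt are illustrative, not taken from the original report:

```shell
# Reproduction sketch (assumed tags/prompt). Per the report, the llama3.2:1b
# default resolves to a working quantization while llama3.2:3b is Q4_K_M.

# Expected: coherent answer
ollama run llama3.2:1b "Explain quantization in one sentence."

# Observed on Arc 770 + Vulkan: dense stream of gibberish
ollama run llama3.2:3b "Explain quantization in one sentence."
```
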
Key Findings

The issue is quantization-specific: Q4_K_M quantizations fail, while Q4_0 and Q8_0 work perfectly (a sketch for confirming each model's quantization follows this list)
Not a size issue: Wizard Vicuna 13b (Q4_0) works fine, while Gemma3 12b (Q4_K_M) hangs
Not a model or OS issue: all models produce correct output on CPU (though slower), so the failure is specific to the Vulkan GPU path
Within one family: llama3.2:1b works but llama3.2:3b doesn't; per the lists above, the two default tags resolve to different quantizations (Q4_0 vs Q4_K_M)
Vision models affected: Gemma3 (vision model) exhibits different failure modes than text models

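Since the diagnosis above hinges on which quantization each tag actually pulled, it can be double-checked per model. A small sketch, assuming `ollama show` prints a quantization field (the exact output layout may differ between Ollama versions):

```shell
# Print the reported quantization for each model discussed above.
# Assumes `ollama show` lists a "quantization" line in its Model section.
for m in llama3.2:1b llama3.2:3b llama3.1:8b gemma3:12b hermes3:8b; do
  echo "== $m =="
  ollama show "$m" | grep -i quantization
done
```
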
Environment

Hardware: Intel Arc 770 16GB
OS: Bazzite (immutable/atomic)
Ollama served through: Alpaca Flatpak
GPU acceleration: Vulkan
Ollama version: 0.18.2
Vulkan instance version: 1.4.341

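Because Ollama runs inside the Alpaca Flatpak on an immutable distro, the Vulkan driver the runner sees may differ from the host's. A quick check sketch; the Alpaca application ID is an assumption, and `vulkaninfo` must be present inside the sandbox for the second command to work:

```shell
# Host view: which driver/API version services the Arc 770
vulkaninfo --summary | grep -iE 'deviceName|driverName|apiVersion'

# Sandbox view (assumed app ID com.jeffser.Alpaca); the Flatpak runtime's
# Mesa/ANV build can differ from the host's
flatpak run --command=vulkaninfo com.jeffser.Alpaca --summary \
  | grep -iE 'deviceName|driverName|apiVersion'
```
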
Expected Behavior

All quantization formats should produce coherent output or fail gracefully, not produce gibberish or hang.
Actual Behavior

Q4_K_M quantizations produce invalid output or hang entirely on Intel Arc 770.

Relevant log output

(none provided)
OS

Linux

GPU

Intel

CPU

Intel

Ollama version

0.18.2

GiteaMirror added the bug label 2026-04-29 10:18:55 -05:00

Reference: github-starred/ollama#56142