[GH-ISSUE #15248] Vulkan/i915: gemma4:26b produces garbled output on Intel Arc Arrow Lake-P iGPU (regression); gemma4:e4b alloc_tensor_range failure #35513

opened 2026-04-22 20:04:54 -05:00 by GiteaMirror · 1 comment

Originally created by @fjwood69 on GitHub (Apr 3, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15248

What is the issue?

Environment

OS: Ubuntu 24.04.4 LTS, kernel 6.17.0-19-generic
CPU: Intel Core Ultra 5 225H
GPU: Intel Arc (Arrow Lake-P) iGPU, device 0x7d51
Driver: i915 (not xe)
Vulkan: Mesa ANV 25.2.8, Intel(R) Graphics (ARL), Vulkan 1.4
Ollama: ollama/ollama:latest (container, via Podman)
Vulkan ICD: mounted from host: -v /usr/share/vulkan:/usr/share/vulkan:ro
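
For reproducibility, a minimal sketch of a Podman invocation matching this environment; only the /usr/share/vulkan:ro ICD mount is taken from the report, while the DRI device passthrough, model volume, and port mapping are assumptions based on typical containerized Ollama setups:

```shell
# Hypothetical reconstruction of the container launch.
# Only the /usr/share/vulkan:ro mount is from the report; the rest is assumed.
podman run -d --name ollama \
  --device /dev/dri \
  -v /usr/share/vulkan:/usr/share/vulkan:ro \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama:latest
```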

Issue 1 — gemma4:e4b: hard buffer allocation failure

gemma4:e4b fails to load on Vulkan with a hard allocation error:

```shell
alloc_tensor_range: failed to allocate Vulkan0 buffer of size 5637144576
offloading output layer to CPU
offloaded 42/43 layers to GPU
```

Model weights fall back to CPU while the KV cache remains on GPU. All output is garbage (multilingual/corrupted tokens); the model is unusable.
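
The requested buffer (5637144576 bytes ≈ 5.25 GiB) can exceed a driver's per-allocation ceiling even when total heap space is sufficient; on ANV this limit is reported as maxMemoryAllocationSize. A host-side check with the standard vulkaninfo tool (from vulkan-tools; no Ollama-specific assumptions):

```shell
# Compare the failed 5637144576-byte request against the driver's limits.
vulkaninfo --summary
vulkaninfo | grep -i -A1 maxMemoryAllocationSize   # per-allocation ceiling
vulkaninfo | grep -i -A6 memoryHeaps               # heap sizes and flags
```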

Issue 2 — gemma4:26b: worked then regressed

gemma4:26b initially loaded and ran correctly:

```shell
offloaded 31/31 layers to GPU
model weights device=Vulkan0
```

Output was clean at ~2.6 tok/s. The following day, with no host driver or kernel changes (confirmed via dpkg logs), the same model in the same container produced garbled output identical to the e4b failure pattern. The model has since been removed.

qwen2.5-coder:7b (dense, non-MoE) continues to work correctly on the same setup.
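
Since the 26b regression happened without driver or kernel changes, one plausible culprit is an i915 GPU hang/reset leaving the device in a bad state between runs. A quick host-side check using standard kernel tooling (no model- or Ollama-specific assumptions):

```shell
# Look for GPU hangs or resets around the time output degraded.
sudo dmesg --ctime | grep -iE 'i915|gpu hang|reset'
# journald equivalent if the dmesg ring buffer has rotated:
journalctl -k -b | grep -iE 'i915|gpu hang|reset'
```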

Relevant log output


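If useful for triage, server logs can be pulled from the container on the host; a sketch assuming the container is named ollama as in the invocation above, with OLLAMA_DEBUG=1 for more verbose backend output:

```shell
# Tail recent server logs from the running container (name assumed).
podman logs --tail 200 ollama
# Optional: recreate the container with -e OLLAMA_DEBUG=1 added to the
# run command above for additional backend detail.
podman stop ollama && podman rm ollama
```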
OS

Linux

GPU

Intel

CPU

Intel

Ollama version

0.20.0


@PureBlissAK commented on GitHub (Apr 18, 2026):

🤖 Automated Triage & Analysis Report

Issue: #15248
Analyzed: 2026-04-18T18:22:51.358203

Analysis

  • Type: unknown
  • Severity: medium
  • Components: unknown

Implementation Plan

  • Effort: medium
  • Steps:

This issue has been triaged and marked for implementation.

Reference: github-starred/ollama#35513