[GH-ISSUE #15714] Gemma4 CUDA error: unspecified launch failure / illegal memory access on single GPU (RTX 3080 Ti) #35779

Open
opened 2026-04-22 20:27:32 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @syed-asim on GitHub (Apr 20, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15714

What is the issue?

Environment

  • Ollama version: 0.20.0
  • OS: Ubuntu (Linux)
  • GPU: NVIDIA GeForce RTX 3080 Ti Laptop GPU (16GB VRAM)
  • CUDA: 12.4, Driver: 550.120
  • Model: gemma4:latest (8B Q4_K_M)

Problem

Every inference request to gemma4 crashes with a CUDA error,
for both text-only and image prompts.

Observed behavior

The vision encoder runs on EVERY request including text-only prompts.
Crash always occurs at the same location: computeBatch → Tensor.Floats
→ ggml_backend_sched_synchronize

Text-only error: CUDA error: an illegal memory access was encountered
Image prompt error: CUDA error: unspecified launch failure

Key log evidence

Text-only prompt ("How to automate SaaS sales using autonomous agents powered by Gemma 4") still triggers vision encoder.

Steps to reproduce

  1. ollama pull gemma4
  2. ollama run gemma4 "hello"
  3. Crashes every time
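The crash should also be reproducible through the HTTP API, which exercises the same runner path as `ollama run`. A minimal sketch, assuming the server is listening on the default port 11434 (the `curl` line is left commented out since it will trigger the server-side crash):

```shell
# Build the generate-request payload (model name taken from this report).
payload='{"model":"gemma4","prompt":"hello","stream":false}'

# Sanity-check that the payload is valid JSON before sending it.
echo "$payload" | python3 -m json.tool

# Send it to the local ollama server; per this report, the runner then
# aborts with "CUDA error: an illegal memory access was encountered".
# curl http://localhost:11434/api/generate -d "$payload"
```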

Notes

  • GGML_CUDA_NO_GRAPHS=1 and GGML_CUDA_FORCE_MMQ=1 do not reach
    the runner subprocess and have no effect
  • Model loads successfully (43/43 layers offloaded to GPU)
  • VRAM usage is fine: 8.9GB weights + 224MB KV + 354MB compute = ~10GB of 16GB
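Since variables exported in an interactive shell do not reach the runner subprocess, on a stock Linux install they can instead be set on the ollama systemd service, so the server and any runners it spawns inherit them. A sketch, assuming ollama was installed as a systemd unit named `ollama`; `CUDA_LAUNCH_BLOCKING=1` is added because CUDA reports errors asynchronously, and forcing synchronous launches can make the log point at the actual failing kernel:

```ini
# /etc/systemd/system/ollama.service.d/override.conf
# Create/edit with: sudo systemctl edit ollama
[Service]
Environment="GGML_CUDA_NO_GRAPHS=1"
Environment="GGML_CUDA_FORCE_MMQ=1"
Environment="CUDA_LAUNCH_BLOCKING=1"
```

Then apply with `sudo systemctl daemon-reload && sudo systemctl restart ollama` and confirm the variables appear in the runner's startup log.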

Relevant log output

vision: decode    elapsed=808µs   bounds=(0,0)-(2048,2048)
vision: preprocess  size=[768 768]
vision: patches   total=2304
vision: encoded   shape=[2560 256]
→ CUDA error: an illegal memory access was encountered

Real image prompt also crashes after successful encoding:
vision: decode    bounds=(0,0)-(306,165)
vision: encoded   shape=[2560 264]
→ CUDA error: unspecified launch failure

Crash goroutine (same every time):
github.com/ollama/ollama/ml/backend/ggml.(*Context).ComputeWithNotify.func4()
ggml.go:833
github.com/ollama/ollama/ml/backend/ggml.(*Tensor).Floats(...)
ggml.go:1065
github.com/ollama/ollama/runner/ollamarunner.(*Server).computeBatch(...)
runner.go:723

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.20.0

GiteaMirror added the bug label 2026-04-22 20:27:32 -05:00