[GH-ISSUE #13749] ARM64 GPU crash when loading translategemma:27b (EOF during load), 4b works #55526

Closed
opened 2026-04-29 09:20:45 -05:00 by GiteaMirror · 6 comments

Originally created by @thewh1teagle on GitHub (Jan 16, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/13749

What is the issue?

On an ARM64 host (DGX Spark), translategemma:27b consistently crashes during model load with:
"do load request: EOF" and "llama runner terminated (exit status 2)".

Environment:

  • ARM64 (DGX Spark)
  • ollama 0.14.2 (also tested 0.14.0-rc2 and 0.14.1)
  • GPU enabled
  • CPU-only mode works (OLLAMA_GPU=0)
  • translategemma:4b loads and runs fine on GPU
  • Larger model (27b) always fails during load/init

Observed behavior:

  • Model downloads successfully
  • Crash happens during /load phase (likely mmap / init)
  • Logs show native ARM64 runtime crash (runtime/asm_arm64.s)
  • No explicit OOM or CUDA error reported

journalctl logs attached.

[log.txt](https://github.com/user-attachments/files/24674840/log.txt)

Relevant log output


OS

Linux

GPU

Nvidia

CPU

Other

Ollama version

0.14.1

GiteaMirror added the bug label 2026-04-29 09:20:45 -05:00

@rick-github commented on GitHub (Jan 16, 2026):

Not enough log.

```
journalctl -u ollama --no-pager --since "$(systemctl show ollama --property=ActiveEnterTimestamp --value)"
```

@thewh1teagle commented on GitHub (Jan 16, 2026):

@rick-github Thanks for checking! Here's the complete log:

[large_log.txt](https://github.com/user-attachments/files/24675180/large_log.txt)

@rick-github commented on GitHub (Jan 16, 2026):

```
Jan 16 16:48:14 spark ollama[2038]: //ml/backend/ggml/ggml/src/ggml-cuda/im2col.cu:84: GGML_ASSERT(dst->type == GGML_TYPE_F16 || dst->type == GGML_TYPE_F32) failed
```

It looks like the model has been quantized to data types that are not supported. There was a [previous issue](https://github.com/ollama/ollama/issues/10792#issuecomment-3304993113) also involving gemma3. @pdevine fixed it by pushing an update to the model. The un-quantized model (`translategemma:27b-it-fp16`) should work.
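For anyone triaging a similar load failure, the underlying assertion can be pulled out of a long journal with a small filter. This is a minimal sketch and not part of the original thread: `find_ggml_asserts` is a hypothetical helper name, and it assumes the failure is logged on a single line containing `GGML_ASSERT(`, as in the line quoted above.

```shell
# Hypothetical helper (not an ollama command): keep only the lines of a
# log stream that contain a GGML assertion, e.g. the im2col.cu failure.
find_ggml_asserts() {
  # -F treats the pattern as a fixed string, so "(" needs no escaping.
  grep -F 'GGML_ASSERT('
}
```

It reads from standard input, so it can be fed from the journal, for example `journalctl -u ollama --no-pager | find_ggml_asserts`, or from a saved log file.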

@pdevine commented on GitHub (Jan 16, 2026):

The problem before was that im2col (used by the vision part of the model) doesn't have a bf16 kernel, so I had to swap the bf16 tensors for float16 ones. @jmorganca is going to republish the model to fix the vision tensors.

@rick-github commented on GitHub (Jan 20, 2026):

Models have been updated and this looks fixed. Old behaviour:

```console
$ ollama run translategemma:27b-13749 hello
Error: 500 Internal Server Error: do load request: Post "http://127.0.0.1:36507/load": EOF
```

New behaviour:

```console
$ ollama run translategemma:27b hello
Hello! How can I help you today?
```

@pdevine commented on GitHub (Jan 20, 2026):

I'll go ahead and close the issue. Unfortunately, you do need to pull the image again.
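Since the fix was published as an updated model, a copy downloaded earlier presumably keeps failing until it is pulled again. A minimal sketch of that step, assuming the `ollama` CLI is on the PATH: `refresh_model` is a hypothetical wrapper name introduced here for illustration, while `ollama pull` and `ollama run <model> <prompt>` are the regular CLI commands.

```shell
# Hypothetical wrapper: re-download a model tag, then send one short
# prompt to confirm that it now loads.
refresh_model() {
  model="$1"
  # Only try to run the model if the pull succeeded.
  ollama pull "$model" && ollama run "$model" hello
}
```

For the model in this issue, that would be `refresh_model translategemma:27b`.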

Reference: github-starred/ollama#55526