[GH-ISSUE #15398] gemma4 e2B does not work on Jetson Orin Nano #9848

Open
opened 2026-04-12 22:42:43 -05:00 by GiteaMirror · 5 comments

Originally created by @NebulaTurnip27 on GitHub (Apr 7, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15398

What is the issue?

I am trying to load gemma4 e2B on my Jetson Orin Nano. However, I get the failure below:

```
jetson@jon11-jp62:~$ OLLAMA_DEBUG=1 ollama run gemma4:e2b
Error: 500 Internal Server Error: llama runner process has terminated: %!w(<nil>)
jetson@jon11-jp62:~$ OLLAMA_DEBUG=1 ollama run gemma4:e2b
Error: Post "http://127.0.0.1:11434/api/generate": EOF
```

I ensured I have enough memory to load this; only ~500 MB is used prior to running the commands above.
I also tried changing the default context length via OLLAMA_CONTEXT_LENGTH, but it does not seem to change anything.

```
jetson@jon11-jp62:~$ OLLAMA_CONTEXT_LENGTH=128 OLLAMA_DEBUG=1 ollama run gemma4:e2b
Error: Post "http://127.0.0.1:11434/api/generate": EOF
```
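
One thing that may explain why the variable appears to do nothing: `OLLAMA_CONTEXT_LENGTH` is read by the Ollama server process, not by the `ollama run` client, so on a systemd-based install it has to be set on the service rather than on the command line. A minimal sketch, assuming the standard `ollama.service` unit that the Linux installer creates:

```
# Open an override file for the server unit
sudo systemctl edit ollama.service

# In the editor, add:
#   [Service]
#   Environment="OLLAMA_CONTEXT_LENGTH=128"

# Apply the change and restart the server
sudo systemctl daemon-reload
sudo systemctl restart ollama
```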

For context, I am able to load this model on this Jetson Orin Nano in Q8_0 and below with llama.cpp; memory used right after loading with llama.cpp is around 5.5 GB with Q8_0.
And if I use another Jetson Orin with more memory, the same model loads correctly, but it seems to be using ~10 GB of memory right after running the `ollama run` command.
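
To see the transient spike during loading (rather than just the steady-state usage), it helps to sample memory continuously while the model loads. A sketch using standard tools; `tegrastats` ships with JetPack and reports the Orin's unified RAM:

```
# Terminal 1: sample system memory once per second
watch -n 1 free -h

# Or, on Jetson, use tegrastats (interval in milliseconds)
sudo tegrastats --interval 1000

# Terminal 2: trigger the load and watch for the peak
OLLAMA_DEBUG=1 ollama run gemma4:e2b
```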

Relevant log output

[ollama_log.txt](https://github.com/user-attachments/files/26545452/ollama_log.txt)

OS

Linux

GPU

Nvidia

CPU

No response

Ollama version

0.20.2

GiteaMirror added the bug label 2026-04-12 22:42:43 -05:00

@NebulaTurnip27 commented on GitHub (Apr 7, 2026):

cc @dhiltgen


@rick-github commented on GitHub (Apr 7, 2026):

[Server logs](https://docs.ollama.com/troubleshooting) will aid in debugging.


@NebulaTurnip27 commented on GitHub (Apr 7, 2026):

> [Server logs](https://docs.ollama.com/troubleshooting) will aid in debugging.

Just edited with my log


@dhiltgen commented on GitHub (Apr 8, 2026):

I think what's going on here is that this model is right up against the total system memory. The Ollama runner currently doesn't have mmap loading support, but via llama.cpp you're loading with mmap enabled, so you're able to just barely get the model loaded. While Ollama is loading there's extra overhead in system memory during the loading process, so the system runs out of RAM and runs into the OOM killer.

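If the OOM killer is what's terminating the runner, the kernel log will say so explicitly, and the mmap theory can be cross-checked from the llama.cpp side by disabling mmap there, which should reproduce the load-time spike. A sketch; the model filename is illustrative, and the llama.cpp binary name depends on the build (`llama-cli` in recent versions, `main` in older ones):

```
# Check whether the kernel OOM killer fired during the failed load
sudo dmesg | grep -i -E "out of memory|oom|killed process"

# Load the same model in llama.cpp with mmap disabled; if this also
# fails on the Orin Nano, the non-mmap load-time overhead explanation fits
./llama-cli -m gemma-q8_0.gguf --no-mmap -p "hello" -n 16
```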

@NebulaTurnip27 commented on GitHub (Apr 9, 2026):

Thanks, that makes sense.

I think what I’m still trying to understand is whether this is mostly due to how the model is currently loaded in Ollama, rather than the model being inherently too large for this device. On this same Orin Nano, the equivalent q4_k_m model loads in llama.cpp at around 4.5 GB total, so it seems like it can fit once loaded, but Ollama may be hitting higher memory usage during the loading process itself.

Given that, do you think there is anything that can be done to have this model loaded on memory-constrained devices like the Orin Nano? I’m mostly trying to understand whether this is something that might become possible with future improvements.

I have seen many people try this and fail with Ollama so far.

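One way to put numbers on the load-time peak versus the post-load footprint, at least on the llama.cpp side where the process is launched directly, is GNU time's maximum-RSS report. A sketch, assuming GNU time is installed and using an illustrative model path; note that with mmap enabled, RSS can undercount because mapped pages live in the page cache:

```
# GNU time's -v output includes "Maximum resident set size", the peak RSS,
# which can be well above what `free` shows once loading settles
/usr/bin/time -v ./llama-cli -m gemma-q4_k_m.gguf -p "hi" -n 16 2>&1 | grep -i "maximum resident"
```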