[GH-ISSUE #13989] Apparent incompatibility between Granite models and the new AMD RYZEN AI MAX+ 395 chipset #9147

Open
opened 2026-04-12 21:59:54 -05:00 by GiteaMirror · 8 comments
Owner

Originally created by @bradg-GMA on GitHub (Jan 31, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/13989

What is the issue?

I recently purchased a new workstation with the AMD RYZEN AI MAX+ 395 with AVX512.

  • The llama runner process is crashing with exit status 2 when trying to load Granite models on my system.
  • Both granite4:latest and granite3-dense:2b models crash with "llama runner process has terminated: exit status 2"
  • This is a segmentation fault in the llama.cpp runner that loads the models
  • The issue appears to be specific to Granite models on the AMD RYZEN AI MAX+ 395 system

Relevant log output

just this: Error: 500 Internal Server Error: llama runner process has terminated: exit status 2

OS

No response

GPU

No response

CPU

No response

Ollama version

No response

GiteaMirror added the "needs more info" and "bug" labels 2026-04-12 21:59:54 -05:00
Author
Owner

@rick-github commented on GitHub (Jan 31, 2026):

Server logs (https://github.com/ollama/ollama/blob/main/docs/troubleshooting.mdx) may help in debugging.
Author
Owner

@magnusahlden commented on GitHub (Jan 31, 2026):

check for "out of memory"

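One way to act on the two suggestions above is to save the server log to a file and search it for memory or crash messages. The following is only a sketch: the function name `scan_ollama_log` and the search patterns are assumptions made for illustration, not an official list of Ollama error strings.

```shell
#!/bin/sh
# Hypothetical helper: scan a saved Ollama server log for common crash hints.
# The patterns are illustrative guesses, not taken from Ollama's source.
scan_ollama_log() {
    log_file="$1"
    if [ ! -r "$log_file" ]; then
        echo "cannot read log file: $log_file" >&2
        return 2
    fi
    # -i: ignore case, -n: print line numbers, -E: extended regular expressions
    # grep exits with status 1 when nothing matches.
    grep -inE 'out of memory|sigsegv|segmentation fault|exit status' "$log_file"
}
```

On a systemd-based Linux install, and assuming the service unit is named `ollama`, the log could first be saved with `journalctl -u ollama --no-pager > ollama-server.log` and then checked with `scan_ollama_log ollama-server.log`.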
Author
Owner

@bradg-GMA commented on GitHub (Jan 31, 2026):

ollama_api_test.log (62 bytes): contains the API error response, "llama runner process has terminated: exit status 2".

ollama_journal.log (4.3 KB): contains the full crash stack trace from the systemd journal, showing the model loading failure.

Author
Owner

@bradg-GMA commented on GitHub (Feb 2, 2026):

ollama_api_test.log: https://github.com/user-attachments/files/25022441/ollama_api_test.log
ollama_journal.log: https://github.com/user-attachments/files/25022440/ollama_journal.log

I sent these by e-mail. Here is a copy of the logs.
Author
Owner

@rick-github commented on GitHub (Feb 2, 2026):

No useful information here, need the full server log.

Author
Owner

@magnusahlden commented on GitHub (Feb 10, 2026):

On my AMD Ryzen AI MAX+ 395, Granite is really fast and works really well. My guess is that most users don't realize they have to set the video memory allocation in the BIOS to run moderate-to-large models. I suggest this issue be closed.

~ ❯ fastfetch

                                                              ┌──────────────────────Hardware──────────────────────┐
  ██████████████████████████████████████████████████████       PC: MS-S1 MAX (1.0)
  ██████████████████████████████████████████████████████      │ ├: AMD RYZEN AI MAX+ 395 (32) @ 5.19 GHz
  ████                     ████                     ████      │ ├: AMD Radeon 8060S Graphics [Integrated]
  ████                     ████                     ████      │ ├󱄄: 3440x1440 in 34", 60 Hz [External]
  ████    █████████████████████         ████████    ████      │ ├󰋊: 361.88 GiB / 1.86 TiB (19%) - btrfs
  ████    █████████████████████         ████████    ████      │ ├: 9.58 GiB / 30.98 GiB (31%)
  ████    ████                              ████    ████      └ └󰓡 : 2.67 MiB / 4.00 GiB (0%)
  ████    ████                              ████    ████      └────────────────────────────────────────────────────┘
  ████    ████                              ████    ████
  ████    ████                              ████    ████      ┌──────────────────────Software──────────────────────┐
  ████    ████                              ████    ████       OS: Omarchy 3.3.3
  ████    ████                              ████    ████      │ ├󰘬: master
  ████████████                              ████    ████      │ ├󰔫: stable
  ████████████                              ████    ████      │ ├: Linux 6.18.3-arch1-1
  ████    ████                              ████    ████      │ ├: Hyprland 0.53.1 (Wayland)
  ████    ████                              ████    ████      │ ├: ghostty 1.2.3-arch2
  ████    ████                              ████    ████      │ ├󰏖: 1029 (pacman)
  ████    ████                              ████    ████      │ ├󰸌: Hackerman ●●●●●●●●
  ████    ████                              ████    ████      └ └: iA Writer Mono S (12pt)
  ████    ████                              ████    ████      └────────────────────────────────────────────────────┘
  ████    ██████████████████████████████████████    ████
  ████    ██████████████████████████████████████    ████      ┌────────────────Age / Uptime / Update───────────────┐
  ████                     ████                     ████      󱦟 OS Age: 38 days
  ████                     ████                     ████      󱫐 Uptime: 22 hours, 22 mins
  █████████████████████████████     ████████████████████       Update: Saturday, January 31 2026 at 18:07
  █████████████████████████████     ████████████████████      └────────────────────────────────────────────────────┘


~ ❯ ollama --version
ollama version is 0.13.5

~ ❯ time ollama run granite4:latest "Why is the number 42 funny for software developers?"
The number 42 has become a popular joke among software developers and tech enthusiasts, often referred to as "the answer to everything" from Douglas Adams' science fiction series, The Hitchhiker's Guide to the Galaxy. In the book, a supercomputer named Deep Thought calculates the optimal solution for
the Ultimate Question of Life, Universe, and Everything, which is humorously revealed to be 42.

The humor behind this number in software development stems from its randomness and lack of context. Developers often use it as an inside joke when faced with complex problems or when they can't find a straightforward solution to their code or bugs. By referencing the number 42, developers create a
humorous way to acknowledge the frustration that comes with debugging and problem-solving.

Additionally, the number has been adopted by several programming memes and jokes over time. For example:

1. "You have two cows... (and now you have the Internet of Things)" - This is a reference to the Unix philosophy of breaking down problems into smaller parts, which can lead to even more complex issues.

2. The Hitchhiker's Guide to the Galaxy meme: Developers might use memes or images from the book to express their frustration with coding challenges or when they feel overwhelmed by tasks at hand.

3. "The 42 Project" - A tongue-in-cheek reference to a hypothetical project that has no real purpose, often used to illustrate how some software projects can spiral out of control without clear direction or goals.

In summary, the number 42 is funny for software developers because it represents an abstract concept (the Ultimate Question) applied humorously in everyday programming scenarios. It also serves as a reminder that sometimes there are no straightforward answers and that we must embrace the complexity and
unpredictability of coding challenges.


real    0m5.218s
user    0m0.019s
sys     0m0.008s

@rick-github I suggest this issue should be closed.

Author
Owner

@magnusahlden commented on GitHub (Feb 10, 2026):

@bradg-GMA make sure you configure the video memory allocation in the BIOS to be fixed (dynamic is too slow when running large models such as llama4). I use a Minisforum version of the AI MAX+ 395 mini computer, and it's super fast even for larger models.
Author
Owner

@pkoutsogilas commented on GitHub (Feb 12, 2026):

I get the same issue even when running a very small model like llama3.2:latest. I'm running Fedora 43 (6.18.9-200.fc43.x86_64) with ollama 0.15.6 on my Framework Desktop. I've allocated 96GB of memory to the iGPU in the BIOS so it's not dynamic.

I've included the complete service log: ollama-service.log (https://github.com/user-attachments/files/25256731/ollama-service.log)

Reference: github-starred/ollama#9147