[GH-ISSUE #10705] Error: llama runner process has terminated: GGML_ASSERT(n_backends <= GGML_SCHED_MAX_BACKENDS) failed #53547

Closed
opened 2026-04-29 03:45:43 -05:00 by GiteaMirror · 3 comments

Originally created by @fahadshery on GitHub (May 14, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10705

What is the issue?

Hi,

I just found a similar issue here: https://github.com/ollama/ollama/issues/8920

I am using 16 Nvidia A16 GPUs.

I am running Ollama in Docker.

Where do I set GGML_SCHED_MAX_BACKENDS?

Thanks

Relevant log output

# ollama run llama4
Error: llama runner process has terminated: GGML_ASSERT(n_backends <= GGML_SCHED_MAX_BACKENDS) failed
# nvidia-smi
Wed May 14 07:21:23 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.133.20             Driver Version: 570.133.20     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A16                     Off |   00000000:1B:00.0 Off |                    0 |
|  0%   38C    P0             25W /   62W |      62MiB /  15356MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A16                     Off |   00000000:1C:00.0 Off |                    0 |
|  0%   31C    P8             14W /   62W |       3MiB /  15356MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

snipped....

OS

Docker

GPU

16 x Nvidia A16

CPU

No response

Ollama version

0.6.8

GiteaMirror added the bug label 2026-04-29 03:45:43 -05:00

@rick-github commented on GitHub (May 14, 2025):

https://github.com/ollama/ollama/blob/0aa8b371ddd24a2d0ce859903a9284e9544f5c78/ml/backend/ggml/ggml/src/ggml-backend.cpp#L611-L613
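
For context, the linked lines appear to be the compile-time cap on scheduler backends. Here is one way to inspect them in a local clone; the exact line numbers and the default value of 16 are assumptions based on upstream ggml at that commit, so verify against your own checkout:

```shell
# Show the GGML_SCHED_MAX_BACKENDS definition and its guard (line numbers
# and the default of 16 are assumed, not confirmed in this thread):
grep -n -A 2 'ifndef GGML_SCHED_MAX_BACKENDS' \
    ml/backend/ggml/ggml/src/ggml-backend.cpp
# Expected output, roughly:
# 611:#ifndef GGML_SCHED_MAX_BACKENDS
# 612:#define GGML_SCHED_MAX_BACKENDS 16
# 613:#endif
```

If that default of 16 is accurate, 16 GPU backends plus the CPU backend would give n_backends = 17 and trip the assertion in the error above.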


@fahadshery commented on GitHub (May 14, 2025):

> ggml-backend.cpp

Hi Rick,

Where is this file located? I am using Docker. Is this something we could pass as an env var?


@rick-github commented on GitHub (May 14, 2025):

The location of the file is the link in the code snippet. This is a compile-time option; it can't be influenced by environment variables. If you want to change it, clone the repo, edit the file, and [build](https://github.com/ollama/ollama/blob/main/docs/development.md#docker) the container image.
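
A minimal sketch of that workflow, assuming the define is the literal `#define GGML_SCHED_MAX_BACKENDS 16` and raising it to 32; the sed pattern, the new value, and the image tag are illustrative, not taken from the thread:

```shell
git clone https://github.com/ollama/ollama.git
cd ollama
# Raise the compile-time backend cap (adjust the pattern if the define
# in your checkout differs from the assumed default of 16):
sed -i 's/#define GGML_SCHED_MAX_BACKENDS 16/#define GGML_SCHED_MAX_BACKENDS 32/' \
    ml/backend/ggml/ggml/src/ggml-backend.cpp
# Build a local container image per docs/development.md:
docker build -t ollama-custom .
```

Then point your Docker setup at `ollama-custom` instead of the stock `ollama/ollama` image.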

