[GH-ISSUE #13033] Issue: Ollama 0.12.10 fails on NVIDIA Jetson Thor (Regression from 0.12.9) #34395

Closed
opened 2026-04-22 17:54:56 -05:00 by GiteaMirror · 15 comments

Originally created by @tokk-nv on GitHub (Nov 10, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/13033

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

Bug: Ollama 0.12.10 fails to run models on NVIDIA Jetson Thor with 500 Internal Server Error

When attempting to run any model with Ollama 0.12.10 on Jetson Thor, the command fails with:

Error: 500 Internal Server Error: do load request: Post "http://127.0.0.1:46759/load": EOF

Expected behavior: The model should load and run successfully, as it does in version 0.12.9.

Steps to reproduce:

  1. Install Ollama 0.12.10 on NVIDIA Jetson Thor using: curl -fsSL https://ollama.com/install.sh | sh
  2. Pull any model: ollama pull gemma3:4b
  3. Try to run the model: ollama run gemma3:4b "test"
  4. Error occurs: 500 Internal Server Error: do load request: Post "http://127.0.0.1:46759/load": EOF

Workaround: Downgrading to version 0.12.9 resolves the issue completely.
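For reference, the install script supports pinning a specific version through the OLLAMA_VERSION environment variable; this is the downgrade command used in the log below:

curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.12.9 sh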

Relevant log output

jetson@jat04-iso0818:~$ curl -fsSL https://ollama.com/install.sh | sh
>>> Installing ollama to /usr/local
[sudo] password for jetson:
>>> Downloading Linux arm64 bundle
######################################################################## 100.0%
WARNING: Unsupported JetPack version detected.  GPU may not be supported
>>> Creating ollama user...
>>> Adding ollama user to render group...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> Enabling and starting ollama service...
Created symlink /etc/systemd/system/default.target.wants/ollama.service → /etc/systemd/system/ollama.service.
>>> NVIDIA JetPack ready.
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.

jetson@jat04-iso0818:~$ ollama pull gemma3:4b
pulling manifest
pulling aeda25e63ebd: 100% ▕███████████████████████████▏ 3.3 GB
pulling e0a42594d802: 100% ▕███████████████████████████▏  358 B
pulling dd084c7d92a3: 100% ▕███████████████████████████▏ 8.4 KB
pulling 3116c5225075: 100% ▕███████████████████████████▏   77 B
pulling b6ae5839783f: 100% ▕███████████████████████████▏  489 B
verifying sha256 digest
writing manifest
success

jetson@jat04-iso0818:~$ ollama run gemma3:4b "test"
Error: 500 Internal Server Error: do load request: Post "http://127.0.0.1:46759/load": EOF

jetson@jat04-iso0818:~$ # Uninstall current version
sudo systemctl stop ollama
sudo systemctl disable ollama
sudo rm -rf /usr/local/bin/ollama /etc/systemd/system/ollama.service

# Install working version 0.12.9
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.12.9 sh
Removed "/etc/systemd/system/default.target.wants/ollama.service".
>>> Cleaning up old version at /usr/local/lib/ollama
>>> Installing ollama to /usr/local
>>> Downloading Linux arm64 bundle
######################################################################## 100.0%
WARNING: Unsupported JetPack version detected.  GPU may not be supported
>>> Adding ollama user to render group...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> Enabling and starting ollama service...
Created symlink /etc/systemd/system/default.target.wants/ollama.service → /etc/systemd/system/ollama.service.
>>> NVIDIA JetPack ready.
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.

jetson@jat04-iso0818:~$ ollama run gemma3:4b "test"
Okay! I see your test.

Is there anything specific you wanted me to do with this "test"? Do you want me to:

*   Respond with a simple acknowledgment? (Like I've done here)
*   Answer a question?
*   Try a different task?

Let me know!

OS

Linux

GPU

Nvidia

CPU

Other

Ollama version

0.12.10

GiteaMirror added the nvidia and bug labels 2026-04-22 17:54:56 -05:00

@rick-github commented on GitHub (Nov 10, 2025):

Server log (https://docs.ollama.com/troubleshooting) may help in debugging.
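On a systemd-based install like the one in this report, a sketch of how to capture it (adjust the unit name if your setup differs):

# follow the service log while reproducing the failure
journalctl -u ollama -f

# or dump recent entries to a file to attach here
journalctl -u ollama --no-pager > ollama-server.log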

@Yamakuzure commented on GitHub (Nov 10, 2025):

I get:

 $ ollama run devstral:24b
Error: 500 Internal Server Error: llama runner process has terminated: exit status 2

The log shows the Go runtime crashing.
Full Log: https://pastebin.com/bs0A70US

Crash snippet:

graph_reserve: failed to allocate compute buffers
SIGSEGV: segmentation violation
PC=0x7f13fecf1ad6 m=0 sigcode=1 addr=0x55a97ab19578
signal arrived during cgo execution

goroutine 36 gp=0xc0005056c0 m=0 mp=0x55ac6e68fc80 [syscall]:
runtime.cgocall(0x55ac6d5faee0, 0xc0004eebf8)
	/usr/lib/go/src/runtime/cgocall.go:167 +0x4b fp=0xc0004eebd0 sp=0xc0004eeb98 pc=0x55ac6c8edaab
github.com/ollama/ollama/llama._Cfunc_llama_init_from_model(0x55ac96c9e010, {0x1000, 0x200, 0x200, 0x1, 0x8, 0x8, 0xffffffff, 0xffffffff, 0xffffffff, ...})
	_cgo_gotypes.go:749 +0x4e fp=0xc0004eebf8 sp=0xc0004eebd0 pc=0x55ac6cca9d4e
github.com/ollama/ollama/llama.NewContextWithModel.func1(...)
	/data/portage/portage/sci-ml/ollama-0.12.10/work/ollama-0.12.10/llama/llama.go:280
github.com/ollama/ollama/llama.NewContextWithModel(0xc000366510, {{0x1000, 0x200, 0x200, 0x1, 0x8, 0x8, 0xffffffff, 0xffffffff, 0xffffffff, ...}})
	/data/portage/portage/sci-ml/ollama-0.12.10/work/ollama-0.12.10/llama/llama.go:280 +0x158 fp=0xc0004eed98 sp=0xc0004eebf8 pc=0x55ac6ccadb18
github.com/ollama/ollama/runner/llamarunner.(*Server).loadModel(0xc000714320, {0x27, 0x0, 0x1, {0xc0003ce4a0, 0x2, 0x2}, 0xc000614160, 0x0}, {0x7fffbdc5b307, ...}, ...)
	/data/portage/portage/sci-ml/ollama-0.12.10/work/ollama-0.12.10/runner/llamarunner/runner.go:797 +0x198 fp=0xc0004eeee0 sp=0xc0004eed98 pc=0x55ac6cd6be78
github.com/ollama/ollama/runner/llamarunner.(*Server).load.gowrap2()
	/data/portage/portage/sci-ml/ollama-0.12.10/work/ollama-0.12.10/runner/llamarunner/runner.go:879 +0x175 fp=0xc0004eefe0 sp=0xc0004eeee0 pc=0x55ac6cd6cf15
runtime.goexit({})
	/usr/lib/go/src/runtime/asm_amd64.s:1693 +0x1 fp=0xc0004eefe8 sp=0xc0004eefe0 pc=0x55ac6c8f9061
created by github.com/ollama/ollama/runner/llamarunner.(*Server).load in goroutine 51
	/data/portage/portage/sci-ml/ollama-0.12.10/work/ollama-0.12.10/runner/llamarunner/runner.go:879 +0x7ce

@rick-github commented on GitHub (Nov 10, 2025):

Attach the full log to this issue.

@Yamakuzure commented on GitHub (Nov 10, 2025):

Okay... Just a side note, downgrading to ollama-0.12.9 fixed the issue.

 ~ $ ollama ps
NAME            ID              SIZE     PROCESSOR         CONTEXT    UNTIL              
devstral:24b    9bd74193e939    16 GB    6%/94% CPU/GPU    4096       4 minutes from now    

server.log.zip (https://github.com/user-attachments/files/23457177/server.log.zip)

@rick-github commented on GitHub (Nov 10, 2025):

time=2025-11-10T15:36:09.448+01:00 level=INFO source=server.go:522 msg=offload library=CUDA layers.requested=-1
 layers.model=41 layers.offload=39 layers.split=[39] memory.available="[14.4 GiB]" memory.gpu_overhead="0 B"
 memory.required.full="15.2 GiB" memory.required.partial="14.3 GiB" memory.required.kv="640.0 MiB"
 memory.required.allocations="[14.3 GiB]" memory.weights.total="13.0 GiB" memory.weights.repeating="12.5 GiB"
 memory.weights.nonrepeating="525.0 MiB" memory.graph.full="304.0 MiB" memory.graph.partial="801.0 MiB"

graph_reserve: failed to allocate compute buffers

Ollama loaded 39 of 41 layers into GPU, using 14.3G of the 14.4G available and leaving 0.1G of wiggle room. So it looks like ollama underestimated the amount of RAM required, since it ultimately hit out-of-memory. The runner is using the old engine, which has been known to be a little inaccurate with memory estimation. The model architecture is llama, so you could try setting OLLAMA_NEW_ENGINE=1 in the server environment to use the more accurate new engine. Failing that, some mitigations for OOM can be found at https://github.com/ollama/ollama/issues/8597#issuecomment-2614533288.
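On a systemd install, a minimal sketch of enabling that, assuming the stock ollama.service created by the installer:

sudo systemctl edit ollama
# in the editor, add:
#   [Service]
#   Environment="OLLAMA_NEW_ENGINE=1"
sudo systemctl restart ollama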

@Yamakuzure commented on GitHub (Nov 10, 2025):

It works with ollama 0.12.9

 $ nvidia-smi | grep ollama
|    0   N/A  N/A           22239    C+G   /usr/bin/ollama                       12166MiB |
 $ ollama ps
NAME            ID              SIZE     PROCESSOR          CONTEXT    UNTIL               
devstral:24b    9bd74193e939    22 GB    34%/66% CPU/GPU    32768      23 minutes from now    
 $ env | grep OLLAMA
OLLAMA_NEW_ESTIMATES=1
OLLAMA_FLASH_ATTENTION=1

So it looks like a regression to me...

Anyway, I will try adding OLLAMA_NEW_ENGINE=1 to env tomorrow.
Thank you very much for the information!

@rick-github commented on GitHub (Nov 10, 2025):

FYI OLLAMA_NEW_ESTIMATES is no longer an ollama configuration variable; the new estimates are enabled by setting OLLAMA_NEW_ENGINE=1.
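In practice that's a one-for-one swap wherever the server's environment is set (a sketch):

# no longer read by ollama:
#   OLLAMA_NEW_ESTIMATES=1
# use instead:
export OLLAMA_NEW_ENGINE=1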

@Yamakuzure commented on GitHub (Nov 10, 2025):

Setting OLLAMA_NEW_ENGINE=1 did the trick!

 $ ollama --version
ollama version is 0.12.10
 $ ollama ps
NAME          ID              SIZE     PROCESSOR          CONTEXT    UNTIL              
cogito:32b    0b4aab772f57    21 GB    31%/69% CPU/GPU    4096       4 minutes from now    

Thank you very much!

@acochrane commented on GitHub (Nov 11, 2025):

I have this issue as well, even with OLLAMA_NEW_ENGINE=1.

Maybe it has something to do with the compiled CUDA arch list: 1100 is not included, but that's the Thor arch. See below.

2025-11-11T16:17:38.935821-07:00 granite ollama[559458]: Device 0: NVIDIA Thor, compute capability 11.0, VMM: yes, ID: GPU-a7c66ad2-6dbb-0ab8-c1a2-37ba6dba3600
2025-11-11T16:17:38.947558-07:00 granite ollama[559458]: load_backend: loaded CUDA backend from /usr/local/lib/ollama/cuda_v12/libggml-cuda.so
2025-11-11T16:17:38.947730-07:00 granite ollama[559458]: time=2025-11-11T16:17:38.947-07:00 level=INFO source=ggml.go:104 msg=system CPU.0.NEON=1 CPU.0.ARM_FMA=1 CPU.0.LLAMAFILE=1 CPU.1.NEON=1 CPU.1.ARM_FMA=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,520,600,610,700,750,800,860,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang)
2025-11-11T16:17:39.117307-07:00 granite ollama[559458]: CUDA error: an internal operation failed

Full syslog snippet at https://pastebin.com/6SUx0XNj

@dhiltgen commented on GitHub (Nov 14, 2025):

@acochrane your logs aren't complete - they're missing the initial startup where we do GPU discovery. My hunch is you're running an older driver (before 580) which is causing us to fall back to CUDA v12 instead of using v13. We only support CC 11 on v13 currently. If that is the case, and you upgrade to driver 580 or newer, it should start working.
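A quick way to check that (a standard nvidia-smi query, nothing ollama-specific):

nvidia-smi --query-gpu=driver_version --format=csv,noheader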

@dhiltgen commented on GitHub (Nov 14, 2025):

@tokk-nv can you provide server logs so we can see what might be going wrong?

@acochrane commented on GitHub (Nov 19, 2025):

Sorry about the incomplete logs @dhiltgen. I was messing around with the new version 0.12.11 and noticed something really weird.
It looks like setting the OLLAMA_HOST=0.0.0.0 environment variable (through the systemd unit file) somehow pushes ollama to load the cuda_v12 backend, while leaving that variable unspecified lets it load the cuda_v13 ggml.

See the pastebin (https://pastebin.com/WsBEWNvk) for a log showing a start with OLLAMA_HOST unset, listening at the default 127.0.0.1:11434, followed by a service restart with the variable set to 0.0.0.0, listening at 0.0.0.0:11434. The first loads cuda_v13 and runs fine; the second loads cuda_v12 and crashes.
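For reference, the variable was set in the service unit along these lines (a sketch; the installer puts the unit at /etc/systemd/system/ollama.service per the log above):

[Service]
Environment="OLLAMA_HOST=0.0.0.0"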

Not really sure if there's anything I can try on my side to force the external listener to load the cuda_v13. Any ideas?

@acochrane commented on GitHub (Nov 20, 2025):

If I move the cuda_v12 directory out of its install location, the runner loads cuda_v13 in both the 'internal listener' and 'external listener' cases and so far works fine.

mv /usr/local/lib/ollama/cuda_v12 $HOME/Downloads/

Again, this is on the Jetson Thor; I'm not recommending it, but it gets me past Error: 500.
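To confirm which backend gets picked up after a change like this, the load_backend lines in the server log are the ones to watch (a sketch, systemd install assumed):

sudo systemctl restart ollama
journalctl -u ollama | grep load_backend
# hoping for something like:
#   load_backend: loaded CUDA backend from /usr/local/lib/ollama/cuda_v13/libggml-cuda.so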

@dhiltgen commented on GitHub (Nov 20, 2025):

@acochrane can you set OLLAMA_DEBUG=2 and share the server startup logs for your failure case?
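On a systemd install, a sketch of how to do that and collect the startup log:

sudo systemctl edit ollama
# in the editor, add:
#   [Service]
#   Environment="OLLAMA_DEBUG=2"
sudo systemctl restart ollama
journalctl -u ollama --no-pager > ollama-debug.log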

@acochrane commented on GitHub (Nov 20, 2025):

Here it is: https://pastebin.com/vsXfzmi5
