[GH-ISSUE #13033] Issue: Ollama 0.12.10 fails on NVIDIA Jetson Thor (Regression from 0.12.9) #34395

Closed
opened 2026-04-22 17:54:56 -05:00 by GiteaMirror · 15 comments

Originally created by @tokk-nv on GitHub (Nov 10, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/13033

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

Bug: Ollama 0.12.10 fails to run models on NVIDIA Jetson Thor with 500 Internal Server Error

When attempting to run any model with Ollama 0.12.10 on Jetson Thor, the command fails with:

Error: 500 Internal Server Error: do load request: Post "http://127.0.0.1:46759/load": EOF

Expected behavior: The model should load and run successfully, as it does in version 0.12.9.

Steps to reproduce:

  1. Install Ollama 0.12.10 on NVIDIA Jetson Thor using: curl -fsSL https://ollama.com/install.sh | sh
  2. Pull any model: ollama pull gemma3:4b
  3. Try to run the model: ollama run gemma3:4b "test"
  4. Error occurs: 500 Internal Server Error: do load request: Post "http://127.0.0.1:46759/load": EOF

Workaround: Downgrading to version 0.12.9 resolves the issue completely.
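For reference, the install script supports pinning a specific version through the OLLAMA_VERSION environment variable; this is the downgrade command used in the log below:

curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.12.9 sh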

Relevant log output

jetson@jat04-iso0818:~$ curl -fsSL https://ollama.com/install.sh | sh
>>> Installing ollama to /usr/local
[sudo] password for jetson:
>>> Downloading Linux arm64 bundle
######################################################################## 100.0%
WARNING: Unsupported JetPack version detected.  GPU may not be supported
>>> Creating ollama user...
>>> Adding ollama user to render group...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> Enabling and starting ollama service...
Created symlink /etc/systemd/system/default.target.wants/ollama.service → /etc/systemd/system/ollama.service.
>>> NVIDIA JetPack ready.
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.

jetson@jat04-iso0818:~$ ollama pull gemma3:4b
pulling manifest
pulling aeda25e63ebd: 100% ▕███████████████████████████▏ 3.3 GB
pulling e0a42594d802: 100% ▕███████████████████████████▏  358 B
pulling dd084c7d92a3: 100% ▕███████████████████████████▏ 8.4 KB
pulling 3116c5225075: 100% ▕███████████████████████████▏   77 B
pulling b6ae5839783f: 100% ▕███████████████████████████▏  489 B
verifying sha256 digest
writing manifest
success

jetson@jat04-iso0818:~$ ollama run gemma3:4b "test"
Error: 500 Internal Server Error: do load request: Post "http://127.0.0.1:46759/load": EOF

jetson@jat04-iso0818:~$ # Uninstall current version
sudo systemctl stop ollama
sudo systemctl disable ollama
sudo rm -rf /usr/local/bin/ollama /etc/systemd/system/ollama.service

# Install working version 0.12.9
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.12.9 sh
Removed "/etc/systemd/system/default.target.wants/ollama.service".
>>> Cleaning up old version at /usr/local/lib/ollama
>>> Installing ollama to /usr/local
>>> Downloading Linux arm64 bundle
######################################################################## 100.0%
WARNING: Unsupported JetPack version detected.  GPU may not be supported
>>> Adding ollama user to render group...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> Enabling and starting ollama service...
Created symlink /etc/systemd/system/default.target.wants/ollama.service → /etc/systemd/system/ollama.service.
>>> NVIDIA JetPack ready.
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.

jetson@jat04-iso0818:~$ ollama run gemma3:4b "test"
Okay! I see your test.

Is there anything specific you wanted me to do with this "test"? Do you want me to:

*   Respond with a simple acknowledgment? (Like I've done here)
*   Answer a question?
*   Try a different task?

Let me know!

OS

Linux

GPU

Nvidia

CPU

Other

Ollama version

0.12.10

GiteaMirror added the nvidia and bug labels 2026-04-22 17:54:56 -05:00

@rick-github commented on GitHub (Nov 10, 2025):

Server log (https://docs.ollama.com/troubleshooting) may help in debugging.
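On a systemd-based install like the one in this report, a sketch of how to capture it (adjust the unit name if your setup differs):

# follow the service log while reproducing the failure
journalctl -u ollama -f

# or dump recent entries to a file to attach here
journalctl -u ollama --no-pager > ollama-server.log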

@Yamakuzure commented on GitHub (Nov 10, 2025):

I get:

 $ ollama run devstral:24b
Error: 500 Internal Server Error: llama runner process has terminated: exit status 2

The log shows the Go runtime crashing.
Full Log: https://pastebin.com/bs0A70US

Crash snippet:

graph_reserve: failed to allocate compute buffers
SIGSEGV: segmentation violation
PC=0x7f13fecf1ad6 m=0 sigcode=1 addr=0x55a97ab19578
signal arrived during cgo execution

goroutine 36 gp=0xc0005056c0 m=0 mp=0x55ac6e68fc80 [syscall]:
runtime.cgocall(0x55ac6d5faee0, 0xc0004eebf8)
	/usr/lib/go/src/runtime/cgocall.go:167 +0x4b fp=0xc0004eebd0 sp=0xc0004eeb98 pc=0x55ac6c8edaab
github.com/ollama/ollama/llama._Cfunc_llama_init_from_model(0x55ac96c9e010, {0x1000, 0x200, 0x200, 0x1, 0x8, 0x8, 0xffffffff, 0xffffffff, 0xffffffff, ...})
	_cgo_gotypes.go:749 +0x4e fp=0xc0004eebf8 sp=0xc0004eebd0 pc=0x55ac6cca9d4e
github.com/ollama/ollama/llama.NewContextWithModel.func1(...)
	/data/portage/portage/sci-ml/ollama-0.12.10/work/ollama-0.12.10/llama/llama.go:280
github.com/ollama/ollama/llama.NewContextWithModel(0xc000366510, {{0x1000, 0x200, 0x200, 0x1, 0x8, 0x8, 0xffffffff, 0xffffffff, 0xffffffff, ...}})
	/data/portage/portage/sci-ml/ollama-0.12.10/work/ollama-0.12.10/llama/llama.go:280 +0x158 fp=0xc0004eed98 sp=0xc0004eebf8 pc=0x55ac6ccadb18
github.com/ollama/ollama/runner/llamarunner.(*Server).loadModel(0xc000714320, {0x27, 0x0, 0x1, {0xc0003ce4a0, 0x2, 0x2}, 0xc000614160, 0x0}, {0x7fffbdc5b307, ...}, ...)
	/data/portage/portage/sci-ml/ollama-0.12.10/work/ollama-0.12.10/runner/llamarunner/runner.go:797 +0x198 fp=0xc0004eeee0 sp=0xc0004eed98 pc=0x55ac6cd6be78
github.com/ollama/ollama/runner/llamarunner.(*Server).load.gowrap2()
	/data/portage/portage/sci-ml/ollama-0.12.10/work/ollama-0.12.10/runner/llamarunner/runner.go:879 +0x175 fp=0xc0004eefe0 sp=0xc0004eeee0 pc=0x55ac6cd6cf15
runtime.goexit({})
	/usr/lib/go/src/runtime/asm_amd64.s:1693 +0x1 fp=0xc0004eefe8 sp=0xc0004eefe0 pc=0x55ac6c8f9061
created by github.com/ollama/ollama/runner/llamarunner.(*Server).load in goroutine 51
	/data/portage/portage/sci-ml/ollama-0.12.10/work/ollama-0.12.10/runner/llamarunner/runner.go:879 +0x7ce

@rick-github commented on GitHub (Nov 10, 2025):

Attach the full log to this issue.

@Yamakuzure commented on GitHub (Nov 10, 2025):

Okay... Just a side note, downgrading to ollama-0.12.9 fixed the issue.

 ~ $ ollama ps
NAME            ID              SIZE     PROCESSOR         CONTEXT    UNTIL              
devstral:24b    9bd74193e939    16 GB    6%/94% CPU/GPU    4096       4 minutes from now    

server.log.zip (https://github.com/user-attachments/files/23457177/server.log.zip)

@rick-github commented on GitHub (Nov 10, 2025):

time=2025-11-10T15:36:09.448+01:00 level=INFO source=server.go:522 msg=offload library=CUDA layers.requested=-1
 layers.model=41 layers.offload=39 layers.split=[39] memory.available="[14.4 GiB]" memory.gpu_overhead="0 B"
 memory.required.full="15.2 GiB" memory.required.partial="14.3 GiB" memory.required.kv="640.0 MiB"
 memory.required.allocations="[14.3 GiB]" memory.weights.total="13.0 GiB" memory.weights.repeating="12.5 GiB"
 memory.weights.nonrepeating="525.0 MiB" memory.graph.full="304.0 MiB" memory.graph.partial="801.0 MiB"

graph_reserve: failed to allocate compute buffers

Ollama loaded 39 of 41 layers into GPU, using 14.3G of the 14.4G available and leaving 0.1G of wiggle room. So it looks like ollama underestimated the amount of RAM required, since it ultimately hit out-of-memory. The runner is using the old engine, which has been known to be a little inaccurate with memory estimation. The model architecture is llama, so you could try setting OLLAMA_NEW_ENGINE=1 in the server environment to use the more accurate new engine. Failing that, some mitigations for OOM can be found at https://github.com/ollama/ollama/issues/8597#issuecomment-2614533288.
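On a systemd install, a minimal sketch of enabling that, assuming the stock ollama.service created by the installer:

sudo systemctl edit ollama
# in the editor, add:
#   [Service]
#   Environment="OLLAMA_NEW_ENGINE=1"
sudo systemctl restart ollama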

@Yamakuzure commented on GitHub (Nov 10, 2025):

It works with ollama 0.12.9

 $ nvidia-smi | grep ollama
|    0   N/A  N/A           22239    C+G   /usr/bin/ollama                       12166MiB |
 $ ollama ps
NAME            ID              SIZE     PROCESSOR          CONTEXT    UNTIL               
devstral:24b    9bd74193e939    22 GB    34%/66% CPU/GPU    32768      23 minutes from now    
 $ env | grep OLLAMA
OLLAMA_NEW_ESTIMATES=1
OLLAMA_FLASH_ATTENTION=1

So it looks like a regression to me...

Anyway, I will try adding OLLAMA_NEW_ENGINE=1 to env tomorrow.
Thank you very much for the information!

@rick-github commented on GitHub (Nov 10, 2025):

FYI OLLAMA_NEW_ESTIMATES is no longer an ollama configuration variable; the new estimates are enabled by setting OLLAMA_NEW_ENGINE=1.
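In practice that's a one-for-one swap wherever the server's environment is set (a sketch):

# no longer read by ollama:
#   OLLAMA_NEW_ESTIMATES=1
# use instead:
export OLLAMA_NEW_ENGINE=1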

@Yamakuzure commented on GitHub (Nov 10, 2025):

Setting OLLAMA_NEW_ENGINE=1 did the trick!

 $ ollama --version
ollama version is 0.12.10
 $ ollama ps
NAME          ID              SIZE     PROCESSOR          CONTEXT    UNTIL              
cogito:32b    0b4aab772f57    21 GB    31%/69% CPU/GPU    4096       4 minutes from now    

Thank you very much!

@acochrane commented on GitHub (Nov 11, 2025):

I have this issue as well, even with OLLAMA_NEW_ENGINE=1.

Maybe it has something to do with the compiled CUDA arch list: 1100 is not included, but that's the Thor arch. See below.

2025-11-11T16:17:38.935821-07:00 granite ollama[559458]: Device 0: NVIDIA Thor, compute capability 11.0, VMM: yes, ID: GPU-a7c66ad2-6dbb-0ab8-c1a2-37ba6dba3600
2025-11-11T16:17:38.947558-07:00 granite ollama[559458]: load_backend: loaded CUDA backend from /usr/local/lib/ollama/cuda_v12/libggml-cuda.so
2025-11-11T16:17:38.947730-07:00 granite ollama[559458]: time=2025-11-11T16:17:38.947-07:00 level=INFO source=ggml.go:104 msg=system CPU.0.NEON=1 CPU.0.ARM_FMA=1 CPU.0.LLAMAFILE=1 CPU.1.NEON=1 CPU.1.ARM_FMA=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,520,600,610,700,750,800,860,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang)
2025-11-11T16:17:39.117307-07:00 granite ollama[559458]: CUDA error: an internal operation failed

Full syslog snippet at https://pastebin.com/6SUx0XNj

@dhiltgen commented on GitHub (Nov 14, 2025):

@acochrane your logs aren't complete - they're missing the initial startup where we do GPU discovery. My hunch is you're running an older driver (before 580) which is causing us to fall back to CUDA v12 instead of using v13. We only support CC 11 on v13 currently. If that is the case, and you upgrade to driver 580 or newer, it should start working.
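A quick way to check that (a standard nvidia-smi query, nothing ollama-specific):

nvidia-smi --query-gpu=driver_version --format=csv,noheader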

@dhiltgen commented on GitHub (Nov 14, 2025):

@tokk-nv can you provide server logs so we can see what might be going wrong?

@acochrane commented on GitHub (Nov 19, 2025):

Sorry about the incomplete logs @dhiltgen. I was messing around with the new version 0.12.11 and noticed something really weird.
It looks like setting the OLLAMA_HOST=0.0.0.0 environment variable (through the systemd unit file) somehow pushes ollama to load the cuda_v12 backend, while leaving that variable unspecified lets it load the cuda_v13 ggml.

See the pastebin (https://pastebin.com/WsBEWNvk) for a log showing a start with OLLAMA_HOST unset, listening at the default 127.0.0.1:11434, followed by a service restart with the variable set to 0.0.0.0, listening at 0.0.0.0:11434. The first loads cuda_v13 and runs fine; the second loads cuda_v12 and crashes.
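For reference, the variable was set in the service unit along these lines (a sketch; the installer puts the unit at /etc/systemd/system/ollama.service per the log above):

[Service]
Environment="OLLAMA_HOST=0.0.0.0"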

Not really sure if there's anything I can try on my side to force the external listener to load the cuda_v13. Any ideas?

@acochrane commented on GitHub (Nov 20, 2025):

If I move the cuda_v12 directory out of its install location, the runner loads cuda_v13 in both the 'internal listener' and 'external listener' cases and so far works fine.

mv /usr/local/lib/ollama/cuda_v12 $HOME/Downloads/

Again, this is on the Jetson Thor; I'm not recommending it, but it gets me past Error: 500.
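To confirm which backend gets picked up after a change like this, the load_backend lines in the server log are the ones to watch (a sketch, systemd install assumed):

sudo systemctl restart ollama
journalctl -u ollama | grep load_backend
# hoping for something like:
#   load_backend: loaded CUDA backend from /usr/local/lib/ollama/cuda_v13/libggml-cuda.so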

@dhiltgen commented on GitHub (Nov 20, 2025):

@acochrane can you set OLLAMA_DEBUG=2 and share the server startup logs for your failure case?
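On a systemd install, a sketch of how to do that and collect the startup log:

sudo systemctl edit ollama
# in the editor, add:
#   [Service]
#   Environment="OLLAMA_DEBUG=2"
sudo systemctl restart ollama
journalctl -u ollama --no-pager > ollama-debug.log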

@acochrane commented on GitHub (Nov 20, 2025):

Here it is: https://pastebin.com/vsXfzmi5
