[GH-ISSUE #4391] pre-built binary doesn't work on Jetson with JP6 GA system #49255

Closed
opened 2026-04-28 11:01:28 -05:00 by GiteaMirror · 2 comments

Originally created by @TadayukiOkada on GitHub (May 13, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4391

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

I get this error when I run the pre-built binary on a Jetson Orin with the JP6 GA system installed:
```
source=sched.go:339 msg="error loading llama server" error="llama runner process has terminated: signal: aborted (core dumped) CUDA error: CUBLAS_STATUS_EXECUTION_FAILED
 current device: 0, in function ggml_cuda_mul_mat_batched_cublas at /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml-cuda.cu:1848
 cublasGemmBatchedEx(ctx.cublas_handle(), CUBLAS_OP_T, CUBLAS_OP_N, ne01, ne11, ne10, alpha, (const void **) (ptrs_src.get() + 0*ne23), CUDA_R_16F, nb01/nb00, (const void **) (ptrs_src.get() + 1*ne23), CUDA_R_16F, nb11/nb10, beta, ( void **) (ptrs_dst.get() + 0*ne23), cu_data_type, ne01, ne23, cu_compute_type, CUBLAS_GEMM_DEFAULT_TENSOR_OP)
GGML_ASSERT: /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml-cuda.cu:60: !"CUDA error"
```

I built ollama from source and it runs fine on the JP6 system. The pre-built binaries also worked on a JP5.1.3 system.
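The symptom (the pre-built binary aborts inside cuBLAS while a native build works) is consistent with the pre-built CUDA kernels not covering the Orin's compute capability (8.7, per the hardware info reported later in this thread). A toy Python sketch of that compatibility check; the architecture lists here are illustrative, not ollama's actual build matrix, and PTX forward-compatibility is ignored for simplicity:

```python
def sm_tag(major: int, minor: int) -> str:
    """Render a CUDA compute capability as an sm_XY architecture tag."""
    return f"sm_{major}{minor}"

def binary_supports(device_cc: tuple[int, int], compiled_archs: set[str]) -> bool:
    """Simplified model: the binary runs only if the device's exact arch was
    compiled in (real CUDA also allows JIT from embedded PTX)."""
    return sm_tag(*device_cc) in compiled_archs

# Illustrative: a build targeting only discrete-GPU archs misses the Orin's sm_87.
desktop_build = {"sm_70", "sm_80", "sm_86", "sm_89", "sm_90"}
print(binary_supports((8, 7), desktop_build))  # → False
```

Under this model, a kernel launched on an uncovered architecture can fail at the first cuBLAS/GGML call, which matches the `CUBLAS_STATUS_EXECUTION_FAILED` abort above; building on the device itself compiles for the native sm_87 and avoids the mismatch.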

OS

Linux

GPU

Nvidia

CPU

Other

Ollama version

No response

GiteaMirror added the bug label 2026-04-28 11:01:28 -05:00
GiteaMirror added the nvidia label 2026-04-28 11:02:47 -05:00

@hangxingliu commented on GitHub (May 25, 2024):

I also ran into this issue today on my AGX Orin after trying the latest pre-built ollama. Everything still works as before if I use a binary I built myself.

Here is some diagnostic information; I hope it helps you fix the issue:

Ollama's journal logs: [20240525-ollama-error-logs.log](https://github.com/ollama/ollama/files/15440993/20240525-ollama-error-logs.log)

The model that I tested: `llama3:8b`

My hardware info:

 - Module: NVIDIA Jetson AGX Orin (64 GB RAM)
 - SoC: tegra234
 - CUDA Arch BIN: 8.7
 
 - Machine: aarch64
 - System: Linux
 - Distribution: Ubuntu 22.04 Jammy Jellyfish
 - Release: 5.15.122-tegra
 - Jetpack: 6.0 DP
 
 - Python: 3.10.12
 - CUDA: 12.2.140
 - cuDNN: 8.9.4.25
 - TensorRT: 8.6.2.3

@dhiltgen commented on GitHub (May 31, 2024):

We'll track Orin/JP6 support in #2408

Reference: github-starred/ollama#49255