[GH-ISSUE #1895] CUDA error 999: unknown error #1088

Closed
opened 2026-04-12 10:50:23 -05:00 by GiteaMirror · 4 comments

Originally created by @jmorganca on GitHub (Jan 10, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/1895

Originally assigned to: @dhiltgen on GitHub.

```
ollama serve
2024/01/10 12:36:43 images.go:808: total blobs: 9
2024/01/10 12:36:43 images.go:815: total unused blobs removed: 0
2024/01/10 12:36:43 routes.go:930: Listening on 127.0.0.1:11434 (version 0.1.19)
2024/01/10 12:36:43 shim_ext_server.go:142: Dynamic LLM variants [cuda rocm]
2024/01/10 12:36:43 gpu.go:35: Detecting GPU type
2024/01/10 12:36:43 gpu.go:54: Nvidia GPU detected
2024/01/10 12:36:43 gpu.go:84: CUDA Compute Capability detected: 7.5
[GIN] 2024/01/10 - 12:36:55 | 200 |      41.734µs |       127.0.0.1 | HEAD     "/"
[GIN] 2024/01/10 - 12:36:55 | 200 |     624.916µs |       127.0.0.1 | POST     "/api/show"
[GIN] 2024/01/10 - 12:36:55 | 200 |     359.397µs |       127.0.0.1 | POST     "/api/show"
size 4109853248
filetype Q4_0
architecture llama
type 7B
name gguf
embd 4096
head 32
head_kv 8
gqa 4
2024/01/10 12:36:57 gpu.go:84: CUDA Compute Capability detected: 7.5
2024/01/10 12:36:57 llm.go:70: system memory bytes: 3681740391
2024/01/10 12:36:57 llm.go:71: required model bytes: 4109853248
2024/01/10 12:36:57 llm.go:72: required kv bytes: 268435456
2024/01/10 12:36:57 llm.go:73: required alloc bytes: 178956970
2024/01/10 12:36:57 llm.go:74: required total bytes: 4557245674
2024/01/10 12:36:57 gpu.go:84: CUDA Compute Capability detected: 7.5
2024/01/10 12:36:57 llm.go:114: splitting 3502783421 of available memory bytes into layers
2024/01/10 12:36:57 llm.go:116: bytes per layer: 136821522
2024/01/10 12:36:57 llm.go:118: total required with split: 3599495020
2024/01/10 12:36:57 shim_ext_server_linux.go:24: Updating PATH to /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/snap/bin:/tmp/ollama22470349/cuda
Lazy loading /tmp/ollama22470349/cuda/libext_server.so library
2024/01/10 12:36:57 shim_ext_server.go:92: Loading Dynamic Shim llm server: /tmp/ollama22470349/cuda/libext_server.so
2024/01/10 12:36:57 ext_server_common.go:136: Initializing internal llama server
...
CUDA error 999 at /go/src/github.com/jmorganca/ollama/llm/llama.cpp/ggml-cuda.cu:495: unknown error
current device: -1876424368
GGML_ASSERT: /go/src/github.com/jmorganca/ollama/llm/llama.cpp/ggml-cuda.cu:495: !"CUDA error"
Could not attach to process.  If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Vorgang nicht zulässig.
No stack.
The program is not being run.
SIGABRT: abort
PC=0x7fc40c29999b m=13 sigcode=18446744073709551610
signal arrived during cgo execution
```
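
For context, CUDA error 999 is `cudaErrorUnknown`, and the `GGML_ASSERT` above comes from llama.cpp's CUDA error-checking path, which aborts the whole process as soon as any runtime call fails. A simplified sketch of that kind of check (illustrative only, not the actual ggml-cuda.cu macro) looks roughly like this:

```
// Simplified sketch of a CUDA_CHECK-style macro (the real ggml-cuda.cu macro
// differs in detail); it shows why a single failed runtime call aborts the server.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CUDA_CHECK(call)                                                       \
    do {                                                                       \
        cudaError_t err_ = (call);                                             \
        if (err_ != cudaSuccess) {                                             \
            int dev_ = -1;                                                     \
            cudaGetDevice(&dev_); /* may itself fail, leaving dev_ as garbage, \
                                     like the "current device" value above */  \
            fprintf(stderr, "CUDA error %d at %s:%d: %s\n",                    \
                    (int) err_, __FILE__, __LINE__, cudaGetErrorString(err_)); \
            fprintf(stderr, "current device: %d\n", dev_);                     \
            abort(); /* -> SIGABRT, matching the crash in the log */           \
        }                                                                      \
    } while (0)

int main() {
    // If the driver/UVM state is wedged (e.g. after suspend), even this
    // first call can return cudaErrorUnknown (999) and trigger the abort.
    CUDA_CHECK(cudaSetDevice(0));
    return 0;
}
```
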
GiteaMirror added the bug and nvidia labels 2026-04-12 10:50:23 -05:00

@ru4en commented on GitHub (Jan 14, 2024):

Looks like some Nvidia driver weirdness. I found that if you reload the nvidia_uvm and nvidia drivers it might just work, until it breaks again:

```
sudo rmmod nvidia_uvm
sudo rmmod nvidia
sudo modprobe nvidia
sudo modprobe nvidia_uvm
```

Found the solution on https://stackoverflow.com/questions/58595291/runtime-error-999-when-trying-to-use-cuda-with-pytorch
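
If you want to verify that the reload actually cleared the wedged driver state before restarting ollama, a tiny standalone probe (my own sketch, not part of ollama; build it with `nvcc`) will report the same error 999 while the driver is broken and succeed once it is healthy:

```
// cuda_probe.cu -- build with: nvcc -o cuda_probe cuda_probe.cu
// Prints the device count on success; prints the CUDA error (999 == "unknown
// error" when the driver/UVM state is wedged) and exits non-zero otherwise.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %d (%s)\n",
                (int) err, cudaGetErrorString(err));
        return 1;
    }
    // Force context creation on device 0; a wedged driver often reports
    // cudaErrorUnknown here even if enumeration succeeded.
    err = cudaFree(0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaFree(0) failed: %d (%s)\n",
                (int) err, cudaGetErrorString(err));
        return 1;
    }
    printf("CUDA OK, %d device(s) visible\n", count);
    return 0;
}
```

Running it before and after the rmmod/modprobe sequence makes it easy to tell whether the reload worked or a reboot is still needed.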


@nleve commented on GitHub (Jan 26, 2024):

Thanks @ru4en, `sudo modprobe --remove nvidia-uvm && sudo modprobe nvidia-uvm` fixed this for me without needing a reboot.

I noticed this occurred after my PC went to sleep. I saw someone else mention that as well in the comments on that SO post. Ollama was running when mine went to sleep, not sure if that matters.

Driver Version: 545.29.06, CUDA Version: 12.3, RTX 4090, running on Manjaro


@dhiltgen commented on GitHub (Jan 27, 2024):

We've recently added some pre-flight checking so that if initialization of the GPU fails, we can gracefully fall back to CPU mode instead of crashing. I think that should largely mitigate this issue. If you're still seeing these on 0.1.22 or newer, please let us know.
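
For anyone curious what that looks like in practice: the idea is to probe the GPU once up front and, if initialization fails, switch to the CPU backend instead of letting the failing call hit the `GGML_ASSERT` shown in the log. A rough, hypothetical illustration (not ollama's actual code) in plain CUDA C++:

```
// Sketch of a pre-flight GPU check (illustrative only, not ollama's code):
// probe the device before loading the CUDA backend and fall back to CPU
// if anything fails, rather than asserting inside the inference path.
#include <cstdio>
#include <cuda_runtime.h>

enum class Backend { CUDA, CPU };

static Backend pick_backend() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        fprintf(stderr, "GPU pre-flight failed, falling back to CPU\n");
        return Backend::CPU;
    }
    if (cudaSetDevice(0) != cudaSuccess || cudaFree(0) != cudaSuccess) {
        fprintf(stderr, "GPU init failed (e.g. error 999), falling back to CPU\n");
        return Backend::CPU;
    }
    return Backend::CUDA;
}

int main() {
    Backend b = pick_backend();
    printf("using %s backend\n", b == Backend::CUDA ? "CUDA" : "CPU");
    return 0;
}
```
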


@nleve commented on GitHub (Jan 27, 2024):

Can confirm the fallback to CPU worked when this occurred for me.

Reference: github-starred/ollama#1088