[GH-ISSUE #3903] index 0 is out of range for type 'uvm_gpu_chunk_t *[*]' #2417

Closed
opened 2026-04-12 12:43:38 -05:00 by GiteaMirror · 5 comments
Owner

Originally created by @jferments on GitHub (Apr 25, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3903

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

I am using Ollama on Ubuntu 24.04, and am getting the following error showing up in dmesg:

[ 48.005440] ------------[ cut here ]------------
[ 48.005441] UBSAN: array-index-out-of-bounds in build/nvidia/535.171.04/build/nvidia-uvm/uvm_pmm_gpu.c:2038:44
[ 48.005443] index 0 is out of range for type 'uvm_gpu_chunk_t *[*]'
[ 48.005445] CPU: 24 PID: 3014 Comm: ollama Tainted: P O 6.8.0-31-generic #31-Ubuntu
[ 48.005447] Hardware name: ASUS System Product Name/Pro WS WRX90E-SAGE SE, BIOS 0404 12/20/2023
[ 48.005448] Call Trace:
[ 48.005449] <TASK>
[ 48.005450] dump_stack_lvl+0x48/0x70
[ 48.005453] dump_stack+0x10/0x20
[ 48.005455] __ubsan_handle_out_of_bounds+0xc6/0x110
[ 48.005458] uvm_pmm_gpu_alloc+0x2f5/0x6d0 [nvidia_uvm]
[ 48.005490] phys_mem_allocate+0xac/0x230 [nvidia_uvm]
[ 48.005521] allocate_directory+0xb4/0x130 [nvidia_uvm]
[ 48.005548] ? allocate_directory+0xb4/0x130 [nvidia_uvm]
[ 48.005577] uvm_page_tree_init+0x133/0x450 [nvidia_uvm]
[ 48.005607] uvm_gpu_retain_by_uuid+0x19df/0x2b80 [nvidia_uvm]
[ 48.005639] uvm_va_space_register_gpu+0x47/0x740 [nvidia_uvm]
[ 48.005669] uvm_api_register_gpu+0x5a/0x90 [nvidia_uvm]
[ 48.005696] uvm_ioctl+0x1a26/0x1cd0 [nvidia_uvm]
[ 48.005724] ? srso_alias_return_thunk+0x5/0xfbef5
[ 48.005726] ? xas_find+0x74/0x1e0
[ 48.005728] ? srso_alias_return_thunk+0x5/0xfbef5
[ 48.005731] ? next_uptodate_folio+0xa9/0x320
[ 48.005734] ? srso_alias_return_thunk+0x5/0xfbef5
[ 48.005736] ? filemap_map_pages+0x2fe/0x4c0
[ 48.005739] ? srso_alias_return_thunk+0x5/0xfbef5
[ 48.005741] ? list_lru_add+0xd1/0x140
[ 48.005744] ? srso_alias_return_thunk+0x5/0xfbef5
[ 48.005746] ? _raw_spin_lock_irqsave+0xe/0x20
[ 48.005748] ? srso_alias_return_thunk+0x5/0xfbef5
[ 48.005750] ? thread_context_non_interrupt_add+0x13a/0x250 [nvidia_uvm]
[ 48.005780] uvm_unlocked_ioctl_entry.part.0+0x7b/0xf0 [nvidia_uvm]
[ 48.005808] ? srso_alias_return_thunk+0x5/0xfbef5
[ 48.005811] ? srso_alias_return_thunk+0x5/0xfbef5
[ 48.005813] ? handle_pte_fault+0x114/0x1d0
[ 48.005815] ? srso_alias_return_thunk+0x5/0xfbef5
[ 48.005817] ? __handle_mm_fault+0x653/0x790
[ 48.005820] uvm_unlocked_ioctl_entry+0x6b/0x90 [nvidia_uvm]
[ 48.005847] __x64_sys_ioctl+0xa0/0xf0
[ 48.005850] x64_sys_call+0x143b/0x25c0
[ 48.005853] do_syscall_64+0x7f/0x180
[ 48.005855] ? srso_alias_return_thunk+0x5/0xfbef5
[ 48.005857] ? handle_mm_fault+0xad/0x380
[ 48.005860] ? srso_alias_return_thunk+0x5/0xfbef5
[ 48.005862] ? do_user_addr_fault+0x338/0x6b0
[ 48.005864] ? srso_alias_return_thunk+0x5/0xfbef5
[ 48.005866] ? irqentry_exit_to_user_mode+0x7b/0x260
[ 48.005869] ? srso_alias_return_thunk+0x5/0xfbef5
[ 48.005871] ? irqentry_exit+0x43/0x50
[ 48.005874] ? srso_alias_return_thunk+0x5/0xfbef5
[ 48.005876] ? exc_page_fault+0x94/0x1b0
[ 48.005879] entry_SYSCALL_64_after_hwframe+0x73/0x7b
[ 48.005882] RIP: 0033:0x74615dd24ded
[ 48.005887] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
[ 48.005889] RSP: 002b:00007460f5fff2d0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 48.005891] RAX: ffffffffffffffda RBX: 00007460ebd00860 RCX: 000074615dd24ded
[ 48.005892] RDX: 00007460f5fff370 RSI: 0000000000000025 RDI: 0000000000000008
[ 48.005894] RBP: 00007460f5fff320 R08: 00007460ebd008f0 R09: 0000000000000000
[ 48.005895] R10: 000074609c02dab0 R11: 0000000000000246 R12: 000074609c0370f6
[ 48.005896] R13: 00007460ebd008f0 R14: 00007460f5fff370 R15: 0000000000000008
[ 48.005900] </TASK>
[ 48.005901] ---[ end trace ]---

I am using an AMD 7965WX CPU, 2 x RTX 4090 GPUs, and an Asus WRX90E-SAGE motherboard.

OS

Linux

GPU

Nvidia

CPU

AMD

Ollama version

0.1.30

GiteaMirror added the bug and nvidia labels 2026-04-12 12:43:38 -05:00

@dhiltgen commented on GitHub (May 1, 2024):

Can you explain a bit more about what happens? Does the system crash, hang or otherwise become unusable? Does Ollama work, or does it crash when this happens? Are you running the latest nvidia driver? (this feels like a driver bug at first glance)


@dhiltgen commented on GitHub (May 21, 2024):

Please make sure to update to the latest nvidia driver and if you're still having problems let us know and I'll re-open.


@ChristianHohlfeld commented on GitHub (Oct 2, 2024):

I’ve encountered the same issue while using Nvidia driver version 535.183.01 with CUDA 12.2 on an RTX 3080 (10GB). The problem consistently arises when prompting larger text. My system is running the latest version of Ubuntu Server, and despite updating to newer Nvidia drivers, the issue persists. When it happens, the system halts, and the HDMI display goes black. I'm currently using LLaMA 3.2 along with ollama:latest (official Docker container). Any advice or suggestions for a potential fix would be greatly appreciated.


@dongshimou commented on GitHub (Oct 12, 2024):

Same error on Ubuntu 24.04.
GPU: Tesla P40
Ollama: Docker ollama/ollama
![image](https://github.com/user-attachments/assets/58e8df21-f67e-4f57-be15-cfe7c442a953)


@dhiltgen commented on GitHub (Oct 17, 2024):

@ChristianHohlfeld @dongshimou driver 535 is an older driver, so it's quite possible NVIDIA has already fixed the bug you're hitting. Please upgrade to the latest drivers and if you're still seeing the crash, let us know.


Reference: github-starred/ollama#2417