[GH-ISSUE #1998] ggml-cuda.cu:7850: !"CUDA error" Aborted (core dumped) with 8 GPUs #1152

Closed
opened 2026-04-12 10:54:56 -05:00 by GiteaMirror · 3 comments

Originally created by @quanpinjie on GitHub (Jan 15, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/1998

Originally assigned to: @jmorganca on GitHub.

![image](https://github.com/jmorganca/ollama/assets/2564119/d7deb42c-cbb7-4426-90f6-1cee8b9badf8)

Error: Post "http://127.0.0.1:11434/api/generate": EOF

GPU INFO:

Uploading image.png…

GiteaMirror added the bug, nvidia labels 2026-04-12 10:54:56 -05:00

@quanpinjie commented on GitHub (Jan 15, 2024):

System: Kernel: 5.4.0-169-generic x86_64 bits: 64 compiler: gcc v: 9.4.0 Console: tty 6
Distro: Ubuntu 20.04.6 LTS (Focal Fossa)
Machine: Type: Server System: Powerleader product: PR4908WB v: Whitley serial:
Mobo: Powerleader model: 60WB32 v: 24003373 serial: UEFI: American Megatrends LLC. v: NKMH051061
date: 05/12/2023
CPU: Topology: 2x 24-Core model: Intel Xeon Gold 5318Y bits: 64 type: MT MCP SMP arch: N/A L2 cache: 72.0 MiB
flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx bogomips: 404196
Speed: 800 MHz min/max: 800/2101 MHz Core speeds (MHz): 1: 800 2: 800 3: 799 4: 2591 5: 900 6: 800 7: 1300 8: 799
9: 800 10: 800 11: 801 12: 800 13: 2600 14: 800 15: 800 16: 799 17: 800 18: 800 19: 800 20: 802 21: 800 22: 800
23: 800 24: 2600 25: 801 26: 800 27: 799 28: 2589 29: 1321 30: 800 31: 800 32: 851 33: 801 34: 800 35: 800 36: 800
37: 800 38: 800 39: 800 40: 800 41: 800 42: 800 43: 800 44: 807 45: 800 46: 897 47: 2600 48: 2591 49: 800 50: 848
51: 992 52: 800 53: 1203 54: 800 55: 800 56: 2591 57: 1188 58: 900 59: 801 60: 1303 61: 799 62: 800 63: 801 64: 800
65: 801 66: 800 67: 800 68: 799 69: 801 70: 801 71: 800 72: 800 73: 800 74: 800 75: 800 76: 800 77: 802 78: 800
79: 1200 80: 800 81: 2600 82: 1129 83: 800 84: 800 85: 898 86: 800 87: 798 88: 802 89: 800 90: 801 91: 800 92: 801
93: 800 94: 799 95: 800 96: 800
Graphics: Device-1: ASPEED Graphics Family driver: ast v: kernel bus ID: 03:00.0
Device-2: NVIDIA driver: nvidia v: 535.146.02 bus ID: 4f:00.0
Device-3: NVIDIA driver: nvidia v: 535.146.02 bus ID: 50:00.0
Device-4: NVIDIA driver: nvidia v: 535.146.02 bus ID: 53:00.0
Device-5: NVIDIA driver: nvidia v: 535.146.02 bus ID: 57:00.0
Device-6: NVIDIA driver: nvidia v: 535.146.02 bus ID: 9c:00.0
Device-7: NVIDIA driver: nvidia v: 535.146.02 bus ID: 9d:00.0
Device-8: NVIDIA driver: nvidia v: 535.146.02 bus ID: a0:00.0
Device-9: NVIDIA driver: nvidia v: 535.146.02 bus ID: a4:00.0
Display: server: X.org 1.20.13 driver: modesetting,nvidia unloaded: fbdev,nouveau,vesa tty: 185x60
Message: Advanced graphics data unavailable in console. Try -G --display
Audio: Message: No Device data found.
Network: Device-1: Intel I350 Gigabit Network driver: igb v: 5.6.0-k port: 6020 bus ID: 17:00.0
IF: ens31f0 state: up speed: 1000 Mbps duplex: full mac:
Device-2: Intel I350 Gigabit Network driver: igb v: 5.6.0-k port: 6000 bus ID: 17:00.1
IF: ens31f1 state: down mac:
Device-3: Intel 82599ES 10-Gigabit SFI/SFP+ Network vendor: Gigabyte driver: ixgbe v: 5.1.0-k port: d020
bus ID: b1:00.0
IF: ens42f0 state: down mac:
Device-4: Intel 82599ES 10-Gigabit SFI/SFP+ Network vendor: Gigabyte driver: ixgbe v: 5.1.0-k port: d000
bus ID: b1:00.1
IF: ens42f1 state: down mac:
Device-5: American Megatrends type: USB driver: cdc_ether bus ID: 1-14.2:4
IF: enxa6e8da539412 state: down mac:
IF-ID-1: docker0 state: up speed: N/A duplex: N/A mac:
IF-ID-2: vetha4c6d60 state: up speed: 10000 Mbps duplex: full mac:
Drives: Local Storage: total: 3.49 TiB used: 1.33 TiB (38.1%)
ID-1: /dev/nvme0n1 vendor: Samsung model: MZQL23T8HCLS-00A07 size: 3.49 TiB
ID-2: /dev/nvme1n1 vendor: Samsung model: MZQL23T8HCLS-00A07 size: 3.49 TiB
Partition: ID-1: / size: 3.44 TiB used: 471.99 GiB (13.4%) fs: ext4 dev: /dev/nvme1n1p2
Sensors: System Temperatures: cpu: 40.0 C mobo: N/A
Fan Speeds (RPM): N/A
Info: Processes: 1402 Uptime: 12d 22h 19m Memory: 251.53 GiB used: 48.94 GiB (19.5%) Init: systemd runlevel: 5 Compilers:
gcc: 9.4.0 Shell: bash v: 5.0.17 inxi: 3.0.38


@dhiltgen commented on GitHub (Jan 26, 2024):

@quanpinjie can you share the server log?


@dhiltgen commented on GitHub (Mar 12, 2024):

If you're still facing the problem, please share the server log.


Reference: github-starred/ollama#1152