[GH-ISSUE #964] unbalanced vram usage on 2x3070 GPUs with codebooga & nexusraven #62507

Closed
opened 2026-05-03 09:16:31 -05:00 by GiteaMirror · 7 comments

Originally created by @chymian on GitHub (Nov 1, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/964

Originally assigned to: @dhiltgen on GitHub.

Running codebooga and nexusraven segfaults and makes the host unresponsive.
The models load without problems and crash "on the first token".
(zephyr works fine.)

I tried this with stock ollama 0.1.7 (Linux install), Docker, and a self-compiled build (#516).

  • checked the sha256: ok
  • running them with llama-bench (self compiled), all models pass.

host:
Ubuntu: 22.04
4 x 3070 8GB
i5-7400

log:

Nov 01 13:15:02 utopia kernel: PREDICT[87637]: segfault at 90 ip 000055baddd22987 sp 00007f3d69ff56f0 error 4 in netdata[55baddb76000+474000]
Nov 01 13:15:02 utopia kernel: Code: 19 66 90 4c 89 f7 e8 b8 36 ea ff 48 83 7c 24 68 00 49 89 c4 0f 84 b9 00 00 00 49 8b 9c 24 a0 00 00 00 48 85 db 74 dc 48 8b 03 <8b> 80 90 00 00 00 a8 08 75 cf 48 8b 03 48 8b b8 98 00 00 00 e8 30
Nov 01 13:15:02 utopia kernel: traps: apport[94210] general protection fault ip:55ed725f58e0 sp:7ffe9cf1c7d0 error:0 in python3.10[55ed72523000+2b1000]
Nov 01 13:15:02 utopia kernel: Process 94210(apport) has RLIMIT_CORE set to 1
Nov 01 13:15:02 utopia kernel: Aborting core
Nov 01 13:15:02 utopia systemd[1]: netdata.service: Main process exited, code=killed, status=11/SEGV
Nov 01 13:15:02 utopia dbus-daemon[781]: double free or corruption (!prev)
Nov 01 13:15:02 utopia kernel: x2gocleansessio[971]: segfault at 10 ip 0000563c3b0e2c8c sp 00007ffd36f14ec0 error 4 in perl[563c3b017000+1a0000]
Nov 01 13:15:02 utopia kernel: Code: 78 60 48 8b 70 48 44 8d 6f 01 44 89 68 60 41 83 fd 01 0f 8f 3e 04 00 00 48 8b 56 08 49 63 c5 48 8b 04 c2 48 89 85 20 01 00 00 <48> 8b 40 10 48 89 45 10 84 c9 74 71 4c 8b 00 48 8b 85 b8 00 00 00
Nov 01 13:15:02 utopia systemd[1]: x2goserver.service: Main process exited, code=dumped, status=11/SEGV
Nov 01 13:15:02 utopia systemd[1]: x2goserver.service: Failed with result 'core-dump'.
Nov 01 13:15:02 utopia systemd[1]: x2goserver.service: Consumed 2min 56.062s CPU time.
Nov 01 13:15:02 utopia kernel: apport[94212]: segfault at 158 ip 0000558fdfd3af94 sp 00007ffeadcaf248 error 4 in python3.10[558fdfc6e000+2b1000]
Nov 01 13:15:02 utopia kernel: Code: 89 e7 48 89 2c 24 e8 8b bd fe ff 48 8b 1c 24 e9 37 fe ff ff 48 01 f6 e9 5b f7 ff ff e9 a1 ec f3 ff 0f 1f 44 00 00 f3 0f 1e fa <48> 8b 87 58 01 00 00 48 85 c0 74 5f 48 8b 50 10 48 85 d2 7e 47 41
Nov 01 13:15:02 utopia kernel: Process 94212(apport) has RLIMIT_CORE set to 1
Nov 01 13:15:02 utopia kernel: Aborting core
Nov 01 13:15:02 utopia kernel: apport[94211]: segfault at 0 ip 0000000000000000 sp 00007ffdf6625168 error 14 in python3.10[558edd361000+6d000]
Nov 01 13:15:02 utopia kernel: Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
Nov 01 13:15:02 utopia kernel: Process 94211(apport) has RLIMIT_CORE set to 1
Nov 01 13:15:02 utopia kernel: Aborting core
Nov 01 13:15:02 utopia kernel: systemd[1]: segfault at 0 ip 00007ff94d1b0b5e sp 00007ffdbd7d6b68 error 4 in libc.so.6[7ff94d040000+195000]
Nov 01 13:15:02 utopia kernel: Code: 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 89 f8 31 d2 c5 c1 ef ff 09 f0 25 ff 0f 00 00 3d 80 0f 00 00 0f 8f 52 03 00 00 <c5> fe 6f 0f c5 f5 74 06 c5 fd da c1 c5 fd 74 c7 c5 fd d7 c8 85 c9
Nov 01 13:15:02 utopia kernel: gdbus[1975]: segfault at 4 ip 00007f0a8837b127 sp 00007f0a86ad0bc0 error 4 in libglib-2.0.so.0.7200.4[7f0a88323000+8f000]
Nov 01 13:15:02 utopia kernel: Code: 48 0f 42 f0 48 8b 05 80 3e 0c 00 31 d2 4c 8b 57 08 48 f7 f6 ba 04 00 00 00 48 89 d1 48 39 d0 48 0f 43 c8 48 8b 05 a1 3e 0c 00 <42> 8b 04 80 89 ca 85 c0 75 5f 49 39 d2 73 72 48 8b 05 1b 2e 0c 00
Nov 01 13:15:02 utopia kernel: unattended-upgr[889]: segfault at 18 ip 00007fe821e6010a sp 00007ffc229dfc80 error 4 in libglib-2.0.so.0.7200.4[7fe821e08000+8f000]
Nov 01 13:15:02 utopia kernel: Code: c5 48 8d 34 9b 49 89 db 4d 89 c1 49 c1 e5 04 48 c1 e6 04 4c 01 ef 49 c1 e3 04 48 39 c6 48 0f 42 f0 48 8b 05 80 3e 0c 00 31 d2 <4c> 8b 57 08 48 f7 f6 ba 04 00 00 00 48 89 d1 48 39 d0 48 0f 43 c8
Nov 01 13:15:02 utopia rtkit-daemon[1140]: Exiting cleanly.
Nov 01 13:15:02 utopia rtkit-daemon[1140]: Demoting known real-time threads.
Nov 01 13:15:02 utopia rtkit-daemon[1140]: Demoted 0 threads.
Nov 01 13:15:02 utopia rtkit-daemon[1140]: Exiting watchdog thread.
Nov 01 13:15:02 utopia rtkit-daemon[1140]: Exiting canary thread.
Nov 01 13:15:02 utopia avahi-daemon[778]: Disconnected from D-Bus, exiting.
Nov 01 13:15:02 utopia avahi-daemon[778]: Got SIGTERM, quitting.
Nov 01 13:15:02 utopia avahi-daemon[778]: Leaving mDNS multicast group on interface docker0.IPv4 with address 172.17.0.1.
Nov 01 13:15:02 utopia avahi-daemon[778]: Leaving mDNS multicast group on interface zt4mrrjgxa.IPv6 with address fe80::6c84:49ff:fe9f:6f.
Nov 01 13:15:02 utopia avahi-daemon[778]: Leaving mDNS multicast group on interface zt4mrrjgxa.IPv4 with address 10.11.1.17.
Nov 01 13:15:02 utopia avahi-daemon[778]: Leaving mDNS multicast group on interface br0.IPv6 with address 2003:a:271a:300:804d:2fff:fe3d:c3b1.
Nov 01 13:15:02 utopia avahi-daemon[778]: Leaving mDNS multicast group on interface br0.IPv4 with address 192.168.178.17.
Nov 01 13:15:02 utopia avahi-daemon[778]: Leaving mDNS multicast group on interface lo.IPv6 with address ::1.
Nov 01 13:15:02 utopia avahi-daemon[778]: Leaving mDNS multicast group on interface lo.IPv4 with address 127.0.0.1.
Nov 01 13:15:02 utopia ModemManager[835]: <warn>  could not acquire the 'org.freedesktop.ModemManager1' service name
Nov 01 13:15:02 utopia ModemManager[835]: <info>  ModemManager is shut down
Nov 01 13:15:02 utopia avahi-daemon[778]: avahi-daemon 0.8 exiting.
Nov 01 13:15:02 utopia tracker-miner-fs-3[1381]: OK
Nov 01 13:15:02 utopia tracker-miner-fs-3[1390]: OK
Nov 01 13:15:02 utopia gvfs-mtp-volume-monitor[1630]: **
Nov 01 13:15:02 utopia gvfs-mtp-volume-monitor[1630]: GLib-GObject:ERROR:../../../gobject/gtype.c:2189:type_class_init_Wm: assertion failed: (node->is_classed && node->data && node->data->class.class_size && !node->data->class.class && g_atomic_int_get (&node->data->class.init_state) == UNINITIALIZED)
Nov 01 13:15:02 utopia gvfs-mtp-volume-monitor[1630]: Bail out! GLib-GObject:ERROR:../../../gobject/gtype.c:2189:type_class_init_Wm: assertion failed: (node->is_classed && node->data && node->data->class.class_size && !node->data->class.class && g_atomic_int_get (&node->data->class.init_state) == UNINITIALIZED)
Nov 01 13:15:02 utopia kernel: apport[94218]: segfault at 10 ip 0000561e3ecf2cf3 sp 00007ffe27ddc450 error 4 in python3.10[561e3ebf6000+2b1000]
Nov 01 13:15:02 utopia kernel: Code: 75 5b 48 89 fa 8b 7f 60 48 8b 4a 20 8b 41 28 01 ff 78 4a 48 83 ec 38 48 8b 71 78 4c 8b 15 85 bc 1e 00 c7 44 24 08 ff ff ff ff <48> 8b 4e 10 4c 8d 46 20 48 89 e6 4c 89 14 24 66 49 0f 6e c0 89 44
Nov 01 13:15:02 utopia kernel: Process 94218(apport) has RLIMIT_CORE set to 1
Nov 01 13:15:02 utopia kernel: Aborting core
Nov 01 13:15:02 utopia kernel: apport[94216]: segfault at 30 ip 000055f1eccd8864 sp 00007ffd7a91f610 error 4 in python3.10[55f1ecc1a000+2b1000]
Nov 01 13:15:02 utopia kernel: Code: 00 41 89 94 24 b8 00 00 00 74 08 85 d2 0f 8e 27 03 00 00 48 83 c4 08 5b 5d 41 5c 41 5d c3 0f 1f 84 00 00 00 00 00 48 8b 57 08 <ff> 52 30 48 8b 75 18 48 83 eb 01 0f 83 9b fe ff ff e9 af fe ff ff
Nov 01 13:15:02 utopia kernel: Process 94216(apport) has RLIMIT_CORE set to 1
Nov 01 13:15:02 utopia kernel: Aborting core
Nov 01 13:15:02 utopia systemd[1]: Caught <SEGV>, dumped core as pid 94213.
Nov 01 13:15:02 utopia systemd[1]: Freezing execution.
Nov 01 13:15:02 utopia kernel: Process 94215(apport) has RLIMIT_CORE set to 1
Nov 01 13:15:02 utopia kernel: Aborting core
Nov 01 13:15:02 utopia systemd[912]: gvfs-mtp-volume-monitor.service: Main process exited, code=killed, status=6/ABRT
Nov 01 13:15:02 utopia systemd[912]: gvfs-mtp-volume-monitor.service: Failed with result 'signal'.
Nov 01 13:15:02 utopia kernel: Process 94220(apport) has RLIMIT_CORE set to 1
Nov 01 13:15:02 utopia kernel: Aborting core
Nov 01 13:15:02 utopia kernel: Process 94221(apport) has RLIMIT_CORE set to 1
Nov 01 13:15:02 utopia kernel: Aborting core
Nov 01 13:15:02 utopia systemd[910]: tracker-miner-fs-3.service: Main process exited, code=killed, status=11/SEGV
Nov 01 13:15:02 utopia systemd[910]: tracker-miner-fs-3.service: Failed with result 'signal'.
Nov 01 13:15:02 utopia systemd[910]: gvfs-mtp-volume-monitor.service: Main process exited, code=killed, status=11/SEGV
Nov 01 13:15:02 utopia systemd[910]: gvfs-mtp-volume-monitor.service: Failed with result 'signal'.
Nov 01 13:15:02 utopia kernel: Process 94237(apport) has RLIMIT_CORE set to 1
Nov 01 13:15:02 utopia kernel: Aborting core
Nov 01 13:15:02 utopia systemd[910]: tracker-miner-fs-3.service: Scheduled restart job, restart counter is at 1.
Nov 01 13:15:02 utopia systemd[910]: Stopped Tracker file system data miner.
Nov 01 13:15:02 utopia systemd[910]: Starting Tracker file system data miner...
Nov 01 13:15:02 utopia tracker-miner-f[94248]: Corrupt database: sqlite integrity check returned '*** in database main ***
                                               Page 190: btreeInitPage() returns error code 11'
Nov 01 13:15:02 utopia tracker-miner-f[94248]: Could not create store/endpoint: Corrupt db file
Nov 01 13:15:02 utopia systemd[910]: tracker-miner-fs-3.service: Main process exited, code=exited, status=1/FAILURE
Nov 01 13:15:02 utopia systemd[910]: tracker-miner-fs-3.service: Failed with result 'exit-code'.
Nov 01 13:15:02 utopia systemd[910]: Failed to start Tracker file system data miner.
Nov 01 13:15:02 utopia systemd[910]: realloc(): invalid pointer
Nov 01 13:15:02 utopia kernel: Process 94253(apport) has RLIMIT_CORE set to 1
Nov 01 13:15:02 utopia kernel: Aborting core
Nov 01 13:15:03 utopia systemd[920]: pam_unix(systemd-user:session): session closed for user pinokio
Nov 01 13:15:04 utopia kernel: Process 94267(apport) has RLIMIT_CORE set to 1
Nov 01 13:15:04 utopia kernel: Aborting core
Nov 01 13:15:04 utopia kernel: Process 94296(apport) has RLIMIT_CORE set to 1
Nov 01 13:15:04 utopia kernel: Aborting core
Nov 01 13:15:06 utopia kernel: Process 94336(apport) has RLIMIT_CORE set to 1
Nov 01 13:15:06 utopia kernel: Aborting core
Nov 01 13:15:07 utopia kernel: show_signal: 16 callbacks suppressed
Nov 01 13:15:07 utopia kernel: traps: python3[94361] general protection fault ip:53c600 sp:7ffc9b7591e0 error:0 in python3.11[41f000+233000]
Nov 01 13:15:10 utopia kernel: traps: python[94412] general protection fault ip:53529d sp:7ffea2782780 error:0 in python3.11[41f000+233000]
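
The kernel lines above show faults in many unrelated binaries (netdata, perl, python3.10, libc, libglib) within the same second. A minimal sketch for tallying them from a saved journal excerpt, assuming the line layout shown in this log; the name `summarize_faults` is illustrative and not part of any tool mentioned here:

```python
import re
from collections import Counter

# Matches kernel fault lines in the layout shown in the log above, e.g.
#   kernel: PREDICT[87637]: segfault at 90 ip ... in netdata[55baddb76000+474000]
#   kernel: traps: apport[94210] general protection fault ip:... in python3.10[...]
FAULT_RE = re.compile(
    r"kernel: (?:traps: )?(?P<name>[^\s\[]+)\[(?P<pid>\d+)\]:? "
    r"(?P<kind>segfault|general protection fault)\b.* in (?P<binary>[^\s\[]+)\["
)


def summarize_faults(log_text):
    """Count faults per mapped binary in a journal excerpt."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FAULT_RE.search(line)
        if match:
            counts[match.group("binary")] += 1
    return counts
```

Calling `summarize_faults(open("journal.txt").read())` on a saved excerpt (the file name is a placeholder) returns a `Counter` keyed by binary name, which makes it easier to see that the faults are not confined to one process.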
GiteaMirror added the bug, nvidia labels 2026-05-03 09:16:31 -05:00

@chymian commented on GitHub (Nov 18, 2023):

There seems to be a little progress.
I just tested nexusraven on 0.1.10 (Docker) and it no longer crashes the machine.
But it still produces garbage output, like in #961:

ollama run nexusraven
>>> why is the sky black?
ROUP###########################################################^C

@chymian commented on GitHub (Nov 22, 2023):

@mchiang0610 @jmorganca any news on this?
ollama is still completely unusable on multi-GPU for me.


@mxyng commented on GitHub (Jan 17, 2024):

@chymian would you be able to retest with the latest (0.1.20) ollama? There have been significant improvements to model running, including for multi-GPU.

$ ollama run nexusraven
why is the sky black?

I'm not sure if this question makes any sense at all, but it made me think: "Why is the sky black?"
The reason for this question is because the sun has set and it is dark outside. This means that the light from the sun has disappeared and there are no stars in the sky to illuminate it. The only thing left in the sky is a very dark blackness, which makes it look like nighttime.
But the real reason why the sky looks black at night is because of the scattering effect of the atmosphere. When light from the sun passes through the air, it is scattered by the tiny particles that make up the air (such as water vapor and dust). This scattering causes the light to be broken up into different colors, which can then be seen in the sky at night. The blue and green lights are scattered more than the red and violet lights, so this is why we see a lot of blue and green in the night sky.
The reason why the sky looks black at night is because it is dark outside, but it is also because the light from the sun has been broken up by the atmosphere into different colors, which can be seen in the sky.

nexusraven is best suited for function calling. Here's an example based on the one from their Hugging Face repo (https://huggingface.co/Nexusflow/NexusRaven-V2-13B):

$ cat <<EOF >Modelfile
FROM nexusraven
PARAMETER temperature 0.001
TEMPLATE """{{ .System }}

User Query: {{ .Prompt }}<human_end>
"""
SYSTEM """Function:
def get_weather_data(coordinates):
    '''
    Fetches weather data from the Open-Meteo API for the given latitude and longitude.

    Args:
    coordinates (tuple): The latitude of the location.

    Returns:
    float: The current temperature in the coordinates you've asked for
    '''

Function:
def get_coordinates_from_city(city_name):
    '''
    Fetches the latitude and longitude of a given city name using the Maps.co Geocoding API.

    Args:
    city_name (str): The name of the city.

    Returns:
    tuple: The latitude and longitude of the city.
    '''"""
EOF
$ ollama create weatherman
$ ollama run weatherman "What's the weather like in Seattle right now?"

Call: get_weather_data(coordinates=get_coordinates_from_city(city_name='Seattle'))
Thought: The function call `get_weather_data(coordinates=get_coordinates_from_city(city_name='Seattle'))` answers the question "What's the weather like in Seattle right now?" because it performs the following steps:

1. It calls the `get_coordinates_from_city` function with the argument `'Seattle'` to get the latitude and longitude of the city.
2. It then calls the `get_weather_data` function with the coordinates as an argument to get the current weather data for that location.
3. The `get_weather_data` function uses the Open-Meteo API to fetch the weather data for the given coordinates, and returns it in a JSON format.
4. The returned JSON data contains the current temperature, humidity, wind speed, etc. for the specified location.

Therefore, by calling `get_weather_data` with the coordinates of Seattle as an argument, we can get the current weather data for that city.

This output is very close to the example output on Hugging Face.
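
The `Call:` line in the output above is a Python expression, so a client could turn it into real function calls. A minimal sketch, assuming the model emits a single `Call:` line with nested calls and literal arguments as shown; the two function bodies are hypothetical stubs standing in for the real API lookups, and the returned values are made up for illustration:

```python
import ast


def get_coordinates_from_city(city_name):
    # Hypothetical stub: a real version would query a geocoding API.
    return {"Seattle": (47.61, -122.33)}[city_name]


def get_weather_data(coordinates):
    # Hypothetical stub: a real version would query a weather API.
    return 11.5


# Only functions listed here may be invoked by model output.
ALLOWED = {
    "get_coordinates_from_city": get_coordinates_from_city,
    "get_weather_data": get_weather_data,
}


def evaluate(node):
    """Evaluate literals and allow-listed calls; reject everything else."""
    if isinstance(node, ast.Constant):
        return node.value
    if (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id in ALLOWED
    ):
        args = [evaluate(arg) for arg in node.args]
        kwargs = {}
        for keyword in node.keywords:
            if keyword.arg is None:  # reject **kwargs unpacking
                raise ValueError("unsupported argument form")
            kwargs[keyword.arg] = evaluate(keyword.value)
        return ALLOWED[node.func.id](*args, **kwargs)
    raise ValueError("unsupported expression in model output")


def run_model_call(output):
    """Find the first 'Call:' line in the model output and execute it."""
    for line in output.splitlines():
        if line.startswith("Call:"):
            tree = ast.parse(line[len("Call:"):].strip(), mode="eval")
            return evaluate(tree.body)
    raise ValueError("no 'Call:' line found")
```

Walking the syntax tree against an allow-list, rather than passing the string to `eval`, keeps the model from invoking anything other than the functions described in the system prompt.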


@chymian commented on GitHub (Feb 4, 2024):

@mxyng, I had some time to rebuild my GPU rig so that ollama gets 2 x 3070 (8 GB).
It loads half of the LLM's layers onto the first GPU and the rest onto the CPU.
It does not crash, but it is extremely slow.

When I try to offload more layers with the num_gpu parameter (which in every other LLM software stands for the number of GPUs, not layers),
ollama (Docker) crashes once the VRAM of the first GPU is full.
I haven't found any parameters for loading a model onto a specific GPU or a group of specific GPUs.
So no, it's not working, even though the error has changed.
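
For reference, a minimal sketch of how the two knobs discussed above could be set from a client. It assumes Ollama's `/api/generate` endpoint with `num_gpu` passed under `options` (where it means layers to offload), and uses `CUDA_VISIBLE_DEVICES`, the CUDA runtime's general mechanism for restricting visible devices. The helper names and the value 20 are illustrative only, and nothing in this thread confirms that either setting resolves the problem:

```python
import json
import os


def build_generate_request(model, prompt, num_gpu):
    """Build the JSON body for /api/generate with a layer-offload count."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        # In Ollama, num_gpu is the number of layers to offload, not a GPU count.
        "options": {"num_gpu": num_gpu},
    }


def server_env(gpu_ids, base_env=None):
    """Return an environment that limits CUDA to the given device indices."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


# 20 is an arbitrary illustrative layer count, not a recommendation.
body = json.dumps(build_generate_request("nexusraven", "why is the sky black?", 20))
```

The `body` string can be POSTed to the server (by default `http://localhost:11434/api/generate`), and the dictionary from `server_env([0, 1])` can be passed as `env=` when starting the server process with `subprocess`.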


@dhiltgen commented on GitHub (Mar 12, 2024):

@chymian it sounds like the original issue of crashing has been resolved on newer versions. If I understand correctly, you're now facing challenges with multiple GPUs and how memory is loaded across them. Can you upgrade to the latest version and share your server log, along with nvidia-smi output showing the skew, so we can try to get a better understanding of what's going on? Setting OLLAMA_DEBUG=1 for the server may be helpful to get more verbose logging.
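
One way to capture the skew requested above is to query `nvidia-smi` in CSV form and compare per-GPU memory use. A minimal sketch, assuming `nvidia-smi` is on the PATH; the helper names are illustrative and not part of ollama:

```python
import subprocess

# With nounits, memory.used and memory.total are reported in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse 'index, used, total' rows into a list of dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (field.strip() for field in line.split(","))
        rows.append(
            {"index": int(index), "used_mib": int(used), "total_mib": int(total)}
        )
    return rows


def memory_skew(rows):
    """Difference in MiB between the most and least loaded GPU."""
    used = [row["used_mib"] for row in rows]
    return max(used) - min(used)


def query_gpus():
    """Run nvidia-smi and return the parsed per-GPU memory rows."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)
```

Running `memory_skew(query_gpus())` while a model is loaded gives a single number to attach alongside the server log collected with `OLLAMA_DEBUG=1`.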


@dhiltgen commented on GitHub (Apr 12, 2024):

Our approach to multi-GPU will change once PR #3418 merges, so I would recommend trying it out once that's available in a release.


@dhiltgen commented on GitHub (May 4, 2024):

Now that 0.1.33 is shipped with concurrency support, I'm going to close this one out. If you're still having problems with the new scheduler logic in the latest release, please share an updated server log and I'll re-open.

Reference: github-starred/ollama#62507