[GH-ISSUE #3527] Ollama conflict with amdgpu driver on Debian #48686

Closed
opened 2026-04-28 09:05:53 -05:00 by GiteaMirror · 2 comments

Originally created by @hpsaturn on GitHub (Apr 7, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3527

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

I noticed that my Debian system fails after the first suspend: it can't suspend again because the amdgpu driver hits a kernel exception. While researching this, I found that the Ollama service can't be stopped and that it triggers this behavior. My current workaround is to disable the ollama systemd service at boot; with that, Debian is able to resume after each suspend.
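
The workaround can be sketched roughly as follows. This assumes the installer registered a systemd unit named `ollama`; adjust the name if your setup differs. The `run` helper and the `DRY_RUN` variable are hypothetical conveniences for previewing the commands and are not part of Ollama or systemd.

```shell
#!/bin/sh
# Sketch of the workaround: stop Ollama and keep it from starting at boot.
# Assumption: the systemd unit is named "ollama".

# Hypothetical helper: with DRY_RUN=1, print the command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

disable_ollama() {
    # Stop the running service and remove it from the boot sequence.
    run sudo systemctl disable --now ollama
}

enable_ollama() {
    # Undo the workaround once suspend works again.
    run sudo systemctl enable --now ollama
}
```

Calling `DRY_RUN=1 disable_ollama` only prints the command; calling `disable_ollama` runs it.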

What did you expect to see?

No conflicts with my amdgpu driver, and the ability to stop the ollama service.

Steps to reproduce

  • install ollama via curl (official installation)
  • suspend your machine
  • wake up your machine
  • suspend again (the machine dies, and you can't wake it up again)
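
The steps above might look like this on the command line. This is a sketch only: the install command is the one published on the Ollama site, `systemctl suspend` stands in for whatever suspend method you normally use, and the `run` helper with `DRY_RUN` is a hypothetical convenience for previewing. If `systemctl suspend` returns before the machine actually sleeps on your system, run the steps one at a time instead.

```shell
#!/bin/sh
# Sketch of the reproduction steps. Nothing runs until repro_steps is called.

# Hypothetical helper: with DRY_RUN=1, print the command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

repro_steps() {
    # 1. Install Ollama with the official script.
    run sh -c 'curl -fsSL https://ollama.com/install.sh | sh'
    # 2. First suspend; wake the machine manually.
    run systemctl suspend
    # 3. After wakeup, look for the amdgpu oops in the kernel log.
    run sh -c 'sudo dmesg | grep -E "Oops|amdgpu_amdkfd"'
    # 4. Second suspend; this is the one reported to hang.
    run systemctl suspend
}
```

`DRY_RUN=1 repro_steps` lists the four commands without touching the system.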

Are there any recent changes that introduced the issue?

I'm new to Ollama, but the latest version (0.1.30), installed via curl, reproduces the problem.

OS

Linux

Architecture

amd64

Platform

No response

Ollama version

0.1.30

GPU

AMD

GPU info

Kernel exception details from the first wakeup after suspend:

Apr  7 20:01:45 minisf kernel: [   75.265979] ACPI: PM: Waking up from system sleep state S3
Apr  7 20:01:45 minisf kernel: [   75.267852] pci 0000:00:00.2: can't derive routing for PCI INT A
Apr  7 20:01:45 minisf kernel: [   75.267854] pci 0000:00:00.2: PCI INT A: no GSI
Apr  7 20:01:45 minisf kernel: [   75.267886] xhci_hcd 0000:01:00.0: xHC error in resume, USBSTS 0x401, Reinit
Apr  7 20:01:45 minisf kernel: [   75.267888] usb usb1: root hub lost power or was reset
Apr  7 20:01:45 minisf kernel: [   75.267889] usb usb2: root hub lost power or was reset
Apr  7 20:01:45 minisf kernel: [   75.268081] [drm] PCIE GART of 1024M enabled.
Apr  7 20:01:45 minisf kernel: [   75.268085] [drm] PTB located at 0x000000F41FC00000
Apr  7 20:01:45 minisf kernel: [   75.268099] [drm] PSP is resuming...
Apr  7 20:01:45 minisf kernel: [   75.287957] [drm] reserve 0x400000 from 0xf41f800000 for PSP TMR
Apr  7 20:01:45 minisf kernel: [   75.339908] nvme nvme1: 8/0/0 default/read/poll queues
Apr  7 20:01:45 minisf kernel: [   75.375226] amdgpu 0000:07:00.0: amdgpu: RAS: optional ras ta ucode is not available
Apr  7 20:01:45 minisf kernel: [   75.383457] amdgpu 0000:07:00.0: amdgpu: RAP: optional rap ta ucode is not available
Apr  7 20:01:45 minisf kernel: [   75.383458] amdgpu 0000:07:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
Apr  7 20:01:45 minisf kernel: [   75.383460] amdgpu 0000:07:00.0: amdgpu: SMU is resuming...
Apr  7 20:01:45 minisf kernel: [   75.383502] amdgpu 0000:07:00.0: amdgpu: dpm has been disabled
Apr  7 20:01:45 minisf kernel: [   75.385720] amdgpu 0000:07:00.0: amdgpu: SMU is resumed successfully!
Apr  7 20:01:45 minisf kernel: [   75.386290] [drm] DMUB hardware initialized: version=0x0101000A
Apr  7 20:01:45 minisf kernel: [   75.414203] [drm] Unknown EDID CEA parser results
Apr  7 20:01:45 minisf kernel: [   75.441521] [drm] Unknown EDID CEA parser results
Apr  7 20:01:45 minisf kernel: [   75.506752] [drm] kiq ring mec 2 pipe 1 q 0
Apr  7 20:01:45 minisf kernel: [   75.510655] [drm] VCN decode and encode initialized successfully(under DPG Mode).
Apr  7 20:01:45 minisf kernel: [   75.510892] [drm] JPEG decode initialized successfully.
Apr  7 20:01:45 minisf kernel: [   75.510898] amdgpu 0000:07:00.0: amdgpu: ring gfx uses VM inv eng 0 on hub 0
Apr  7 20:01:45 minisf kernel: [   75.510899] amdgpu 0000:07:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Apr  7 20:01:45 minisf kernel: [   75.510900] amdgpu 0000:07:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Apr  7 20:01:45 minisf kernel: [   75.510900] amdgpu 0000:07:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
Apr  7 20:01:45 minisf kernel: [   75.510901] amdgpu 0000:07:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
Apr  7 20:01:45 minisf kernel: [   75.510901] amdgpu 0000:07:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
Apr  7 20:01:45 minisf kernel: [   75.510901] amdgpu 0000:07:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
Apr  7 20:01:45 minisf kernel: [   75.510902] amdgpu 0000:07:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
Apr  7 20:01:45 minisf kernel: [   75.510902] amdgpu 0000:07:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
Apr  7 20:01:45 minisf kernel: [   75.510903] amdgpu 0000:07:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
Apr  7 20:01:45 minisf kernel: [   75.510903] amdgpu 0000:07:00.0: amdgpu: ring sdma0 uses VM inv eng 0 on hub 1
Apr  7 20:01:45 minisf kernel: [   75.510904] amdgpu 0000:07:00.0: amdgpu: ring vcn_dec uses VM inv eng 1 on hub 1
Apr  7 20:01:45 minisf kernel: [   75.510904] amdgpu 0000:07:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 4 on hub 1
Apr  7 20:01:45 minisf kernel: [   75.510905] amdgpu 0000:07:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 5 on hub 1
Apr  7 20:01:45 minisf kernel: [   75.510905] amdgpu 0000:07:00.0: amdgpu: ring jpeg_dec uses VM inv eng 6 on hub 1
Apr  7 20:01:45 minisf kernel: [   75.512851] PGD 0 P4D 0 
Apr  7 20:01:45 minisf kernel: [   75.512853] Oops: 0000 [#1] PREEMPT SMP NOPTI
Apr  7 20:01:45 minisf kernel: [   75.512854] CPU: 4 PID: 102 Comm: kworker/u64:8 Not tainted 6.1.0-0.deb11.17-amd64 #1  Debian 6.1.69-1~bpo11+1
Apr  7 20:01:45 minisf kernel: [   75.512857] Hardware name: BESSTAR TECH LIMITED B550/B550, BIOS 5.17 03/31/2022
Apr  7 20:01:45 minisf kernel: [   75.512858] Workqueue: kfd_restore_wq restore_process_worker [amdgpu]
Apr  7 20:01:45 minisf kernel: [   75.512993] RIP: 0010:amdgpu_amdkfd_gpuvm_restore_process_bos+0x75/0x680 [amdgpu]
Apr  7 20:01:45 minisf kernel: [   75.513109] Code: 00 00 48 89 84 24 e0 00 00 00 48 89 84 24 e8 00 00 00 48 8d 84 24 f0 00 00 00 48 89 84 24 f0 00 00 00 48 89 84 24 f8 00 00 00 <8b> 47 60 48 8d 3c c0 48 c1 e7 03 e8 6b 85 02 e1 48 85 c0 0f 84 bd
Apr  7 20:01:45 minisf kernel: [   75.513110] RSP: 0018:ffffae858054fcb0 EFLAGS: 00010246
Apr  7 20:01:45 minisf kernel: [   75.513111] RAX: ffffae858054fda0 RBX: 0000000000000000 RCX: ffff8caa80059028
Apr  7 20:01:45 minisf kernel: [   75.513112] RDX: 0000000000000001 RSI: 0000000000000dc0 RDI: 0000000000000000
Apr  7 20:01:45 minisf kernel: [   75.513113] RBP: ffff8caa9396a800 R08: ffff8caa9396aa28 R09: ffff8caa80e2a074
Apr  7 20:01:45 minisf kernel: [   75.513113] R10: 000000000000000f R11: 000000000000000f R12: ffff8caa9396aa20
Apr  7 20:01:45 minisf kernel: [   75.513114] R13: ffff8caa865f1800 R14: 0000000000000000 R15: ffff8caa865f1805
Apr  7 20:01:45 minisf kernel: [   75.513115] FS:  0000000000000000(0000) GS:ffff8cb17e300000(0000) knlGS:0000000000000000
Apr  7 20:01:45 minisf kernel: [   75.513115] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr  7 20:01:45 minisf kernel: [   75.513116] CR2: 0000000000000060 CR3: 0000000477810000 CR4: 0000000000750ee0
Apr  7 20:01:45 minisf kernel: [   75.513117] PKRU: 55555554
Apr  7 20:01:45 minisf kernel: [   75.513117] Call Trace:
Apr  7 20:01:45 minisf kernel: [   75.513120]  <TASK>
Apr  7 20:01:45 minisf kernel: [   75.513122]  ? __die_body.cold+0x1a/0x1f
Apr  7 20:01:45 minisf kernel: [   75.513125]  ? page_fault_oops+0xae/0x280
Apr  7 20:01:45 minisf kernel: [   75.513127]  ? exc_page_fault+0x71/0x170
Apr  7 20:01:45 minisf kernel: [   75.513129]  ? asm_exc_page_fault+0x22/0x30
Apr  7 20:01:45 minisf kernel: [   75.513133]  ? amdgpu_amdkfd_gpuvm_restore_process_bos+0x75/0x680 [amdgpu]
Apr  7 20:01:45 minisf kernel: [   75.513235]  ? load_balance+0xa95/0xd70
Apr  7 20:01:45 minisf kernel: [   75.513238]  ? psi_group_change+0x151/0x340
Apr  7 20:01:45 minisf kernel: [   75.513240]  ? psi_task_switch+0xd7/0x230
Apr  7 20:01:45 minisf kernel: [   75.513242]  ? __switch_to_asm+0x3a/0x60
Apr  7 20:01:45 minisf kernel: [   75.513244]  ? finish_task_switch.isra.0+0x8f/0x2d0
Apr  7 20:01:45 minisf kernel: [   75.513246]  restore_process_worker+0x30/0xf0 [amdgpu]
Apr  7 20:01:45 minisf kernel: [   75.513347]  process_one_work+0x1e5/0x3b0
Apr  7 20:01:45 minisf kernel: [   75.513351]  worker_thread+0x50/0x3a0
Apr  7 20:01:45 minisf kernel: [   75.513353]  ? rescuer_thread+0x390/0x390
Apr  7 20:01:45 minisf kernel: [   75.513354]  kthread+0xd8/0x100
Apr  7 20:01:45 minisf kernel: [   75.513356]  ? kthread_complete_and_exit+0x20/0x20
Apr  7 20:01:45 minisf kernel: [   75.513358]  ret_from_fork+0x22/0x30
Apr  7 20:01:45 minisf kernel: [   75.513360]  </TASK>
Apr  7 20:01:45 minisf kernel: [   75.513361] Modules linked in: nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink xfrm_user xfrm_algo br_netfilter bridge stp llc binfmt_misc cmac algif_hash algif_skcipher af_alg bnep overlay ip6t_REJECT nf_reject_ipv6 xt_hl ip6_tables ip6t_rt ipt_REJECT nf_reject_ipv4 mt7921e btusb xt_LOG btrtl mt7921_common nf_log_syslog btbcm btintel mt76_connac_lib btmtk amdgpu xt_comment mt76 bluetooth squashfs snd_hda_codec_hdmi nft_limit snd_hda_intel mac80211 jitterentropy_rng snd_intel_dspcfg intel_rapl_msr snd_intel_sdw_acpi snd_usb_audio gpu_sched intel_rapl_common libarc4 snd_hda_codec drm_buddy snd_usbmidi_lib nls_ascii edac_mce_amd drbg uvcvideo drm_display_helper nls_cp437 snd_rawmidi snd_hda_core ansi_cprng videobuf2_vmalloc cfg80211 snd_pci_acp6x snd_seq_device snd_hwdep cec vfat videobuf2_memops rc_core kvm_amd ecdh_generic snd_pcm videobuf2_v4l2 fat snd_pci_acp5x drm_ttm_helper ecc videobuf2_common joydev cdc_acm loop rfkill snd_timer ttm snd_rn_pci_acp3x kvm drm_kms_helper
Apr  7 20:01:45 minisf kernel: [   75.513392]  snd_acp_config snd xt_limit sp5100_tco irqbypass snd_soc_acpi i2c_algo_bit ccp snd_pci_acp3x soundcore xt_addrtype k10temp rapl watchdog wmi_bmof efi_pstore pcspkr xt_tcpudp evdev acpi_cpufreq button xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables videodev libcrc32c nfnetlink mc drm fuse configfs efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic hid_cmedia hid_logitech_hidpp hid_logitech_dj hid_generic crc32_pclmul crc32c_intel usbhid hid ghash_clmulni_intel sha512_ssse3 sha512_generic sha256_ssse3 sha1_ssse3 nvme ahci libahci xhci_pci nvme_core libata xhci_hcd t10_pi crc64_rocksoft_generic aesni_intel crypto_simd crc64_rocksoft cryptd usbcore scsi_mod igc crc_t10dif crct10dif_generic i2c_piix4 crct10dif_pclmul crc64 crct10dif_common usb_common scsi_common video wmi gpio_amdpt gpio_generic
Apr  7 20:01:45 minisf kernel: [   75.513421] CR2: 0000000000000060
Apr  7 20:01:45 minisf kernel: [   75.513422] ---[ end trace 0000000000000000 ]---
Apr  7 20:01:45 minisf kernel: [   75.584387] ata4: SATA link down (SStatus 0 SControl 330)
Apr  7 20:01:45 minisf kernel: [   75.584408] ata5: SATA link down (SStatus 0 SControl 330)
Apr  7 20:01:45 minisf kernel: [   75.584430] ata1: SATA link down (SStatus 0 SControl 330)
Apr  7 20:01:45 minisf kernel: [   75.584441] ata7: SATA link down (SStatus 0 SControl 300)
Apr  7 20:01:45 minisf kernel: [   75.584443] ata8: SATA link down (SStatus 0 SControl 300)
Apr  7 20:01:45 minisf kernel: [   75.584449] ata2: SATA link down (SStatus 0 SControl 330)
Apr  7 20:01:45 minisf kernel: [   75.584466] ata3: SATA link down (SStatus 0 SControl 330)
Apr  7 20:01:45 minisf kernel: [   75.584487] ata6: SATA link down (SStatus 0 SControl 330)
Apr  7 20:01:45 minisf kernel: [   75.616395] usb 1-1: reset full-speed USB device number 2 using xhci_hcd
Apr  7 20:01:45 minisf kernel: [   75.623721] RIP: 0010:amdgpu_amdkfd_gpuvm_restore_process_bos+0x75/0x680 [amdgpu]
Apr  7 20:01:45 minisf kernel: [   75.623843] Code: 00 00 48 89 84 24 e0 00 00 00 48 89 84 24 e8 00 00 00 48 8d 84 24 f0 00 00 00 48 89 84 24 f0 00 00 00 48 89 84 24 f8 00 00 00 <8b> 47 60 48 8d 3c c0 48 c1 e7 03 e8 6b 85 02 e1 48 85 c0 0f 84 bd
Apr  7 20:01:45 minisf kernel: [   75.623844] RSP: 0018:ffffae858054fcb0 EFLAGS: 00010246
Apr  7 20:01:45 minisf kernel: [   75.623844] RAX: ffffae858054fda0 RBX: 0000000000000000 RCX: ffff8caa80059028
Apr  7 20:01:45 minisf kernel: [   75.623845] RDX: 0000000000000001 RSI: 0000000000000dc0 RDI: 0000000000000000
Apr  7 20:01:45 minisf kernel: [   75.623845] RBP: ffff8caa9396a800 R08: ffff8caa9396aa28 R09: ffff8caa80e2a074
Apr  7 20:01:45 minisf kernel: [   75.623846] R10: 000000000000000f R11: 000000000000000f R12: ffff8caa9396aa20
Apr  7 20:01:45 minisf kernel: [   75.623846] R13: ffff8caa865f1800 R14: 0000000000000000 R15: ffff8caa865f1805
Apr  7 20:01:45 minisf kernel: [   75.623847] FS:  0000000000000000(0000) GS:ffff8cb17e300000(0000) knlGS:0000000000000000
Apr  7 20:01:45 minisf kernel: [   75.623847] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr  7 20:01:45 minisf kernel: [   75.623848] CR2: 0000000000000060 CR3: 0000000477810000 CR4: 0000000000750ee0
Apr  7 20:01:45 minisf kernel: [   75.623849] PKRU: 55555554
Apr  7 20:01:45 minisf kernel: [   75.623849] note: kworker/u64:8[102] exited with irqs disabled
Apr  7 20:01:45 minisf kernel: [   75.628470] nvme nvme0: 8/0/0 default/read/poll queues
Apr  7 20:01:45 minisf kernel: [   76.039922] usb 1-6: reset full-speed USB device number 6 using xhci_hcd
Apr  7 20:01:45 minisf kernel: [   76.532304] usb 1-2: reset high-speed USB device number 3 using xhci_hcd
Apr  7 20:01:45 minisf kernel: [   76.892313] usb 1-5: reset high-speed USB device number 4 using xhci_hcd
Apr  7 20:01:45 minisf kernel: [   77.246326] usb 2-2: reset SuperSpeed USB device number 2 using xhci_hcd
Apr  7 20:01:45 minisf kernel: [   77.360800] usb 1-2.4: reset high-speed USB device number 5 using xhci_hcd
Apr  7 20:01:45 minisf kernel: [   77.512863] OOM killer enabled.
Apr  7 20:01:45 minisf kernel: [   77.512865] Restarting tasks ... done.
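
For anyone comparing their own logs against this one, a small filter can reduce a kernel log to the lines that identify the fault. In the trace above, the faulting address (CR2) is 0x60 and RDI is 0, which is consistent with a NULL pointer dereference in `amdgpu_amdkfd_gpuvm_restore_process_bos`, reached from the `kfd_restore_wq` workqueue. The function name `oops_summary` is made up for this sketch.

```shell
#!/bin/sh
# Sketch: reduce a kernel log to the lines that identify the fault.
# Reads syslog- or dmesg-style text on stdin.

oops_summary() {
    # Keep the oops header, the workqueue, the faulting instruction pointer,
    # and the standalone CR2 line (the faulting address).
    grep -E 'Oops:|Workqueue:|RIP: [0-9a-f]+:|CR2: [0-9a-f]+$'
}
```

Typical use would be `sudo dmesg | oops_summary` or `oops_summary < /var/log/kern.log`, depending on where your distribution keeps kernel messages.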

CPU

AMD

Other software

No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 11 (bullseye)
Release: 11
Codename: bullseye

GiteaMirror added the bug and amd labels 2026-04-28 09:05:53 -05:00

@dhiltgen commented on GitHub (May 5, 2024):

Please try upgrading to 0.1.33 and see if that clears up the problem. We've changed to using subprocesses for GPU access, and after 5m (configurable, see https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-keep-a-model-loaded-in-memory-or-make-it-unload-immediately) models will be unloaded, and we should now fully release the GPU.
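
As a rough illustration of the configurable unload mentioned above: the linked FAQ describes a `keep_alive` field on API requests, and a value of 0 asks the server to unload the model right after the request. The sketch below assumes the server listens on the default `localhost:11434`; the function names are invented for this example, and the model name you pass is whatever you have pulled.

```shell
#!/bin/sh
# Sketch: ask a running Ollama server to unload a model right away
# by sending a request with keep_alive set to 0.
# Assumption: the server is reachable on the default localhost:11434.

unload_payload() {
    # Build the JSON body for the given model name.
    printf '{"model": "%s", "keep_alive": 0}\n' "$1"
}

unload_model() {
    # Send the request; the server replies once the model has been handled.
    curl -s http://localhost:11434/api/generate -d "$(unload_payload "$1")"
}
```

For example, `unload_model llama3` (with `llama3` as a placeholder). The same FAQ entry also describes an `OLLAMA_KEEP_ALIVE` environment variable for changing the server-wide default.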

@dhiltgen commented on GitHub (May 21, 2024):

If you're still having trouble with unloading, make sure to upgrade to the latest version. You can also pass zero to the `--keepalive` argument of `ollama run` to force a model to unload immediately. If that still doesn't yield a setup that can suspend, let us know.
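
A minimal sketch of the `--keepalive` suggestion, assuming a version of Ollama recent enough to support the flag. The wrapper name `run_then_unload` and the `OLLAMA_BIN` variable are hypothetical; the latter only exists so the binary can be swapped out (for instance with `echo`, to preview the command line).

```shell
#!/bin/sh
# Sketch: run a model and have it unloaded as soon as the session ends.
# Assumption: the first argument is the name of a model you have pulled.

run_then_unload() {
    model="$1"
    shift
    # --keepalive 0 asks Ollama not to keep the model loaded afterwards.
    ${OLLAMA_BIN:-ollama} run --keepalive 0 "$model" "$@"
}
```

For example, `run_then_unload llama3 "hello"` (with `llama3` as a placeholder), followed by a suspend attempt to see whether the GPU was released.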

Reference: github-starred/ollama#48686