[GH-ISSUE #7941] signal arrived during cgo execution #67138

Closed
opened 2026-05-04 09:32:24 -05:00 by GiteaMirror · 4 comments

Originally created by @datamg-star on GitHub (Dec 5, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7941

[root@localhost data]# ollama run llama3.1:8b

>>> a
It looks likeError: an error was encountered while running the model: unexpected EOF

tail -200 /var/log/messages
Dec 5 10:29:10 localhost ollama: Device 0: NVIDIA A800-SXM4-40GB, compute capability 8.0, VMM: yes
Dec 5 10:29:10 localhost ollama: llm_load_tensors: ggml ctx size = 0.27 MiB
Dec 5 10:29:11 localhost ollama: llm_load_tensors: offloading 32 repeating layers to GPU
Dec 5 10:29:11 localhost ollama: llm_load_tensors: offloading non-repeating layers to GPU
Dec 5 10:29:11 localhost ollama: llm_load_tensors: offloaded 33/33 layers to GPU
Dec 5 10:29:11 localhost ollama: llm_load_tensors: CPU buffer size = 281.81 MiB
Dec 5 10:29:11 localhost ollama: llm_load_tensors: CUDA0 buffer size = 4403.50 MiB
Dec 5 10:29:16 localhost ollama: llama_new_context_with_model: n_ctx = 8192
Dec 5 10:29:16 localhost ollama: llama_new_context_with_model: n_batch = 2048
Dec 5 10:29:16 localhost ollama: llama_new_context_with_model: n_ubatch = 512
Dec 5 10:29:16 localhost ollama: llama_new_context_with_model: flash_attn = 0
Dec 5 10:29:16 localhost ollama: llama_new_context_with_model: freq_base = 500000.0
Dec 5 10:29:16 localhost ollama: llama_new_context_with_model: freq_scale = 1
Dec 5 10:29:16 localhost ollama: llama_kv_cache_init: CUDA0 KV buffer size = 1024.00 MiB
Dec 5 10:29:16 localhost ollama: llama_new_context_with_model: KV self size = 1024.00 MiB, K (f16): 512.00 MiB, V (f16): 512.00 MiB
Dec 5 10:29:16 localhost ollama: llama_new_context_with_model: CUDA_Host output buffer size = 2.02 MiB
Dec 5 10:29:16 localhost ollama: llama_new_context_with_model: CUDA0 compute buffer size = 560.00 MiB
Dec 5 10:29:16 localhost ollama: llama_new_context_with_model: CUDA_Host compute buffer size = 24.01 MiB
Dec 5 10:29:16 localhost ollama: llama_new_context_with_model: graph nodes = 1030
Dec 5 10:29:16 localhost ollama: llama_new_context_with_model: graph splits = 2
Dec 5 10:29:16 localhost ollama: time=2024-12-05T10:29:16.838+08:00 level=INFO source=server.go:601 msg="llama runner started in 13.05 seconds"
Dec 5 10:29:16 localhost ollama: [GIN] 2024/12/05 - 10:29:16 | 200 | 13.185062523s | 127.0.0.1 | POST "/api/generate"
Dec 5 10:29:24 localhost ollama: SIGSEGV: segmentation violation
Dec 5 10:29:24 localhost ollama: PC=0x7f74e0682a00 m=4 sigcode=1 addr=0x7f74359ca7ca
Dec 5 10:29:24 localhost ollama: signal arrived during cgo execution
Dec 5 10:29:24 localhost ollama: goroutine 36 gp=0xc000104700 m=4 mp=0xc000057808 [syscall]:
Dec 5 10:29:24 localhost ollama: runtime.cgocall(0x5640ed665110, 0xc0002a8b48)
Dec 5 10:29:24 localhost ollama: runtime/cgocall.go:157 +0x4b fp=0xc0002a8b20 sp=0xc0002a8ae8 pc=0x5640ed3e63cb
Dec 5 10:29:24 localhost ollama: github.com/ollama/ollama/llama._Cfunc_llama_decode(0x7f74893e78e0, {0x1, 0x7f74887e89a0, 0x0, 0x0, 0x7f74887ea9b0, 0x7f74887ec9c0, 0x7f74887ee9d0, 0x7f7488802480, 0x0, ...})
Dec 5 10:29:24 localhost ollama: _cgo_gotypes.go:543 +0x52 fp=0xc0002a8b48 sp=0xc0002a8b20 pc=0x5640ed4e3952
Dec 5 10:29:24 localhost ollama: github.com/ollama/ollama/llama.(*Context).Decode.func1(0x5640ed660e0b?, 0x7f74893e78e0?)
Dec 5 10:29:24 localhost ollama: github.com/ollama/ollama/llama/llama.go:167 +0xd8 fp=0xc0002a8c68 sp=0xc0002a8b48 pc=0x5640ed4e5f78
Dec 5 10:29:24 localhost ollama: github.com/ollama/ollama/llama.(*Context).Decode(0x5640edc560e0?, 0x0?)
Dec 5 10:29:24 localhost ollama: github.com/ollama/ollama/llama/llama.go:167 +0x13 fp=0xc0002a8cb0 sp=0xc0002a8c68 pc=0x5640ed4e5e13
Dec 5 10:29:24 localhost ollama: main.(*Server).processBatch(0xc00013c120, 0xc0002ac000, 0xc0002a8f10)
Dec 5 10:29:24 localhost ollama: github.com/ollama/ollama/llama/runner/runner.go:425 +0x24d fp=0xc0002a8ed0 sp=0xc0002a8cb0 pc=0x5640ed65facd
Dec 5 10:29:24 localhost ollama: main.(*Server).run(0xc00013c120, {0x5640ed99ecc0, 0xc00017a050})
Dec 5 10:29:24 localhost ollama: github.com/ollama/ollama/llama/runner/runner.go:333 +0x1e5 fp=0xc0002a8fb8 sp=0xc0002a8ed0 pc=0x5640ed65f545
Dec 5 10:29:24 localhost ollama: main.main.gowrap2()
Dec 5 10:29:24 localhost ollama: github.com/ollama/ollama/llama/runner/runner.go:934 +0x28 fp=0xc0002a8fe0 sp=0xc0002a8fb8 pc=0x5640ed664148
Dec 5 10:29:24 localhost ollama: runtime.goexit({})
Dec 5 10:29:24 localhost ollama: runtime/asm_amd64.s:1695 +0x1 fp=0xc0002a8fe8 sp=0xc0002a8fe0 pc=0x5640ed44ede1
Dec 5 10:29:24 localhost ollama: created by main.main in goroutine 1
Dec 5 10:29:24 localhost ollama: github.com/ollama/ollama/llama/runner/runner.go:934 +0xc52
Dec 5 10:29:24 localhost ollama: goroutine 1 gp=0xc0000061c0 m=nil [IO wait]:
Dec 5 10:29:24 localhost ollama: runtime.gopark(0xc000034a08?, 0x0?, 0xc0?, 0x61?, 0xc0000298b8?)
Dec 5 10:29:24 localhost ollama: runtime/proc.go:402 +0xce fp=0xc000029880 sp=0xc000029860 pc=0x5640ed41d00e
Dec 5 10:29:24 localhost ollama: runtime.netpollblock(0xc000029918?, 0xed3e5b26?, 0x40?)
Dec 5 10:29:24 localhost ollama: runtime/netpoll.go:573 +0xf7 fp=0xc0000298b8 sp=0xc000029880 pc=0x5640ed415257
Dec 5 10:29:24 localhost ollama: internal/poll.runtime_pollWait(0x7f74975fef20, 0x72)
Dec 5 10:29:24 localhost ollama: runtime/netpoll.go:345 +0x85 fp=0xc0000298d8 sp=0xc0000298b8 pc=0x5640ed449aa5
Dec 5 10:29:24 localhost ollama: internal/poll.(*pollDesc).wait(0x3?, 0x3fe?, 0x0)
Dec 5 10:29:24 localhost ollama: internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000029900 sp=0xc0000298d8 pc=0x5640ed4999c7
Dec 5 10:29:24 localhost ollama: internal/poll.(*pollDesc).waitRead(...)
Dec 5 10:29:24 localhost ollama: internal/poll/fd_poll_runtime.go:89
Dec 5 10:29:24 localhost ollama: internal/poll.(*FD).Accept(0xc000174080)
Dec 5 10:29:24 localhost ollama: internal/poll/fd_unix.go:611 +0x2ac fp=0xc0000299a8 sp=0xc000029900 pc=0x5640ed49ae8c
Dec 5 10:29:24 localhost ollama: net.(*netFD).accept(0xc000174080)
Dec 5 10:29:24 localhost ollama: net/fd_unix.go:172 +0x29 fp=0xc000029a60 sp=0xc0000299a8 pc=0x5640ed509a09
Dec 5 10:29:24 localhost ollama: net.(*TCPListener).accept(0xc00013e1c0)
Dec 5 10:29:24 localhost ollama: net/tcpsock_posix.go:159 +0x1e fp=0xc000029a88 sp=0xc000029a60 pc=0x5640ed51a73e
Dec 5 10:29:24 localhost ollama: net.(*TCPListener).Accept(0xc00013e1c0)
Dec 5 10:29:24 localhost ollama: net/tcpsock.go:327 +0x30 fp=0xc000029ab8 sp=0xc000029a88 pc=0x5640ed519a90
Dec 5 10:29:24 localhost ollama: net/http.(*onceCloseListener).Accept(0xc00013c1b0?)
Dec 5 10:29:24 localhost ollama: <autogenerated>:1 +0x24 fp=0xc000029ad0 sp=0xc000029ab8 pc=0x5640ed640ca4
Dec 5 10:29:24 localhost ollama: net/http.(*Server).Serve(0xc0001220f0, {0x5640ed99e680, 0xc00013e1c0})
Dec 5 10:29:24 localhost ollama: net/http/server.go:3260 +0x33e fp=0xc000029c00 sp=0xc000029ad0 pc=0x5640ed637abe
Dec 5 10:29:24 localhost ollama: main.main()
Dec 5 10:29:24 localhost ollama: github.com/ollama/ollama/llama/runner/runner.go:954 +0xfec fp=0xc000029f50 sp=0xc000029c00 pc=0x5640ed663ecc
Dec 5 10:29:24 localhost ollama: runtime.main()
Dec 5 10:29:24 localhost ollama: runtime/proc.go:271 +0x29d fp=0xc000029fe0 sp=0xc000029f50 pc=0x5640ed41cbdd
Dec 5 10:29:24 localhost ollama: runtime.goexit({})
Dec 5 10:29:24 localhost ollama: runtime/asm_amd64.s:1695 +0x1 fp=0xc000029fe8 sp=0xc000029fe0 pc=0x5640ed44ede1
Dec 5 10:29:24 localhost ollama: goroutine 2 gp=0xc000006c40 m=nil [force gc (idle)]:
Dec 5 10:29:24 localhost ollama: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
Dec 5 10:29:24 localhost ollama: runtime/proc.go:402 +0xce fp=0xc000050fa8 sp=0xc000050f88 pc=0x5640ed41d00e
Dec 5 10:29:24 localhost ollama: runtime.goparkunlock(...)
Dec 5 10:29:24 localhost ollama: runtime/proc.go:408
Dec 5 10:29:24 localhost ollama: runtime.forcegchelper()
Dec 5 10:29:24 localhost ollama: runtime/proc.go:326 +0xb8 fp=0xc000050fe0 sp=0xc000050fa8 pc=0x5640ed41ce98
Dec 5 10:29:24 localhost ollama: runtime.goexit({})
Dec 5 10:29:24 localhost ollama: runtime/asm_amd64.s:1695 +0x1 fp=0xc000050fe8 sp=0xc000050fe0 pc=0x5640ed44ede1
Dec 5 10:29:24 localhost ollama: created by runtime.init.6 in goroutine 1
Dec 5 10:29:24 localhost ollama: runtime/proc.go:314 +0x1a
Dec 5 10:29:24 localhost ollama: goroutine 18 gp=0xc00008a380 m=nil [GC sweep wait]:
Dec 5 10:29:24 localhost ollama: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
Dec 5 10:29:24 localhost ollama: runtime/proc.go:402 +0xce fp=0xc00004c780 sp=0xc00004c760 pc=0x5640ed41d00e
Dec 5 10:29:24 localhost ollama: runtime.goparkunlock(...)
Dec 5 10:29:24 localhost ollama: runtime/proc.go:408
Dec 5 10:29:24 localhost ollama: runtime.bgsweep(0xc000096000)
Dec 5 10:29:24 localhost ollama: runtime/mgcsweep.go:278 +0x94 fp=0xc00004c7c8 sp=0xc00004c780 pc=0x5640ed407b54
Dec 5 10:29:24 localhost ollama: runtime.gcenable.gowrap1()
Dec 5 10:29:24 localhost ollama: runtime/mgc.go:203 +0x25 fp=0xc00004c7e0 sp=0xc00004c7c8 pc=0x5640ed3fc685
Dec 5 10:29:24 localhost ollama: runtime.goexit({})
Dec 5 10:29:25 localhost ollama: runtime/asm_amd64.s:1695 +0x1 fp=0xc00004c7e8 sp=0xc00004c7e0 pc=0x5640ed44ede1
Dec 5 10:29:25 localhost ollama: created by runtime.gcenable in goroutine 1
Dec 5 10:29:25 localhost ollama: runtime/mgc.go:203 +0x66
Dec 5 10:29:25 localhost ollama: goroutine 19 gp=0xc00008a540 m=nil [GC scavenge wait]:
Dec 5 10:29:25 localhost ollama: runtime.gopark(0xc000096000?, 0x5640ed8a02b0?, 0x1?, 0x0?, 0xc00008a540?)
Dec 5 10:29:25 localhost ollama: runtime/proc.go:402 +0xce fp=0xc00004cf78 sp=0xc00004cf58 pc=0x5640ed41d00e
Dec 5 10:29:25 localhost ollama: runtime.goparkunlock(...)
Dec 5 10:29:25 localhost ollama: runtime/proc.go:408
Dec 5 10:29:25 localhost ollama: runtime.(*scavengerState).park(0x5640edb6d540)
Dec 5 10:29:25 localhost ollama: runtime/mgcscavenge.go:425 +0x49 fp=0xc00004cfa8 sp=0xc00004cf78 pc=0x5640ed405549
Dec 5 10:29:25 localhost ollama: runtime.bgscavenge(0xc000096000)
Dec 5 10:29:25 localhost ollama: runtime/mgcscavenge.go:653 +0x3c fp=0xc00004cfc8 sp=0xc00004cfa8 pc=0x5640ed405adc
Dec 5 10:29:25 localhost ollama: runtime.gcenable.gowrap2()
Dec 5 10:29:25 localhost ollama: runtime/mgc.go:204 +0x25 fp=0xc00004cfe0 sp=0xc00004cfc8 pc=0x5640ed3fc625
Dec 5 10:29:25 localhost ollama: runtime.goexit({})
Dec 5 10:29:25 localhost ollama: runtime/asm_amd64.s:1695 +0x1 fp=0xc00004cfe8 sp=0xc00004cfe0 pc=0x5640ed44ede1
Dec 5 10:29:25 localhost ollama: created by runtime.gcenable in goroutine 1
Dec 5 10:29:25 localhost ollama: runtime/mgc.go:204 +0xa5
Dec 5 10:29:25 localhost ollama: goroutine 34 gp=0xc000104380 m=nil [finalizer wait]:
Dec 5 10:29:25 localhost ollama: runtime.gopark(0xc000050648?, 0x5640ed3eff85?, 0xa8?, 0x1?, 0xc0000061c0?)
Dec 5 10:29:25 localhost ollama: runtime/proc.go:402 +0xce fp=0xc000050620 sp=0xc000050600 pc=0x5640ed41d00e
Dec 5 10:29:25 localhost ollama: runtime.runfinq()
Dec 5 10:29:25 localhost ollama: runtime/mfinal.go:194 +0x107 fp=0xc0000507e0 sp=0xc000050620 pc=0x5640ed3fb6c7
Dec 5 10:29:25 localhost ollama: runtime.goexit({})
Dec 5 10:29:25 localhost ollama: runtime/asm_amd64.s:1695 +0x1 fp=0xc0000507e8 sp=0xc0000507e0 pc=0x5640ed44ede1
Dec 5 10:29:25 localhost ollama: created by runtime.createfing in goroutine 1
Dec 5 10:29:25 localhost ollama: runtime/mfinal.go:164 +0x3d
Dec 5 10:29:25 localhost ollama: goroutine 32 gp=0xc000104540 m=nil [IO wait]:
Dec 5 10:29:25 localhost ollama: runtime.gopark(0x10?, 0x10?, 0xf0?, 0x5d?, 0xb?)
Dec 5 10:29:25 localhost ollama: runtime/proc.go:402 +0xce fp=0xc000185da8 sp=0xc000185d88 pc=0x5640ed41d00e
Dec 5 10:29:25 localhost ollama: runtime.netpollblock(0x5640ed483558?, 0xed3e5b26?, 0x40?)
Dec 5 10:29:25 localhost ollama: runtime/netpoll.go:573 +0xf7 fp=0xc000185de0 sp=0xc000185da8 pc=0x5640ed415257
Dec 5 10:29:25 localhost ollama: internal/poll.runtime_pollWait(0x7f74975fee28, 0x72)
Dec 5 10:29:25 localhost ollama: runtime/netpoll.go:345 +0x85 fp=0xc000185e00 sp=0xc000185de0 pc=0x5640ed449aa5
Dec 5 10:29:25 localhost ollama: internal/poll.(*pollDesc).wait(0xc000174100?, 0xc000114ee1?, 0x0)
Dec 5 10:29:25 localhost ollama: internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000185e28 sp=0xc000185e00 pc=0x5640ed4999c7
Dec 5 10:29:25 localhost ollama: internal/poll.(*pollDesc).waitRead(...)
Dec 5 10:29:25 localhost ollama: internal/poll/fd_poll_runtime.go:89
Dec 5 10:29:25 localhost ollama: internal/poll.(*FD).Read(0xc000174100, {0xc000114ee1, 0x1, 0x1})
Dec 5 10:29:25 localhost ollama: internal/poll/fd_unix.go:164 +0x27a fp=0xc000185ec0 sp=0xc000185e28 pc=0x5640ed49a51a
Dec 5 10:29:25 localhost ollama: net.(*netFD).Read(0xc000174100, {0xc000114ee1?, 0xc000185f48?, 0x5640ed44b6d0?})
Dec 5 10:29:25 localhost ollama: net/fd_posix.go:55 +0x25 fp=0xc000185f08 sp=0xc000185ec0 pc=0x5640ed508905
Dec 5 10:29:25 localhost ollama: net.(*conn).Read(0xc000112090, {0xc000114ee1?, 0x0?, 0x5640edc560e0?})
Dec 5 10:29:25 localhost ollama: net/net.go:185 +0x45 fp=0xc000185f50 sp=0xc000185f08 pc=0x5640ed512bc5
Dec 5 10:29:25 localhost ollama: net.(*TCPConn).Read(0x5640edb2e870?, {0xc000114ee1?, 0x0?, 0x0?})
Dec 5 10:29:25 localhost ollama: <autogenerated>:1 +0x25 fp=0xc000185f80 sp=0xc000185f50 pc=0x5640ed51e5a5
Dec 5 10:29:25 localhost ollama: net/http.(*connReader).backgroundRead(0xc000114ed0)
Dec 5 10:29:25 localhost ollama: net/http/server.go:681 +0x37 fp=0xc000185fc8 sp=0xc000185f80 pc=0x5640ed62d437
Dec 5 10:29:25 localhost ollama: net/http.(*connReader).startBackgroundRead.gowrap2()
Dec 5 10:29:25 localhost ollama: net/http/server.go:677 +0x25 fp=0xc000185fe0 sp=0xc000185fc8 pc=0x5640ed62d365
Dec 5 10:29:25 localhost ollama: runtime.goexit({})
Dec 5 10:29:25 localhost ollama: runtime/asm_amd64.s:1695 +0x1 fp=0xc000185fe8 sp=0xc000185fe0 pc=0x5640ed44ede1
Dec 5 10:29:25 localhost ollama: created by net/http.(*connReader).startBackgroundRead in goroutine 37
Dec 5 10:29:25 localhost ollama: net/http/server.go:677 +0xba
Dec 5 10:29:25 localhost ollama: goroutine 37 gp=0xc0001048c0 m=nil [select]:
Dec 5 10:29:25 localhost ollama: runtime.gopark(0xc0000d9a48?, 0x2?, 0xd8?, 0x96?, 0xc0000d97ec?)
Dec 5 10:29:25 localhost ollama: runtime/proc.go:402 +0xce fp=0xc0000d9658 sp=0xc0000d9638 pc=0x5640ed41d00e
Dec 5 10:29:25 localhost ollama: runtime.selectgo(0xc0000d9a48, 0xc0000d97e8, 0xc0002b0000?, 0x0, 0x1?, 0x1)
Dec 5 10:29:25 localhost ollama: runtime/select.go:327 +0x725 fp=0xc0000d9778 sp=0xc0000d9658 pc=0x5640ed42e3e5
Dec 5 10:29:25 localhost ollama: main.(*Server).completion(0xc00013c120, {0x5640ed99e830, 0xc0000aca80}, 0xc0000a2d80)
Dec 5 10:29:25 localhost ollama: github.com/ollama/ollama/llama/runner/runner.go:679 +0xa45 fp=0xc0000d9ab8 sp=0xc0000d9778 pc=0x5640ed6618e5
Dec 5 10:29:25 localhost ollama: main.(*Server).completion-fm({0x5640ed99e830?, 0xc0000aca80?}, 0x5640ed63bded?)
Dec 5 10:29:25 localhost ollama: <autogenerated>:1 +0x36 fp=0xc0000d9ae8 sp=0xc0000d9ab8 pc=0x5640ed664936
Dec 5 10:29:25 localhost ollama: net/http.HandlerFunc.ServeHTTP(0xc000116d00?, {0x5640ed99e830?, 0xc0000aca80?}, 0x10?)
Dec 5 10:29:25 localhost ollama: net/http/server.go:2171 +0x29 fp=0xc0000d9b10 sp=0xc0000d9ae8 pc=0x5640ed634889
Dec 5 10:29:25 localhost ollama: net/http.(*ServeMux).ServeHTTP(0x5640ed3eff85?, {0x5640ed99e830, 0xc0000aca80}, 0xc0000a2d80)
Dec 5 10:29:25 localhost ollama: net/http/server.go:2688 +0x1ad fp=0xc0000d9b60 sp=0xc0000d9b10 pc=0x5640ed63670d
Dec 5 10:29:25 localhost ollama: net/http.serverHandler.ServeHTTP({0x5640ed99db80?}, {0x5640ed99e830?, 0xc0000aca80?}, 0x6?)
Dec 5 10:29:25 localhost ollama: net/http/server.go:3142 +0x8e fp=0xc0000d9b90 sp=0xc0000d9b60 pc=0x5640ed63772e
Dec 5 10:29:25 localhost ollama: net/http.(*conn).serve(0xc00013c1b0, {0x5640ed99ec88, 0xc000114db0})
Dec 5 10:29:25 localhost ollama: net/http/server.go:2044 +0x5e8 fp=0xc0000d9fb8 sp=0xc0000d9b90 pc=0x5640ed6334c8
Dec 5 10:29:25 localhost ollama: net/http.(*Server).Serve.gowrap3()
Dec 5 10:29:25 localhost ollama: net/http/server.go:3290 +0x28 fp=0xc0000d9fe0 sp=0xc0000d9fb8 pc=0x5640ed637ea8
Dec 5 10:29:25 localhost ollama: runtime.goexit({})
Dec 5 10:29:25 localhost ollama: runtime/asm_amd64.s:1695 +0x1 fp=0xc0000d9fe8 sp=0xc0000d9fe0 pc=0x5640ed44ede1
Dec 5 10:29:25 localhost ollama: created by net/http.(*Server).Serve in goroutine 1
Dec 5 10:29:25 localhost ollama: net/http/server.go:3290 +0x4b4
Dec 5 10:29:25 localhost ollama: rax 0x7f74600fc0e0
Dec 5 10:29:25 localhost ollama: rbx 0x7f749864b7b0
Dec 5 10:29:25 localhost ollama: rcx 0x7f74600fc0e0
Dec 5 10:29:25 localhost ollama: rdx 0x7f74e0682a00
Dec 5 10:29:25 localhost ollama: rdi 0x7f74600fc0e0
Dec 5 10:29:25 localhost ollama: rsi 0x7f74359ca7ca
Dec 5 10:29:25 localhost ollama: rbp 0x7f749864b700
Dec 5 10:29:25 localhost ollama: rsp 0x7f749864b6a8
Dec 5 10:29:25 localhost ollama: r8 0x4
Dec 5 10:29:25 localhost ollama: r9 0x4c
Dec 5 10:29:25 localhost ollama: r10 0x0
Dec 5 10:29:25 localhost ollama: r11 0x7f74e06b4750
Dec 5 10:29:25 localhost ollama: r12 0x7f7468296fd0
Dec 5 10:29:25 localhost ollama: r13 0x7f7468297910
Dec 5 10:29:25 localhost ollama: r14 0x7f74682970d0
Dec 5 10:29:25 localhost ollama: r15 0x7f746851a1e0
Dec 5 10:29:25 localhost ollama: rip 0x7f74e0682a00
Dec 5 10:29:25 localhost ollama: rflags 0x10287
Dec 5 10:29:25 localhost ollama: cs 0x33
Dec 5 10:29:25 localhost ollama: fs 0x0
Dec 5 10:29:25 localhost ollama: gs 0x0
Dec 5 10:29:25 localhost ollama: [GIN] 2024/12/05 - 10:29:25 | 200 | 2.175286662s | 127.0.0.1 | POST "/api/chat"
Dec 5 10:30:02 localhost systemd: Started Session 306 of user root.
Dec 5 10:34:30 localhost ollama: time=2024-12-05T10:34:30.066+08:00 level=WARN source=sched.go:646 msg="gpu VRAM usage didn't recover within timeout" seconds=5.032960268 model=/data/ollama/models/blobs/sha256-667b0c1932bc6ffc593ed1d03f895bf2dc8dc6df21db3042284a6f4416b06a29
Dec 5 10:34:30 localhost ollama: time=2024-12-05T10:34:30.316+08:00 level=WARN source=sched.go:646 msg="gpu VRAM usage didn't recover within timeout" seconds=5.2833089730000005 model=/data/ollama/models/blobs/sha256-667b0c1932bc6ffc593ed1d03f895bf2dc8dc6df21db3042284a6f4416b06a29
Dec 5 10:34:30 localhost ollama: time=2024-12-05T10:34:30.565+08:00 level=WARN source=sched.go:646 msg="gpu VRAM usage didn't recover within timeout" seconds=5.532611974 model=/data/ollama/models/blobs/sha256-667b0c1932bc6ffc593ed1d03f895bf2dc8dc6df21db3042284a6f4416b06a29

5 10:29:25 localhost ollama: r14 0x7f74682970d0 Dec 5 10:29:25 localhost ollama: r15 0x7f746851a1e0 Dec 5 10:29:25 localhost ollama: rip 0x7f74e0682a00 Dec 5 10:29:25 localhost ollama: rflags 0x10287 Dec 5 10:29:25 localhost ollama: cs 0x33 Dec 5 10:29:25 localhost ollama: fs 0x0 Dec 5 10:29:25 localhost ollama: gs 0x0 Dec 5 10:29:25 localhost ollama: [GIN] 2024/12/05 - 10:29:25 | 200 | 2.175286662s | 127.0.0.1 | POST "/api/chat" Dec 5 10:30:02 localhost systemd: Started Session 306 of user root. Dec 5 10:34:30 localhost ollama: time=2024-12-05T10:34:30.066+08:00 level=WARN source=sched.go:646 msg="gpu VRAM usage didn't recover within timeout" seconds=5.032960268 model=/data/ollama/models/blobs/sha256-667b0c1932bc6ffc593ed1d03f895bf2dc8dc6df21db3042284a6f4416b06a29 Dec 5 10:34:30 localhost ollama: time=2024-12-05T10:34:30.316+08:00 level=WARN source=sched.go:646 msg="gpu VRAM usage didn't recover within timeout" seconds=5.2833089730000005 model=/data/ollama/models/blobs/sha256-667b0c1932bc6ffc593ed1d03f895bf2dc8dc6df21db3042284a6f4416b06a29 Dec 5 10:34:30 localhost ollama: time=2024-12-05T10:34:30.565+08:00 level=WARN source=sched.go:646 msg="gpu VRAM usage didn't recover within timeout" seconds=5.532611974 model=/data/ollama/models/blobs/sha256-667b0c1932bc6ffc593ed1d03f895bf2dc8dc6df21db3042284a6f4416b06a29
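When a SIGSEGV lands inside cgo like this, the Go runtime can only print the Go-side stack and raw registers; the C-side (CUDA/ROCm/llama.cpp) frames are lost unless the process dumps core. One way to capture them, sketched here as a systemd drop-in for an install where the service is named `ollama.service` (path and unit name are assumptions, adjust to your distro):

```ini
# /etc/systemd/system/ollama.service.d/debug.conf (hypothetical drop-in path)
[Service]
# Make the Go runtime abort and dump core on a fatal signal
# instead of only printing the goroutine dump
Environment=GOTRACEBACK=crash
# Allow unlimited core files so the native stack can be inspected with gdb
LimitCORE=infinity
```

After writing the drop-in, `systemctl daemon-reload` and restart the service; the next crash should leave a core file that shows which native frame faulted.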

@Pekkari commented on GitHub (Dec 19, 2024):

I'm facing this error with the latest ollama:rocm container image: when I start a new conversation via open-webui, ollama crashes with the following output:

time=2024-12-19T15:18:05.370Z level=INFO source=sched.go:714 msg="new model will fit in available VRAM in single GPU, loading" model=/root/.ollama/models/blobs/sha256-dde5aa3fc5ffc17176b5e8bdc82f587b24b2678c6c66101bf7da77af9f7ccdff gpu=0 parallel=4 available=15934570496 required="3.7 GiB"
time=2024-12-19T15:18:05.370Z level=INFO source=server.go:104 msg="system memory" total="46.8 GiB" free="25.3 GiB" free_swap="8.0 GiB"
time=2024-12-19T15:18:05.370Z level=INFO source=memory.go:356 msg="offload to rocm" layers.requested=-1 layers.model=29 layers.offload=29 layers.split="" memory.available="[14.8 GiB]" memory.gpu_overhead="0 B" memory.required.full="3.7 GiB" memory.required.partial="3.7 GiB" memory.required.kv="896.0 MiB" memory.required.allocations="[3.7 GiB]" memory.weights.total="2.4 GiB" memory.weights.repeating="2.1 GiB" memory.weights.nonrepeating="308.2 MiB" memory.graph.full="424.0 MiB" memory.graph.partial="570.7 MiB"
time=2024-12-19T15:18:05.372Z level=INFO source=server.go:376 msg="starting llama server" cmd="/usr/lib/ollama/runners/rocm_avx/ollama_llama_server runner --model /root/.ollama/models/blobs/sha256-dde5aa3fc5ffc17176b5e8bdc82f587b24b2678c6c66101bf7da77af9f7ccdff --ctx-size 8192 --batch-size 512 --n-gpu-layers 29 --threads 8 --parallel 4 --port 46877"
time=2024-12-19T15:18:05.372Z level=INFO source=sched.go:449 msg="loaded runners" count=1
time=2024-12-19T15:18:05.372Z level=INFO source=server.go:555 msg="waiting for llama runner to start responding"
time=2024-12-19T15:18:05.373Z level=INFO source=server.go:589 msg="waiting for server to become available" status="llm server error"
time=2024-12-19T15:18:05.434Z level=INFO source=runner.go:945 msg="starting go runner"
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, compute capability 11.0, VMM: no
time=2024-12-19T15:18:07.752Z level=INFO source=runner.go:946 msg=system info="ROCm : PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | LLAMAFILE = 1 | AARCH64_REPACK = 1 | cgo(gcc)" threads=8
llama_load_model_from_file: using device ROCm0 (AMD Radeon Graphics) - 23887 MiB free
time=2024-12-19T15:18:07.753Z level=INFO source=.:0 msg="Server listening on 127.0.0.1:46877"
llama_model_loader: loaded meta data with 30 key-value pairs and 255 tensors from /root/.ollama/models/blobs/sha256-dde5aa3fc5ffc17176b5e8bdc82f587b24b2678c6c66101bf7da77af9f7ccdff (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Llama 3.2 3B Instruct
llama_model_loader: - kv   3:                           general.finetune str              = Instruct
llama_model_loader: - kv   4:                           general.basename str              = Llama-3.2
llama_model_loader: - kv   5:                         general.size_label str              = 3B
llama_model_loader: - kv   6:                               general.tags arr[str,6]       = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv   7:                          general.languages arr[str,8]       = ["en", "de", "fr", "it", "pt", "hi", ...
llama_model_loader: - kv   8:                          llama.block_count u32              = 28
llama_model_loader: - kv   9:                       llama.context_length u32              = 131072
llama_model_loader: - kv  10:                     llama.embedding_length u32              = 3072
llama_model_loader: - kv  11:                  llama.feed_forward_length u32              = 8192
llama_model_loader: - kv  12:                 llama.attention.head_count u32              = 24
llama_model_loader: - kv  13:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv  14:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv  15:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  16:                 llama.attention.key_length u32              = 128
llama_model_loader: - kv  17:               llama.attention.value_length u32              = 128
llama_model_loader: - kv  18:                          general.file_type u32              = 15
llama_model_loader: - kv  19:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  20:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  21:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  22:                         tokenizer.ggml.pre str              = llama-bpe
llama_model_loader: - kv  23:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  24:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  25:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  26:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  27:                tokenizer.ggml.eos_token_id u32              = 128009
llama_model_loader: - kv  28:                    tokenizer.chat_template str              = {{- bos_token }}\n{%- if custom_tools ...
llama_model_loader: - kv  29:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   58 tensors
llama_model_loader: - type q4_K:  168 tensors
llama_model_loader: - type q6_K:   29 tensors
time=2024-12-19T15:18:07.885Z level=INFO source=server.go:589 msg="waiting for server to become available" status="llm server loading model"
llm_load_vocab: special tokens cache size = 256
llm_load_vocab: token to piece cache size = 0.7999 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 128256
llm_load_print_meta: n_merges         = 280147
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 131072
llm_load_print_meta: n_embd           = 3072
llm_load_print_meta: n_layer          = 28
llm_load_print_meta: n_head           = 24
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_swa            = 0
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 3
llm_load_print_meta: n_embd_k_gqa     = 1024
llm_load_print_meta: n_embd_v_gqa     = 1024
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 8192
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 131072
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: ssm_dt_b_c_rms   = 0
llm_load_print_meta: model type       = 3B
llm_load_print_meta: model ftype      = Q4_K - Medium
llm_load_print_meta: model params     = 3.21 B
llm_load_print_meta: model size       = 1.87 GiB (5.01 BPW) 
llm_load_print_meta: general.name     = Llama 3.2 3B Instruct
llm_load_print_meta: BOS token        = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token        = 128009 '<|eot_id|>'
llm_load_print_meta: EOT token        = 128009 '<|eot_id|>'
llm_load_print_meta: EOM token        = 128008 '<|eom_id|>'
llm_load_print_meta: LF token         = 128 'Ä'
llm_load_print_meta: EOG token        = 128008 '<|eom_id|>'
llm_load_print_meta: EOG token        = 128009 '<|eot_id|>'
llm_load_print_meta: max token length = 256
llm_load_tensors: offloading 28 repeating layers to GPU
llm_load_tensors: offloading output layer to GPU
llm_load_tensors: offloaded 29/29 layers to GPU
llm_load_tensors:   CPU_Mapped model buffer size =   308.23 MiB
llm_load_tensors:        ROCm0 model buffer size =  1918.35 MiB
SIGSEGV: segmentation violation
PC=0x7f26721a2c2d m=5 sigcode=1 addr=0x18
signal arrived during cgo execution

goroutine 7 gp=0xc0000ee000 m=5 mp=0xc000100008 [syscall]:
runtime.cgocall(0x556f984f0970, 0xc000070b78)
        runtime/cgocall.go:167 +0x4b fp=0xc000070b50 sp=0xc000070b18 pc=0x556f982a4b2b
github.com/ollama/ollama/llama._Cfunc_llama_load_model_from_file(0x7f2530000c30, {0x0, 0x1d, 0x1, 0x0, 0x0, 0x0, 0x556f984f0380, 0xc000182000, 0x0, ...})
        _cgo_gotypes.go:699 +0x50 fp=0xc000070b78 sp=0xc000070b50 pc=0x556f9834f410
github.com/ollama/ollama/llama.LoadModelFromFile.func1({0x7ffff9666daa?, 0xc000060d30?}, {0x0, 0x1d, 0x1, 0x0, 0x0, 0x0, 0x556f984f0380, 0xc000182000, ...})
        github.com/ollama/ollama/llama/llama.go:311 +0x127 fp=0xc000070c78 sp=0xc000070b78 pc=0x556f98352027
github.com/ollama/ollama/llama.LoadModelFromFile({0x7ffff9666daa, 0x62}, {0x1d, 0x0, 0x1, 0x0, {0x0, 0x0, 0x0}, 0xc000022200, ...})
        github.com/ollama/ollama/llama/llama.go:311 +0x2d6 fp=0xc000070dc8 sp=0xc000070c78 pc=0x556f98351d16
github.com/ollama/ollama/llama/runner.(*Server).loadModel(0xc0000ba1b0, {0x1d, 0x0, 0x1, 0x0, {0x0, 0x0, 0x0}, 0xc000022200, 0x0}, ...)
        github.com/ollama/ollama/llama/runner/runner.go:859 +0xc5 fp=0xc000070f10 sp=0xc000070dc8 pc=0x556f984edde5
github.com/ollama/ollama/llama/runner.Execute.gowrap1()
        github.com/ollama/ollama/llama/runner/runner.go:979 +0xda fp=0xc000070fe0 sp=0xc000070f10 pc=0x556f984ef73a
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000070fe8 sp=0xc000070fe0 pc=0x556f982b2561
created by github.com/ollama/ollama/llama/runner.Execute in goroutine 1
        github.com/ollama/ollama/llama/runner/runner.go:979 +0xd0d

goroutine 1 gp=0xc0000061c0 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc0000277b0 sp=0xc000027790 pc=0x556f982aa92e
runtime.netpollblock(0xc0001a3f80?, 0x98243186?, 0x6f?)
        runtime/netpoll.go:575 +0xf7 fp=0xc0000277e8 sp=0xc0000277b0 pc=0x556f9826f697
internal/poll.runtime_pollWait(0x7f25495caf90, 0x72)
        runtime/netpoll.go:351 +0x85 fp=0xc000027808 sp=0xc0000277e8 pc=0x556f982a9c25
internal/poll.(*pollDesc).wait(0xc00002e180?, 0x2c?, 0x0)
        internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000027830 sp=0xc000027808 pc=0x556f982ffa67
internal/poll.(*pollDesc).waitRead(...)
        internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc00002e180)
        internal/poll/fd_unix.go:620 +0x295 fp=0xc0000278d8 sp=0xc000027830 pc=0x556f98300fd5
net.(*netFD).accept(0xc00002e180)
        net/fd_unix.go:172 +0x29 fp=0xc000027990 sp=0xc0000278d8 pc=0x556f98379969
net.(*TCPListener).accept(0xc00008e700)
        net/tcpsock_posix.go:159 +0x1e fp=0xc0000279e0 sp=0xc000027990 pc=0x556f98389fbe
net.(*TCPListener).Accept(0xc00008e700)
        net/tcpsock.go:372 +0x30 fp=0xc000027a10 sp=0xc0000279e0 pc=0x556f983892f0
net/http.(*onceCloseListener).Accept(0xc00019e000?)
        <autogenerated>:1 +0x24 fp=0xc000027a28 sp=0xc000027a10 pc=0x556f984c7ec4
net/http.(*Server).Serve(0xc0000e84b0, {0x556f988e4818, 0xc00008e700})
        net/http/server.go:3330 +0x30c fp=0xc000027b58 sp=0xc000027a28 pc=0x556f984b9c0c
github.com/ollama/ollama/llama/runner.Execute({0xc000016110?, 0x556f982b21bc?, 0x0?})
        github.com/ollama/ollama/llama/runner/runner.go:1005 +0x11a9 fp=0xc000027ef8 sp=0xc000027b58 pc=0x556f984ef309
main.main()
        github.com/ollama/ollama/cmd/runner/main.go:11 +0x54 fp=0xc000027f50 sp=0xc000027ef8 pc=0x556f984f0294
runtime.main()
        runtime/proc.go:272 +0x29d fp=0xc000027fe0 sp=0xc000027f50 pc=0x556f98276c7d
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000027fe8 sp=0xc000027fe0 pc=0x556f982b2561

goroutine 2 gp=0xc000006c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc00005efa8 sp=0xc00005ef88 pc=0x556f982aa92e
runtime.goparkunlock(...)
        runtime/proc.go:430
runtime.forcegchelper()
        runtime/proc.go:337 +0xb8 fp=0xc00005efe0 sp=0xc00005efa8 pc=0x556f98276fb8
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00005efe8 sp=0xc00005efe0 pc=0x556f982b2561
created by runtime.init.7 in goroutine 1
        runtime/proc.go:325 +0x1a

goroutine 3 gp=0xc000007180 m=nil [GC sweep wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc00005f780 sp=0xc00005f760 pc=0x556f982aa92e
runtime.goparkunlock(...)
        runtime/proc.go:430
runtime.bgsweep(0xc00008c000)
        runtime/mgcsweep.go:277 +0x94 fp=0xc00005f7c8 sp=0xc00005f780 pc=0x556f982617f4
runtime.gcenable.gowrap1()
        runtime/mgc.go:204 +0x25 fp=0xc00005f7e0 sp=0xc00005f7c8 pc=0x556f982560a5
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00005f7e8 sp=0xc00005f7e0 pc=0x556f982b2561
created by runtime.gcenable in goroutine 1
        runtime/mgc.go:204 +0x66

goroutine 4 gp=0xc000007340 m=nil [GC scavenge wait]:
runtime.gopark(0xc00008c000?, 0x556f987c5e60?, 0x1?, 0x0?, 0xc000007340?)
        runtime/proc.go:424 +0xce fp=0xc00005ff78 sp=0xc00005ff58 pc=0x556f982aa92e
runtime.goparkunlock(...)
        runtime/proc.go:430
runtime.(*scavengerState).park(0x556f98ad0060)
        runtime/mgcscavenge.go:425 +0x49 fp=0xc00005ffa8 sp=0xc00005ff78 pc=0x556f9825f229
runtime.bgscavenge(0xc00008c000)
        runtime/mgcscavenge.go:653 +0x3c fp=0xc00005ffc8 sp=0xc00005ffa8 pc=0x556f9825f79c
runtime.gcenable.gowrap2()
        runtime/mgc.go:205 +0x25 fp=0xc00005ffe0 sp=0xc00005ffc8 pc=0x556f98256045
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00005ffe8 sp=0xc00005ffe0 pc=0x556f982b2561
created by runtime.gcenable in goroutine 1
        runtime/mgc.go:205 +0xa5

goroutine 5 gp=0xc000007c00 m=nil [finalizer wait]:
runtime.gopark(0xc00005e648?, 0x556f9824c5a5?, 0xb0?, 0x1?, 0xc0000061c0?)
        runtime/proc.go:424 +0xce fp=0xc00005e620 sp=0xc00005e600 pc=0x556f982aa92e
runtime.runfinq()
        runtime/mfinal.go:193 +0x107 fp=0xc00005e7e0 sp=0xc00005e620 pc=0x556f98255127
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00005e7e8 sp=0xc00005e7e0 pc=0x556f982b2561
created by runtime.createfing in goroutine 1
        runtime/mfinal.go:163 +0x3d

goroutine 6 gp=0xc000007dc0 m=nil [chan receive]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000060718 sp=0xc0000606f8 pc=0x556f982aa92e
runtime.chanrecv(0xc00009a0e0, 0x0, 0x1)
        runtime/chan.go:639 +0x41c fp=0xc000060790 sp=0xc000060718 pc=0x556f98245d7c
runtime.chanrecv1(0x0?, 0x0?)
        runtime/chan.go:489 +0x12 fp=0xc0000607b8 sp=0xc000060790 pc=0x556f98245952
runtime.unique_runtime_registerUniqueMapCleanup.func1(...)
        runtime/mgc.go:1781
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
        runtime/mgc.go:1784 +0x2f fp=0xc0000607e0 sp=0xc0000607b8 pc=0x556f98258f0f
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000607e8 sp=0xc0000607e0 pc=0x556f982b2561
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
        runtime/mgc.go:1779 +0x96

goroutine 8 gp=0xc0000ee1c0 m=nil [semacquire]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x20?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000061618 sp=0xc0000615f8 pc=0x556f982aa92e
runtime.goparkunlock(...)
        runtime/proc.go:430
runtime.semacquire1(0xc0000ba1b8, 0x0, 0x1, 0x0, 0x12)
        runtime/sema.go:178 +0x22c fp=0xc000061680 sp=0xc000061618 pc=0x556f98289c4c
sync.runtime_Semacquire(0x0?)
        runtime/sema.go:71 +0x25 fp=0xc0000616b8 sp=0xc000061680 pc=0x556f982abb65
sync.(*WaitGroup).Wait(0x0?)
        sync/waitgroup.go:118 +0x48 fp=0xc0000616e0 sp=0xc0000616b8 pc=0x556f982c7e08
github.com/ollama/ollama/llama/runner.(*Server).run(0xc0000ba1b0, {0x556f988e4e00, 0xc0000ec050})
        github.com/ollama/ollama/llama/runner/runner.go:315 +0x47 fp=0xc0000617b8 sp=0xc0000616e0 pc=0x556f984ea487
github.com/ollama/ollama/llama/runner.Execute.gowrap2()
        github.com/ollama/ollama/llama/runner/runner.go:984 +0x28 fp=0xc0000617e0 sp=0xc0000617b8 pc=0x556f984ef628
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000617e8 sp=0xc0000617e0 pc=0x556f982b2561
created by github.com/ollama/ollama/llama/runner.Execute in goroutine 1
        github.com/ollama/ollama/llama/runner/runner.go:984 +0xde5

goroutine 18 gp=0xc0001a4000 m=nil [IO wait]:
runtime.gopark(0x556f98300d05?, 0xc00019c000?, 0x10?, 0xda?, 0xb?)
        runtime/proc.go:424 +0xce fp=0xc0000dd918 sp=0xc0000dd8f8 pc=0x556f982aa92e
runtime.netpollblock(0x556f982e6158?, 0x98243186?, 0x6f?)
        runtime/netpoll.go:575 +0xf7 fp=0xc0000dd950 sp=0xc0000dd918 pc=0x556f9826f697
internal/poll.runtime_pollWait(0x7f25495cae78, 0x72)
        runtime/netpoll.go:351 +0x85 fp=0xc0000dd970 sp=0xc0000dd950 pc=0x556f982a9c25
internal/poll.(*pollDesc).wait(0xc00019c000?, 0xc0001aa000?, 0x0)
        internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0000dd998 sp=0xc0000dd970 pc=0x556f982ffa67
internal/poll.(*pollDesc).waitRead(...)
        internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc00019c000, {0xc0001aa000, 0x1000, 0x1000})
        internal/poll/fd_unix.go:165 +0x27a fp=0xc0000dda30 sp=0xc0000dd998 pc=0x556f983005ba
net.(*netFD).Read(0xc00019c000, {0xc0001aa000?, 0xc0000ddaa0?, 0x556f982fff25?})
        net/fd_posix.go:55 +0x25 fp=0xc0000dda78 sp=0xc0000dda30 pc=0x556f98378885
net.(*conn).Read(0xc000188008, {0xc0001aa000?, 0x0?, 0xc0001860f8?})
        net/net.go:189 +0x45 fp=0xc0000ddac0 sp=0xc0000dda78 pc=0x556f98382285
net.(*TCPConn).Read(0xc0001860f0?, {0xc0001aa000?, 0xc00019c000?, 0xc0000ddaf8?})
        <autogenerated>:1 +0x25 fp=0xc0000ddaf0 sp=0xc0000ddac0 pc=0x556f9838f325
net/http.(*connReader).Read(0xc0001860f0, {0xc0001aa000, 0x1000, 0x1000})
        net/http/server.go:798 +0x14b fp=0xc0000ddb40 sp=0xc0000ddaf0 pc=0x556f984b050b
bufio.(*Reader).fill(0xc000180060)
        bufio/bufio.go:110 +0x103 fp=0xc0000ddb78 sp=0xc0000ddb40 pc=0x556f9846f123
bufio.(*Reader).Peek(0xc000180060, 0x4)
        bufio/bufio.go:148 +0x53 fp=0xc0000ddb98 sp=0xc0000ddb78 pc=0x556f9846f253
net/http.(*conn).serve(0xc00019e000, {0x556f988e4dc8, 0xc0000acf60})
        net/http/server.go:2127 +0x738 fp=0xc0000ddfb8 sp=0xc0000ddb98 pc=0x556f984b5858
net/http.(*Server).Serve.gowrap3()
        net/http/server.go:3360 +0x28 fp=0xc0000ddfe0 sp=0xc0000ddfb8 pc=0x556f984ba008
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000ddfe8 sp=0xc0000ddfe0 pc=0x556f982b2561
created by net/http.(*Server).Serve in goroutine 1
        net/http/server.go:3360 +0x485

rax    0xffffffffffffffd0
rbx    0x7f253255b320
rcx    0x3
rdx    0x7f2530005150
rdi    0x7f253255b320
rsi    0x3
rbp    0x0
rsp    0x7f2543ff62c0
r8     0x0
r9     0x0
r10    0x7f25e2037ea0
r11    0x7f2531f06ba0
r12    0x0
r13    0x0
r14    0x7f2533470690
r15    0x7f2413fd10e0
rip    0x7f26721a2c2d
rflags 0x10246
cs     0x33
fs     0x0
gs     0x0
time=2024-12-19T15:18:08.640Z level=INFO source=server.go:589 msg="waiting for server to become available" status="llm server error"
time=2024-12-19T15:18:08.891Z level=ERROR source=sched.go:455 msg="error loading llama server" error="llama runner process has terminated: exit status 2"

This used to work in earlier versions of the ollama:rocm container; it broke when I recently updated to the latest version of the container.
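One detail worth pulling out of dumps like these is the fault address: here `addr=0x18` is a tiny offset from NULL, which typically means the C library dereferenced a struct member through a NULL pointer (consistent with the crash happening inside `llama_load_model_from_file` after the `amdgpu.ids: No such file or directory` warning). A quick way to extract it from a saved log, with the signal line inlined here purely for illustration:

```shell
# Extract the fault address from a Go SIGSEGV dump.
# The log line is inlined for illustration; on a real system read it
# from the container or journal logs instead.
log='PC=0x7f26721a2c2d m=5 sigcode=1 addr=0x18'
addr=$(printf '%s\n' "$log" | grep -o 'addr=0x[0-9a-f]*' | cut -d= -f2)
echo "$addr"   # a near-NULL address suggests a NULL-pointer member access
```

A large, valid-looking address would instead point at use-after-free or GPU-driver memory, which changes where to look first.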

runtime/proc.go:424 +0xce fp=0xc00005ff78 sp=0xc00005ff58 pc=0x556f982aa92e runtime.goparkunlock(...) runtime/proc.go:430 runtime.(*scavengerState).park(0x556f98ad0060) runtime/mgcscavenge.go:425 +0x49 fp=0xc00005ffa8 sp=0xc00005ff78 pc=0x556f9825f229 runtime.bgscavenge(0xc00008c000) runtime/mgcscavenge.go:653 +0x3c fp=0xc00005ffc8 sp=0xc00005ffa8 pc=0x556f9825f79c runtime.gcenable.gowrap2() runtime/mgc.go:205 +0x25 fp=0xc00005ffe0 sp=0xc00005ffc8 pc=0x556f98256045 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00005ffe8 sp=0xc00005ffe0 pc=0x556f982b2561 created by runtime.gcenable in goroutine 1 runtime/mgc.go:205 +0xa5 goroutine 5 gp=0xc000007c00 m=nil [finalizer wait]: runtime.gopark(0xc00005e648?, 0x556f9824c5a5?, 0xb0?, 0x1?, 0xc0000061c0?) runtime/proc.go:424 +0xce fp=0xc00005e620 sp=0xc00005e600 pc=0x556f982aa92e runtime.runfinq() runtime/mfinal.go:193 +0x107 fp=0xc00005e7e0 sp=0xc00005e620 pc=0x556f98255127 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00005e7e8 sp=0xc00005e7e0 pc=0x556f982b2561 created by runtime.createfing in goroutine 1 runtime/mfinal.go:163 +0x3d goroutine 6 gp=0xc000007dc0 m=nil [chan receive]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:424 +0xce fp=0xc000060718 sp=0xc0000606f8 pc=0x556f982aa92e runtime.chanrecv(0xc00009a0e0, 0x0, 0x1) runtime/chan.go:639 +0x41c fp=0xc000060790 sp=0xc000060718 pc=0x556f98245d7c runtime.chanrecv1(0x0?, 0x0?) runtime/chan.go:489 +0x12 fp=0xc0000607b8 sp=0xc000060790 pc=0x556f98245952 runtime.unique_runtime_registerUniqueMapCleanup.func1(...) 
runtime/mgc.go:1781 runtime.unique_runtime_registerUniqueMapCleanup.gowrap1() runtime/mgc.go:1784 +0x2f fp=0xc0000607e0 sp=0xc0000607b8 pc=0x556f98258f0f runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000607e8 sp=0xc0000607e0 pc=0x556f982b2561 created by unique.runtime_registerUniqueMapCleanup in goroutine 1 runtime/mgc.go:1779 +0x96 goroutine 8 gp=0xc0000ee1c0 m=nil [semacquire]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x20?, 0x0?) runtime/proc.go:424 +0xce fp=0xc000061618 sp=0xc0000615f8 pc=0x556f982aa92e runtime.goparkunlock(...) runtime/proc.go:430 runtime.semacquire1(0xc0000ba1b8, 0x0, 0x1, 0x0, 0x12) runtime/sema.go:178 +0x22c fp=0xc000061680 sp=0xc000061618 pc=0x556f98289c4c sync.runtime_Semacquire(0x0?) runtime/sema.go:71 +0x25 fp=0xc0000616b8 sp=0xc000061680 pc=0x556f982abb65 sync.(*WaitGroup).Wait(0x0?) sync/waitgroup.go:118 +0x48 fp=0xc0000616e0 sp=0xc0000616b8 pc=0x556f982c7e08 github.com/ollama/ollama/llama/runner.(*Server).run(0xc0000ba1b0, {0x556f988e4e00, 0xc0000ec050}) github.com/ollama/ollama/llama/runner/runner.go:315 +0x47 fp=0xc0000617b8 sp=0xc0000616e0 pc=0x556f984ea487 github.com/ollama/ollama/llama/runner.Execute.gowrap2() github.com/ollama/ollama/llama/runner/runner.go:984 +0x28 fp=0xc0000617e0 sp=0xc0000617b8 pc=0x556f984ef628 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000617e8 sp=0xc0000617e0 pc=0x556f982b2561 created by github.com/ollama/ollama/llama/runner.Execute in goroutine 1 github.com/ollama/ollama/llama/runner/runner.go:984 +0xde5 goroutine 18 gp=0xc0001a4000 m=nil [IO wait]: runtime.gopark(0x556f98300d05?, 0xc00019c000?, 0x10?, 0xda?, 0xb?) runtime/proc.go:424 +0xce fp=0xc0000dd918 sp=0xc0000dd8f8 pc=0x556f982aa92e runtime.netpollblock(0x556f982e6158?, 0x98243186?, 0x6f?) 
runtime/netpoll.go:575 +0xf7 fp=0xc0000dd950 sp=0xc0000dd918 pc=0x556f9826f697 internal/poll.runtime_pollWait(0x7f25495cae78, 0x72) runtime/netpoll.go:351 +0x85 fp=0xc0000dd970 sp=0xc0000dd950 pc=0x556f982a9c25 internal/poll.(*pollDesc).wait(0xc00019c000?, 0xc0001aa000?, 0x0) internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0000dd998 sp=0xc0000dd970 pc=0x556f982ffa67 internal/poll.(*pollDesc).waitRead(...) internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Read(0xc00019c000, {0xc0001aa000, 0x1000, 0x1000}) internal/poll/fd_unix.go:165 +0x27a fp=0xc0000dda30 sp=0xc0000dd998 pc=0x556f983005ba net.(*netFD).Read(0xc00019c000, {0xc0001aa000?, 0xc0000ddaa0?, 0x556f982fff25?}) net/fd_posix.go:55 +0x25 fp=0xc0000dda78 sp=0xc0000dda30 pc=0x556f98378885 net.(*conn).Read(0xc000188008, {0xc0001aa000?, 0x0?, 0xc0001860f8?}) net/net.go:189 +0x45 fp=0xc0000ddac0 sp=0xc0000dda78 pc=0x556f98382285 net.(*TCPConn).Read(0xc0001860f0?, {0xc0001aa000?, 0xc00019c000?, 0xc0000ddaf8?}) <autogenerated>:1 +0x25 fp=0xc0000ddaf0 sp=0xc0000ddac0 pc=0x556f9838f325 net/http.(*connReader).Read(0xc0001860f0, {0xc0001aa000, 0x1000, 0x1000}) net/http/server.go:798 +0x14b fp=0xc0000ddb40 sp=0xc0000ddaf0 pc=0x556f984b050b bufio.(*Reader).fill(0xc000180060) bufio/bufio.go:110 +0x103 fp=0xc0000ddb78 sp=0xc0000ddb40 pc=0x556f9846f123 bufio.(*Reader).Peek(0xc000180060, 0x4) bufio/bufio.go:148 +0x53 fp=0xc0000ddb98 sp=0xc0000ddb78 pc=0x556f9846f253 net/http.(*conn).serve(0xc00019e000, {0x556f988e4dc8, 0xc0000acf60}) net/http/server.go:2127 +0x738 fp=0xc0000ddfb8 sp=0xc0000ddb98 pc=0x556f984b5858 net/http.(*Server).Serve.gowrap3() net/http/server.go:3360 +0x28 fp=0xc0000ddfe0 sp=0xc0000ddfb8 pc=0x556f984ba008 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000ddfe8 sp=0xc0000ddfe0 pc=0x556f982b2561 created by net/http.(*Server).Serve in goroutine 1 net/http/server.go:3360 +0x485 rax 0xffffffffffffffd0 rbx 0x7f253255b320 rcx 0x3 rdx 0x7f2530005150 rdi 0x7f253255b320 rsi 0x3 rbp 0x0 rsp 
0x7f2543ff62c0 r8 0x0 r9 0x0 r10 0x7f25e2037ea0 r11 0x7f2531f06ba0 r12 0x0 r13 0x0 r14 0x7f2533470690 r15 0x7f2413fd10e0 rip 0x7f26721a2c2d rflags 0x10246 cs 0x33 fs 0x0 gs 0x0 time=2024-12-19T15:18:08.640Z level=INFO source=server.go:589 msg="waiting for server to become available" status="llm server error" time=2024-12-19T15:18:08.891Z level=ERROR source=sched.go:455 msg="error loading llama server" error="llama runner process has terminated: exit status 2" ``` This used to work in former versions of ollama:rocm container, it broke when I updated recently to last version of the container.
@ican2002 commented on GitHub (Feb 5, 2025):

Can anyone help resolve this issue? Thanks.

CPU: Intel i7-6700HQ
OS: Windows 10
GPU: 960M

The CPU and GPU seem to be detected; the log shows "Dynamic LLM libraries" runners="[cpu_avx cpu_avx2 cuda_v11_avx cuda_v12_avx rocm_avx cpu]"

and the log shows a cgo-related problem:
runtime.cgocall(0x7ff6bdc60920, 0xc0003f4c10)
runtime/cgocall.go:167 +0x3e fp=0xc0003f4be8 sp=0xc0003f4b80 pc=0x7ff6bcea9c3e

It seems many people are facing this problem.

If anyone has resolved it, please reply here. Thank you.

===================================================================
2025/02/05 20:47:23 routes.go:1187: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:C:\Users\can\.ollama\models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES:]"
time=2025-02-05T20:47:23.684+08:00 level=INFO source=images.go:432 msg="total blobs: 0"
time=2025-02-05T20:47:23.685+08:00 level=INFO source=images.go:439 msg="total unused blobs removed: 0"
time=2025-02-05T20:47:23.686+08:00 level=INFO source=routes.go:1238 msg="Listening on 127.0.0.1:11434 (version 0.5.7)"
time=2025-02-05T20:47:23.687+08:00 level=INFO source=routes.go:1267 msg="Dynamic LLM libraries" runners="[cpu_avx cpu_avx2 cuda_v11_avx cuda_v12_avx rocm_avx cpu]"
time=2025-02-05T20:47:23.687+08:00 level=INFO source=gpu.go:226 msg="looking for compatible GPUs"
time=2025-02-05T20:47:23.687+08:00 level=INFO source=gpu_windows.go:167 msg=packages count=1
time=2025-02-05T20:47:23.687+08:00 level=INFO source=gpu_windows.go:214 msg="" package=0 cores=4 efficiency=0 threads=8
Exception 0xc0000005 0x0 0x10 0x7ffcaca97983
PC=0x7ffcaca97983
signal arrived during external code execution

runtime.cgocall(0x7ff6bdc60920, 0xc0003f4c10)
runtime/cgocall.go:167 +0x3e fp=0xc0003f4be8 sp=0xc0003f4b80 pc=0x7ff6bcea9c3e
github.com/ollama/ollama/discover._Cfunc_nvml_init(0x2041ea093d0, 0xc00004f440)
_cgo_gotypes.go:573 +0x4d fp=0xc0003f4c10 sp=0xc0003f4be8 pc=0x7ff6bd476f8d
github.com/ollama/ollama/discover.loadNVMLMgmt.func2(0x2041ea093d0, 0xc00004f440)
github.com/ollama/ollama/discover/gpu.go:651 +0x4a fp=0xc0003f4c40 sp=0xc0003f4c10 pc=0x7ff6bd47e68a
github.com/ollama/ollama/discover.loadNVMLMgmt({0xc00004f400, 0x3, 0x7ff6be8b9410?})
github.com/ollama/ollama/discover/gpu.go:651 +0x245 fp=0xc0003f4d30 sp=0xc0003f4c40 pc=0x7ff6bd47e4c5
github.com/ollama/ollama/discover.initCudaHandles()
github.com/ollama/ollama/discover/gpu.go:118 +0x4fa fp=0xc0003f4f98 sp=0xc0003f4d30 pc=0x7ff6bd477a3a
github.com/ollama/ollama/discover.GetGPUInfo()
github.com/ollama/ollama/discover/gpu.go:262 +0x705 fp=0xc0003f5ae0 sp=0xc0003f4f98 pc=0x7ff6bd478b45
github.com/ollama/ollama/server.Serve({0x7ff6be099760, 0xc000608a80})
github.com/ollama/ollama/server/routes.go:1274 +0x8aa fp=0xc0003f5d18 sp=0xc0003f5ae0 pc=0x7ff6bda2e94a
github.com/ollama/ollama/cmd.RunServer(0xc00062a400?, {0x7ff6be955020?, 0x4?, 0x7ff6bdeda1ef?})
github.com/ollama/ollama/cmd/cmd.go:1033 +0x4a fp=0xc0003f5d58 sp=0xc0003f5d18 pc=0x7ff6bda5daaa
github.com/spf13/cobra.(*Command).execute(0xc0000bc608, {0x7ff6be955020, 0x0, 0x0})
github.com/spf13/cobra@v1.7.0/command.go:940 +0x862 fp=0xc0003f5e78 sp=0xc0003f5d58 pc=0x7ff6bd02c122
github.com/spf13/cobra.(*Command).ExecuteC(0xc00008b508)
github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc0003f5f30 sp=0xc0003f5e78 pc=0x7ff6bd02c965
github.com/spf13/cobra.(*Command).Execute(...)
github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
github.com/ollama/ollama/main.go:12 +0x4d fp=0xc0003f5f50 sp=0xc0003f5f30 pc=0x7ff6bda65c8d
runtime.main()
runtime/proc.go:272 +0x27d fp=0xc0003f5fe0 sp=0xc0003f5f50 pc=0x7ff6bce7dfbd
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0003f5fe8 sp=0xc0003f5fe0 pc=0x7ff6bceb8921
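Since this access violation happens inside `nvml_init` during GPU discovery (before any model is loaded), one diagnostic step is to take the GPU code path out of the picture by forcing one of the CPU runners listed in the "Dynamic LLM libraries" log line. The `OLLAMA_LLM_LIBRARY` environment variable appears in the server config dump above; whether it avoids the crash here is an assumption, and it is a workaround for narrowing down the cause, not a fix:

```shell
:: Windows (cmd.exe): force a CPU runner, skipping the NVML/CUDA discovery
:: path that crashes, then start the server.
set OLLAMA_LLM_LIBRARY=cpu_avx2
ollama serve
```

If the server then starts cleanly, the crash is isolated to the NVIDIA driver/NVML initialization rather than the model loading itself.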

@jessegross commented on GitHub (Feb 6, 2025):

@ican2002 All three of these issues look different to me. Please create a new bug and include your logs there.

@ican2002 commented on GitHub (Feb 6, 2025):

@jessegross I created an issue at
https://github.com/ollama/ollama/issues/8886
Thanks.

> @ican2002 All three of these issues look different to me. Please create a new bug and include your logs there.

Reference: github-starred/ollama#67138