[GH-ISSUE #8262] Segmentation Fault in AMD GPGPU Applications on 780M #5282

Closed
opened 2026-04-12 16:27:26 -05:00 by GiteaMirror · 14 comments

Originally created by @zw963 on GitHub (Dec 28, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/8262

What is the issue?

Hi, starting my ollama model fails again when I try to use the AMD 780M iGPU.

Following is the log for `HSA_OVERRIDE_GFX_VERSION=11.0.0 /usr/bin/ollama serve`:

 ╰──➤ $ HSA_OVERRIDE_GFX_VERSION=11.0.0 /usr/bin/ollama serve
2024/12/28 21:16:53 routes.go:1259: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION:11.0.0 HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/zw963/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2024-12-28T21:16:53.340+08:00 level=INFO source=images.go:757 msg="total blobs: 32"
time=2024-12-28T21:16:53.340+08:00 level=INFO source=images.go:764 msg="total unused blobs removed: 0"
time=2024-12-28T21:16:53.341+08:00 level=INFO source=routes.go:1310 msg="Listening on 127.0.0.1:11434 (version 0.5.4)"
time=2024-12-28T21:16:53.341+08:00 level=INFO source=routes.go:1339 msg="Dynamic LLM libraries" runners="[cpu cpu_avx cpu_avx2 rocm_avx]"
time=2024-12-28T21:16:53.341+08:00 level=INFO source=gpu.go:226 msg="looking for compatible GPUs"
time=2024-12-28T21:16:53.365+08:00 level=WARN source=amd_linux.go:61 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-12-28T21:16:53.366+08:00 level=INFO source=amd_linux.go:391 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=11.0.0
time=2024-12-28T21:16:53.366+08:00 level=INFO source=types.go:131 msg="inference compute" id=0 library=rocm variant="" compute=gfx1103 driver=0.0 name=1002:15bf total="16.0 GiB" available="14.8 GiB"
[GIN] 2024/12/28 - 21:17:00 | 200 |      31.846µs |       127.0.0.1 | HEAD     "/"
[GIN] 2024/12/28 - 21:17:00 | 200 |   19.231074ms |       127.0.0.1 | POST     "/api/show"
time=2024-12-28T21:17:01.006+08:00 level=INFO source=sched.go:714 msg="new model will fit in available VRAM in single GPU, loading" model=/home/zw963/.ollama/models/blobs/sha256-ff1d1fc78170d787ee1201778e2dd65ea211654ca5fb7d69b5a2e7b123a50373 gpu=0 parallel=4 available=15936040960 required="8.8 GiB"
time=2024-12-28T21:17:01.006+08:00 level=INFO source=server.go:104 msg="system memory" total="46.8 GiB" free="42.7 GiB" free_swap="63.0 GiB"
time=2024-12-28T21:17:01.006+08:00 level=INFO source=memory.go:356 msg="offload to rocm" layers.requested=-1 layers.model=43 layers.offload=43 layers.split="" memory.available="[14.8 GiB]" memory.gpu_overhead="0 B" memory.required.full="8.8 GiB" memory.required.partial="8.8 GiB" memory.required.kv="2.6 GiB" memory.required.allocations="[8.8 GiB]" memory.weights.total="7.0 GiB" memory.weights.repeating="6.3 GiB" memory.weights.nonrepeating="717.8 MiB" memory.graph.full="507.0 MiB" memory.graph.partial="1.2 GiB"
time=2024-12-28T21:17:01.007+08:00 level=INFO source=server.go:376 msg="starting llama server" cmd="/usr/lib/ollama/runners/rocm_avx/ollama_llama_server runner --model /home/zw963/.ollama/models/blobs/sha256-ff1d1fc78170d787ee1201778e2dd65ea211654ca5fb7d69b5a2e7b123a50373 --ctx-size 8192 --batch-size 512 --n-gpu-layers 43 --threads 8 --parallel 4 --port 12215"
time=2024-12-28T21:17:01.007+08:00 level=INFO source=sched.go:449 msg="loaded runners" count=1
time=2024-12-28T21:17:01.007+08:00 level=INFO source=server.go:555 msg="waiting for llama runner to start responding"
time=2024-12-28T21:17:01.007+08:00 level=INFO source=server.go:589 msg="waiting for server to become available" status="llm server error"
time=2024-12-28T21:17:01.036+08:00 level=INFO source=runner.go:945 msg="starting go runner"
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon 780M, compute capability 11.0, VMM: no
time=2024-12-28T21:17:02.349+08:00 level=INFO source=runner.go:946 msg=system info="ROCm : PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | LLAMAFILE = 1 | AARCH64_REPACK = 1 | cgo(gcc)" threads=8
llama_load_model_from_file: using device ROCm0 (AMD Radeon 780M) - 23866 MiB free
time=2024-12-28T21:17:02.349+08:00 level=INFO source=.:0 msg="Server listening on 127.0.0.1:12215"
llama_model_loader: loaded meta data with 29 key-value pairs and 464 tensors from /home/zw963/.ollama/models/blobs/sha256-ff1d1fc78170d787ee1201778e2dd65ea211654ca5fb7d69b5a2e7b123a50373 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = gemma2
llama_model_loader: - kv   1:                               general.name str              = gemma-2-9b-it
llama_model_loader: - kv   2:                      gemma2.context_length u32              = 8192
llama_model_loader: - kv   3:                    gemma2.embedding_length u32              = 3584
llama_model_loader: - kv   4:                         gemma2.block_count u32              = 42
llama_model_loader: - kv   5:                 gemma2.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                gemma2.attention.head_count u32              = 16
llama_model_loader: - kv   7:             gemma2.attention.head_count_kv u32              = 8
llama_model_loader: - kv   8:    gemma2.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv   9:                gemma2.attention.key_length u32              = 256
llama_model_loader: - kv  10:              gemma2.attention.value_length u32              = 256
llama_model_loader: - kv  11:                          general.file_type u32              = 2
llama_model_loader: - kv  12:              gemma2.attn_logit_softcapping f32              = 50.000000
llama_model_loader: - kv  13:             gemma2.final_logit_softcapping f32              = 30.000000
llama_model_loader: - kv  14:            gemma2.attention.sliding_window u32              = 4096
llama_model_loader: - kv  15:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  16:                         tokenizer.ggml.pre str              = default
llama_model_loader: - kv  17:                      tokenizer.ggml.tokens arr[str,256000]  = ["<pad>", "<eos>", "<bos>", "<unk>", ...
llama_model_loader: - kv  18:                      tokenizer.ggml.scores arr[f32,256000]  = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  19:                  tokenizer.ggml.token_type arr[i32,256000]  = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
llama_model_loader: - kv  20:                tokenizer.ggml.bos_token_id u32              = 2
llama_model_loader: - kv  21:                tokenizer.ggml.eos_token_id u32              = 1
llama_model_loader: - kv  22:            tokenizer.ggml.unknown_token_id u32              = 3
llama_model_loader: - kv  23:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  24:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  25:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  26:                    tokenizer.chat_template str              = {{ bos_token }}{% if messages[0]['rol...
llama_model_loader: - kv  27:            tokenizer.ggml.add_space_prefix bool             = false
llama_model_loader: - kv  28:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  169 tensors
llama_model_loader: - type q4_0:  294 tensors
llama_model_loader: - type q6_K:    1 tensors
llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
llm_load_vocab: special tokens cache size = 108
llm_load_vocab: token to piece cache size = 1.6014 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = gemma2
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 256000
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 8192
llm_load_print_meta: n_embd           = 3584
llm_load_print_meta: n_layer          = 42
llm_load_print_meta: n_head           = 16
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_rot            = 256
llm_load_print_meta: n_swa            = 4096
llm_load_print_meta: n_embd_head_k    = 256
llm_load_print_meta: n_embd_head_v    = 256
llm_load_print_meta: n_gqa            = 2
llm_load_print_meta: n_embd_k_gqa     = 2048
llm_load_print_meta: n_embd_v_gqa     = 2048
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-06
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 14336
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 2
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 8192
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: ssm_dt_b_c_rms   = 0
llm_load_print_meta: model type       = 9B
llm_load_print_meta: model ftype      = Q4_0
llm_load_print_meta: model params     = 9.24 B
llm_load_print_meta: model size       = 5.06 GiB (4.71 BPW)
llm_load_print_meta: general.name     = gemma-2-9b-it
llm_load_print_meta: BOS token        = 2 '<bos>'
llm_load_print_meta: EOS token        = 1 '<eos>'
llm_load_print_meta: EOT token        = 107 '<end_of_turn>'
llm_load_print_meta: UNK token        = 3 '<unk>'
llm_load_print_meta: PAD token        = 0 '<pad>'
llm_load_print_meta: LF token         = 227 '<0x0A>'
llm_load_print_meta: EOG token        = 1 '<eos>'
llm_load_print_meta: EOG token        = 107 '<end_of_turn>'
llm_load_print_meta: max token length = 93
time=2024-12-28T21:17:02.535+08:00 level=INFO source=server.go:589 msg="waiting for server to become available" status="llm server loading model"
llm_load_tensors: offloading 42 repeating layers to GPU
llm_load_tensors: offloading output layer to GPU
llm_load_tensors: offloaded 43/43 layers to GPU
llm_load_tensors:   CPU_Mapped model buffer size =   717.77 MiB
llm_load_tensors:        ROCm0 model buffer size =  5185.21 MiB

Following are the failure logs when I run `ollama run gemma2` in another terminal.

SIGSEGV: segmentation violation
PC=0x713070f0fe2b m=5 sigcode=1 addr=0x18
signal arrived during cgo execution

goroutine 20 gp=0xc000104a80 m=5 mp=0xc000100008 [syscall]:
runtime.cgocall(0x5693bccf4990, 0xc000204b78)
        runtime/cgocall.go:167 +0x4b fp=0xc000204b50 sp=0xc000204b18 pc=0x5693bcaa896b
github.com/ollama/ollama/llama._Cfunc_llama_load_model_from_file(0x712ed4000be0, {0x0, 0x2b, 0x1, 0x0, 0x0, 0x0, 0x5693bccf41e0, 0xc000208000, 0x0, ...})
        _cgo_gotypes.go:707 +0x50 fp=0xc000204b78 sp=0xc000204b50 pc=0x5693bcb53250
github.com/ollama/ollama/llama.LoadModelFromFile.func1({0x7ffc8d222d0e?, 0x0?}, {0x0, 0x2b, 0x1, 0x0, 0x0, 0x0, 0x5693bccf41e0, 0xc000208000, ...})
        github.com/ollama/ollama/llama/llama.go:311 +0x127 fp=0xc000204c78 sp=0xc000204b78 pc=0x5693bcb55e67
github.com/ollama/ollama/llama.LoadModelFromFile({0x7ffc8d222d0e, 0x68}, {0x2b, 0x0, 0x1, 0x0, {0x0, 0x0, 0x0}, 0xc00011e1b0, ...})
        github.com/ollama/ollama/llama/llama.go:311 +0x2d6 fp=0xc000204dc8 sp=0xc000204c78 pc=0x5693bcb55b56
github.com/ollama/ollama/llama/runner.(*Server).loadModel(0xc0001461b0, {0x2b, 0x0, 0x1, 0x0, {0x0, 0x0, 0x0}, 0xc00011e1b0, 0x0}, ...)
        github.com/ollama/ollama/llama/runner/runner.go:859 +0xc5 fp=0xc000204f10 sp=0xc000204dc8 pc=0x5693bccf1c25
github.com/ollama/ollama/llama/runner.Execute.gowrap1()
        github.com/ollama/ollama/llama/runner/runner.go:979 +0xda fp=0xc000204fe0 sp=0xc000204f10 pc=0x5693bccf357a
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000204fe8 sp=0xc000204fe0 pc=0x5693bcab63a1
created by github.com/ollama/ollama/llama/runner.Execute in goroutine 1
        github.com/ollama/ollama/llama/runner/runner.go:979 +0xd0d

goroutine 1 gp=0xc0000061c0 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc0000637b0 sp=0xc000063790 pc=0x5693bcaae76e
runtime.netpollblock(0xc000063800?, 0xbca46fc6?, 0x93?)
        runtime/netpoll.go:575 +0xf7 fp=0xc0000637e8 sp=0xc0000637b0 pc=0x5693bca734d7
internal/poll.runtime_pollWait(0x712f89fca730, 0x72)
        runtime/netpoll.go:351 +0x85 fp=0xc000063808 sp=0xc0000637e8 pc=0x5693bcaada65
internal/poll.(*pollDesc).wait(0xc000190100?, 0x2c?, 0x0)
        internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000063830 sp=0xc000063808 pc=0x5693bcb038a7
internal/poll.(*pollDesc).waitRead(...)
        internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc000190100)
        internal/poll/fd_unix.go:620 +0x295 fp=0xc0000638d8 sp=0xc000063830 pc=0x5693bcb04e15
net.(*netFD).accept(0xc000190100)
        net/fd_unix.go:172 +0x29 fp=0xc000063990 sp=0xc0000638d8 pc=0x5693bcb7d7a9
net.(*TCPListener).accept(0xc00012e6c0)
        net/tcpsock_posix.go:159 +0x1e fp=0xc0000639e0 sp=0xc000063990 pc=0x5693bcb8ddfe
net.(*TCPListener).Accept(0xc00012e6c0)
        net/tcpsock.go:372 +0x30 fp=0xc000063a10 sp=0xc0000639e0 pc=0x5693bcb8d130
net/http.(*onceCloseListener).Accept(0xc000146240?)
        <autogenerated>:1 +0x24 fp=0xc000063a28 sp=0xc000063a10 pc=0x5693bcccbd04
net/http.(*Server).Serve(0xc00018e4b0, {0x5693bd0cbeb8, 0xc00012e6c0})
        net/http/server.go:3330 +0x30c fp=0xc000063b58 sp=0xc000063a28 pc=0x5693bccbda4c
github.com/ollama/ollama/llama/runner.Execute({0xc000132010?, 0x5693bcab5ffc?, 0x0?})
        github.com/ollama/ollama/llama/runner/runner.go:1005 +0x11a9 fp=0xc000063ef8 sp=0xc000063b58 pc=0x5693bccf3149
main.main()
        github.com/ollama/ollama/cmd/runner/main.go:11 +0x54 fp=0xc000063f50 sp=0xc000063ef8 pc=0x5693bccf40d4
runtime.main()
        runtime/proc.go:272 +0x29d fp=0xc000063fe0 sp=0xc000063f50 pc=0x5693bca7aabd
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000063fe8 sp=0xc000063fe0 pc=0x5693bcab63a1

goroutine 2 gp=0xc000006c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000098fa8 sp=0xc000098f88 pc=0x5693bcaae76e
runtime.goparkunlock(...)
        runtime/proc.go:430
runtime.forcegchelper()
        runtime/proc.go:337 +0xb8 fp=0xc000098fe0 sp=0xc000098fa8 pc=0x5693bca7adf8
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000098fe8 sp=0xc000098fe0 pc=0x5693bcab63a1
created by runtime.init.7 in goroutine 1
        runtime/proc.go:325 +0x1a

goroutine 3 gp=0xc000007180 m=nil [GC sweep wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000099780 sp=0xc000099760 pc=0x5693bcaae76e
runtime.goparkunlock(...)
        runtime/proc.go:430
runtime.bgsweep(0xc000026400)
        runtime/mgcsweep.go:277 +0x94 fp=0xc0000997c8 sp=0xc000099780 pc=0x5693bca65634
runtime.gcenable.gowrap1()
        runtime/mgc.go:204 +0x25 fp=0xc0000997e0 sp=0xc0000997c8 pc=0x5693bca59ee5
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000997e8 sp=0xc0000997e0 pc=0x5693bcab63a1
created by runtime.gcenable in goroutine 1
        runtime/mgc.go:204 +0x66

goroutine 4 gp=0xc000007340 m=nil [GC scavenge wait]:
runtime.gopark(0xc000026400?, 0x5693bcfb8fc0?, 0x1?, 0x0?, 0xc000007340?)
        runtime/proc.go:424 +0xce fp=0xc000099f78 sp=0xc000099f58 pc=0x5693bcaae76e
runtime.goparkunlock(...)
        runtime/proc.go:430
runtime.(*scavengerState).park(0x5693bd2b6380)
        runtime/mgcscavenge.go:425 +0x49 fp=0xc000099fa8 sp=0xc000099f78 pc=0x5693bca63069
runtime.bgscavenge(0xc000026400)
        runtime/mgcscavenge.go:653 +0x3c fp=0xc000099fc8 sp=0xc000099fa8 pc=0x5693bca635dc
runtime.gcenable.gowrap2()
        runtime/mgc.go:205 +0x25 fp=0xc000099fe0 sp=0xc000099fc8 pc=0x5693bca59e85
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000099fe8 sp=0xc000099fe0 pc=0x5693bcab63a1
created by runtime.gcenable in goroutine 1
        runtime/mgc.go:205 +0xa5

goroutine 18 gp=0xc000104700 m=nil [finalizer wait]:
runtime.gopark(0xc000098648?, 0x5693bca503e5?, 0xb0?, 0x1?, 0xc0000061c0?)
        runtime/proc.go:424 +0xce fp=0xc000098620 sp=0xc000098600 pc=0x5693bcaae76e
runtime.runfinq()
        runtime/mfinal.go:193 +0x107 fp=0xc0000987e0 sp=0xc000098620 pc=0x5693bca58f67
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000987e8 sp=0xc0000987e0 pc=0x5693bcab63a1
created by runtime.createfing in goroutine 1
        runtime/mfinal.go:163 +0x3d

goroutine 19 gp=0xc0001048c0 m=nil [chan receive]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000094718 sp=0xc0000946f8 pc=0x5693bcaae76e
runtime.chanrecv(0xc0001120e0, 0x0, 0x1)
        runtime/chan.go:639 +0x41c fp=0xc000094790 sp=0xc000094718 pc=0x5693bca49bbc
runtime.chanrecv1(0x0?, 0x0?)
        runtime/chan.go:489 +0x12 fp=0xc0000947b8 sp=0xc000094790 pc=0x5693bca49792
runtime.unique_runtime_registerUniqueMapCleanup.func1(...)
        runtime/mgc.go:1781
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
        runtime/mgc.go:1784 +0x2f fp=0xc0000947e0 sp=0xc0000947b8 pc=0x5693bca5cd4f
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000947e8 sp=0xc0000947e0 pc=0x5693bcab63a1
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
        runtime/mgc.go:1779 +0x96

goroutine 21 gp=0xc000104c40 m=nil [semacquire]:
runtime.gopark(0x0?, 0x0?, 0x20?, 0x81?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000095618 sp=0xc0000955f8 pc=0x5693bcaae76e
runtime.goparkunlock(...)
        runtime/proc.go:430
runtime.semacquire1(0xc0001461b8, 0x0, 0x1, 0x0, 0x12)
        runtime/sema.go:178 +0x22c fp=0xc000095680 sp=0xc000095618 pc=0x5693bca8da8c
sync.runtime_Semacquire(0x0?)
        runtime/sema.go:71 +0x25 fp=0xc0000956b8 sp=0xc000095680 pc=0x5693bcaaf9a5
sync.(*WaitGroup).Wait(0x0?)
        sync/waitgroup.go:118 +0x48 fp=0xc0000956e0 sp=0xc0000956b8 pc=0x5693bcacbc48
github.com/ollama/ollama/llama/runner.(*Server).run(0xc0001461b0, {0x5693bd0cc4a0, 0xc000196050})
        github.com/ollama/ollama/llama/runner/runner.go:315 +0x47 fp=0xc0000957b8 sp=0xc0000956e0 pc=0x5693bccee2c7
github.com/ollama/ollama/llama/runner.Execute.gowrap2()
        github.com/ollama/ollama/llama/runner/runner.go:984 +0x28 fp=0xc0000957e0 sp=0xc0000957b8 pc=0x5693bccf3468
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000957e8 sp=0xc0000957e0 pc=0x5693bcab63a1
created by github.com/ollama/ollama/llama/runner.Execute in goroutine 1
        github.com/ollama/ollama/llama/runner/runner.go:984 +0xde5

goroutine 22 gp=0xc000105340 m=nil [IO wait]:
runtime.gopark(0xc0002a6000?, 0xc000185958?, 0x3e?, 0x1?, 0xb?)
        runtime/proc.go:424 +0xce fp=0xc000185918 sp=0xc0001858f8 pc=0x5693bcaae76e
runtime.netpollblock(0x5693bcae9f98?, 0xbca46fc6?, 0x93?)
        runtime/netpoll.go:575 +0xf7 fp=0xc000185950 sp=0xc000185918 pc=0x5693bca734d7
internal/poll.runtime_pollWait(0x712f89fca618, 0x72)
        runtime/netpoll.go:351 +0x85 fp=0xc000185970 sp=0xc000185950 pc=0x5693bcaada65
internal/poll.(*pollDesc).wait(0xc000190180?, 0xc0001b8000?, 0x0)
        internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000185998 sp=0xc000185970 pc=0x5693bcb038a7
internal/poll.(*pollDesc).waitRead(...)
        internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc000190180, {0xc0001b8000, 0x1000, 0x1000})
        internal/poll/fd_unix.go:165 +0x27a fp=0xc000185a30 sp=0xc000185998 pc=0x5693bcb043fa
net.(*netFD).Read(0xc000190180, {0xc0001b8000?, 0xc000185aa0?, 0x5693bcb03d65?})
        net/fd_posix.go:55 +0x25 fp=0xc000185a78 sp=0xc000185a30 pc=0x5693bcb7c6c5
net.(*conn).Read(0xc000124098, {0xc0001b8000?, 0x0?, 0xc00012d058?})
        net/net.go:189 +0x45 fp=0xc000185ac0 sp=0xc000185a78 pc=0x5693bcb860c5
net.(*TCPConn).Read(0xc00012d050?, {0xc0001b8000?, 0xc000190180?, 0xc000185af8?})
        <autogenerated>:1 +0x25 fp=0xc000185af0 sp=0xc000185ac0 pc=0x5693bcb93165
net/http.(*connReader).Read(0xc00012d050, {0xc0001b8000, 0x1000, 0x1000})
        net/http/server.go:798 +0x14b fp=0xc000185b40 sp=0xc000185af0 pc=0x5693bccb434b
bufio.(*Reader).fill(0xc000130480)
        bufio/bufio.go:110 +0x103 fp=0xc000185b78 sp=0xc000185b40 pc=0x5693bcc72f63
bufio.(*Reader).Peek(0xc000130480, 0x4)
        bufio/bufio.go:148 +0x53 fp=0xc000185b98 sp=0xc000185b78 pc=0x5693bcc73093
net/http.(*conn).serve(0xc000146240, {0x5693bd0cc468, 0xc00012cf60})
        net/http/server.go:2127 +0x738 fp=0xc000185fb8 sp=0xc000185b98 pc=0x5693bccb9698
net/http.(*Server).Serve.gowrap3()
        net/http/server.go:3360 +0x28 fp=0xc000185fe0 sp=0xc000185fb8 pc=0x5693bccbde48
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000185fe8 sp=0xc000185fe0 pc=0x5693bcab63a1
created by net/http.(*Server).Serve in goroutine 1
        net/http/server.go:3360 +0x485

rax    0x712ed72c8ad0
rbx    0x712ed72ced40
rcx    0x713070da6663
rdx    0x712ed4005130
rdi    0x712ed72ced40
rsi    0x3
rbp    0x712edbff61d0
rsp    0x712edbff61a0
r8     0x0
r9     0x0
r10    0x4
r11    0xa66e143e45c2eb86
r12    0x0
r13    0x18
r14    0xffffffffffffffc0
r15    0x712dc3ef8e80
rip    0x713070f0fe2b
rflags 0x10206
cs     0x33
fs     0x0
gs     0x0
time=2024-12-28T21:17:03.119+08:00 level=INFO source=server.go:589 msg="waiting for server to become available" status="llm server error"
time=2024-12-28T21:17:03.370+08:00 level=ERROR source=sched.go:455 msg="error loading llama server" error="llama runner process has terminated: exit status 2"
[GIN] 2024/12/28 - 21:17:03 | 500 |  2.421283447s |       127.0.0.1 | POST     "/api/generate"

----------------

Following is my package info:

 ╰──➤ $ pacman -Q |grep 'ollama\|rocm'
ollama 0.5.4-1
ollama-rocm 0.5.4-1
python-pytorch-rocm 2.5.1-7
rocm-clang-ocl 6.1.2-1
rocm-cmake 6.2.4-1
rocm-core 6.2.4-2
rocm-device-libs 6.2.4-1
rocm-hip-libraries 6.2.2-1
rocm-hip-runtime 6.2.2-1
rocm-hip-sdk 6.2.2-1
rocm-language-runtime 6.2.2-1
rocm-llvm 6.2.4-1
rocm-opencl-runtime 6.2.4-1
rocm-opencl-sdk 6.2.2-1
rocm-smi-lib 6.2.4-1
rocminfo 6.2.4-1

Thanks

OS

Linux

GPU

AMD

CPU

AMD

Ollama version

0.5.4-1, tested with both the Arch Linux package and the build downloaded from the GitHub releases page.
It worked before and broke after I updated my Arch Linux system, just before creating this issue.

GiteaMirror added the bug label 2026-04-12 16:27:26 -05:00

@copycraft commented on GitHub (Dec 28, 2024):

I see you use Arch btw. Chad.


@zw963 commented on GitHub (Dec 28, 2024):

I tried removing the following Arch packages:

ollama 0.5.4-1
ollama-rocm 0.5.4-1

then installed the same version of ollama downloaded from the GitHub releases page, ran it, and got exactly the same issue.

And following is my upgrade log (before the upgrade, ollama worked well):

[2024-12-28T21:05:50+0800] [ALPM] upgraded python-pytorch-rocm (2.5.1-4 -> 2.5.1-7)
[2024-12-28T21:05:44+0800] [ALPM] upgraded ollama-rocm (0.5.2-1 -> 0.5.4-1)
[2024-12-28T21:05:43+0800] [ALPM] upgraded ollama (0.5.2-1 -> 0.5.4-1)

So I tried downloading 0.5.2 from the GitHub releases page and running it instead, but it still did not work.
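For reference, a minimal sketch of testing a standalone release build in isolation, so the system packages stay untouched (the asset name `ollama-linux-amd64.tgz` and the `bin/ollama` archive layout are assumptions; check the actual release page):

```sh
# Hypothetical reproduction sketch: fetch a specific release tarball and
# run it from a scratch directory, independent of the pacman packages.
VER=v0.5.2
curl -LO "https://github.com/ollama/ollama/releases/download/${VER}/ollama-linux-amd64.tgz"
mkdir -p /tmp/ollama-test
tar -xzf ollama-linux-amd64.tgz -C /tmp/ollama-test
# Same GFX override as above; the crash reproduces either way here.
HSA_OVERRIDE_GFX_VERSION=11.0.0 /tmp/ollama-test/bin/ollama serve
```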

Thanks.


@zw963 commented on GitHub (Jan 5, 2025):

Okay, I can confirm this issue is caused by the 6.12 Linux kernel.

https://gitlab.freedesktop.org/drm/amd/-/issues/3821

For Arch Linux users, using the LTS kernel or an older 6.11 kernel can work around it; a sketch follows below.
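A minimal sketch of that workaround on Arch (the package names are real, but the bootloader step assumes GRUB; adjust for systemd-boot or others):

```sh
# Install the LTS kernel alongside the current one (at the time of this
# comment, linux-lts was still on a pre-6.12 series).
sudo pacman -S linux-lts linux-lts-headers

# Regenerate boot entries so the LTS kernel can be selected at boot
# (GRUB assumed here).
sudo grub-mkconfig -o /boot/grub/grub.cfg

# After rebooting into the LTS kernel, verify:
uname -r
```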

(screenshot: https://github.com/user-attachments/assets/f4d658ae-ca48-4584-abf2-f076d87ade0a)


@dlasky commented on GitHub (Jan 17, 2025):

Have the same issue on Arch with 6.12.


@waltercool commented on GitHub (Jan 30, 2025):

Same issue on Gentoo with 6.13, but using my 7700S dGPU instead.

It works fine when using my 780M iGPU.


@Cyanic commented on GitHub (Jan 31, 2025):

Fedora 41, kernel 6.12.11-200.fc41, 780M iGPU, same error.

Has anyone resolved this?


@zw963 commented on GitHub (Feb 2, 2025):

> Fedora 41 6.12.11-200.fc41 780M iGPU same error.
>
> Anyone resolve this?

Check my post for the answer:

https://github.com/ollama/ollama/issues/8262#issuecomment-2571703020


@kiolygenius commented on GitHub (Feb 6, 2025):

> > Fedora 41 6.12.11-200.fc41 780M iGPU same error.
> > Anyone resolve this?
>
> Check my post for answer.
>
> #8262 (comment)

Now Arch Linux's linux-lts package is 6.12.


@zw963 commented on GitHub (Feb 7, 2025):

> > > Fedora 41 6.12.11-200.fc41 780M iGPU same error.
> > > Anyone resolve this?
> >
> > Check my post for answer.
> >
> > #8262 (comment)
>
> Now archlinux's linux-lts package is 6.12.

I still use the `linux-amd` package on Arch Linux; it still uses kernel 6.11.


@Cyanic commented on GitHub (Feb 10, 2025):

Can confirm that downgrading from 6.12 to kernel 6.11 worked.


@zw963 commented on GitHub (Feb 15, 2025):

After following this reply (https://github.com/ROCm/ROCm/discussions/2631#discussioncomment-12207487), I can confirm the 780M works with kernel 6.12 (6.12.13 here), using 100% GPU.

 ╰──➤ $ ollama ps
NAME               ID              SIZE      PROCESSOR    UNTIL
deepseek-r1:14b    ea35dfe18182    9.9 GB    100% GPU     Forever

 ╭─ 16:35 zw963 ~ ➦ ruby-3.3.5 ➦ crystal system ➦ elixir 1.17.3-otp-27 ➦ elixir-ls 0.23.0
 ╰──➤ $ uname -a
Linux mingfan 6.12.13-x64v2-xanmod1-1 #1 SMP PREEMPT_DYNAMIC Mon, 10 Feb 2025 04:36:57 +0000 x86_64 GNU/Linux

The `KEY POINT` is: you must set `HSA_OVERRIDE_GFX_VERSION=11.0.2` instead of `11.0.0` or `11.0.1`.
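A minimal sketch of applying that override, both one-off and persistently for the systemd-managed service (the `ollama.service` unit name comes from the official Linux installer; adjust if you start ollama differently):

```sh
# One-off: set the override just for this run.
HSA_OVERRIDE_GFX_VERSION=11.0.2 ollama serve

# Persistent: add a systemd drop-in for the ollama service.
sudo mkdir -p /etc/systemd/system/ollama.service.d
cat <<'EOF' | sudo tee /etc/systemd/system/ollama.service.d/override.conf
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=11.0.2"
EOF
sudo systemctl daemon-reload
sudo systemctl restart ollama
```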


@zw963 commented on GitHub (Jun 11, 2025):

> The `KEY POINT` is: you must set `HSA_OVERRIDE_GFX_VERSION=11.0.2` instead of `11.0.0` or `11.0.1`.

This issue is fixed now; you can set 11.0.0 instead.


@Mubelotix commented on GitHub (Nov 13, 2025):

> After following this reply (https://github.com/ROCm/ROCm/discussions/2631#discussioncomment-12207487), I can confirm the 780M works with kernel 6.12 (6.12.13 here), using 100% GPU.
>
>  ╰──➤ $ ollama ps
> NAME               ID              SIZE      PROCESSOR    UNTIL
> deepseek-r1:14b    ea35dfe18182    9.9 GB    100% GPU     Forever
>
>  ╭─ 16:35 zw963 ~ ➦ ruby-3.3.5 ➦ crystal system ➦ elixir 1.17.3-otp-27 ➦ elixir-ls 0.23.0
>  ╰──➤ $ uname -a
> Linux mingfan 6.12.13-x64v2-xanmod1-1 #1 SMP PREEMPT_DYNAMIC Mon, 10 Feb 2025 04:36:57 +0000 x86_64 GNU/Linux
>
> The `KEY POINT` is: you must set `HSA_OVERRIDE_GFX_VERSION=11.0.2` instead of `11.0.0` or `11.0.1`.

Thank you, I owe you.

This wasn't fixed, btw; the workaround is still required.


@zw963 commented on GitHub (Jan 12, 2026):

Just to clarify: `HSA_OVERRIDE_GFX_VERSION` is no longer needed since ROCm 7; the 780M is now supported out of the box and works like a charm!
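A quick sanity check for that (a sketch; `rocminfo` ships with the rocminfo package, and the exact ollama log wording can vary between versions):

```sh
# The GPU's native gfx target should be visible to the ROCm runtime.
rocminfo | grep -i gfx          # expect gfx1103 for the 780M

# Start ollama without any override and look for a ROCm "inference
# compute" line in the startup log, as in the log at the top of this issue.
ollama serve
```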
