[GH-ISSUE #9605] EXAONE fails to run with quantized KV cache #32026

Closed
opened 2026-04-22 12:54:08 -05:00 by GiteaMirror · 4 comments

Originally created by @AdamNiederer on GitHub (Mar 9, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9605

Originally assigned to: @jessegross on GitHub.

What is the issue?

$ OLLAMA_KV_CACHE_TYPE=q8_0 OLLAMA_FLASH_ATTENTION=1 ollama serve
$ ollama run exaone3.5:2.4b-instruct-q4_K_M
pulling manifest 
pulling d6e077ce2bb2... 100% ▕█████ ▏ 1.6 GB/1.6 GB
pulling 37cddd3bd818... 100% ▕██████▏  375 B                         
pulling 8cd06db3b613... 100% ▕██████▏   62 B                         
pulling 294fd63925d8... 100% ▕██████▏  13 KB                         
pulling a64d9e642d7b... 100% ▕██████▏   62 B                         
pulling 1cb9297f8af3... 100% ▕██████▏  563 B                         
verifying sha256 digest 
writing manifest 
success 
Error: llama runner process has terminated: GGML_ASSERT(hparams.n_embd_head_k % ggml_blck_size(type_k) == 0) failed

I also tried bartowski's IQ4_XS quant and got the same error. The model runs fine without flash attention.
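The failing assertion can be reproduced with plain arithmetic. The log below reports n_embd_head_k = 80, and ggml's q8_0 type stores values in blocks of 32 elements, so 80 % 32 leaves a remainder of 16 and the check fails. A minimal Python sketch of that check, assuming block sizes of 1 for f16 and 32 for q8_0 and q4_0; the `BLOCK_SIZE` table and the `kv_cache_type_fits` helper are illustrative names, not part of Ollama or llama.cpp:

```python
# Elements per block for the KV cache types discussed here.
# Assumed values: f16 is unblocked (1), q8_0 and q4_0 use 32-element blocks in ggml.
BLOCK_SIZE = {"f16": 1, "q8_0": 32, "q4_0": 32}


def kv_cache_type_fits(head_dim: int, cache_type: str) -> bool:
    """Mirror the failing check: head_dim % block_size == 0 (hypothetical helper)."""
    return head_dim % BLOCK_SIZE[cache_type] == 0


if __name__ == "__main__":
    head_dim = 80  # n_embd_head_k as printed in the log
    for cache_type, block in BLOCK_SIZE.items():
        # Print the cache type, whether the head size fits, and the remainder.
        print(cache_type, kv_cache_type_fits(head_dim, cache_type), head_dim % block)
```

Under these assumptions only head sizes that are multiples of 32 pass with q8_0 or q4_0. That would also be consistent with the IQ4_XS quant failing the same way: quantizing the weights differently does not change the head size.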

Relevant log output

time=2025-03-08T21:22:53.060-05:00 level=WARN source=ggml.go:136 msg="key not found" key=exaone.attention.value_length default=80
time=2025-03-08T21:22:53.060-05:00 level=INFO source=server.go:182 msg="enabling flash attention"
time=2025-03-08T21:22:53.060-05:00 level=INFO source=server.go:380 msg="starting llama server" cmd="/usr/bin/ollama runner --model /var/lib/ollama/blobs/sha25>
time=2025-03-08T21:22:53.060-05:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2025-03-08T21:22:53.060-05:00 level=INFO source=server.go:557 msg="waiting for llama runner to start responding"
time=2025-03-08T21:22:53.061-05:00 level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server error"
time=2025-03-08T21:22:53.065-05:00 level=INFO source=runner.go:931 msg="starting go runner"
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX 7900 XT, gfx1100 (0x1100), VMM: no, Wave Size: 32
load_backend: loaded ROCm backend from /usr/lib/ollama/rocm/libggml-hip.so
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-icelake.so
time=2025-03-08T21:22:53.766-05:00 level=INFO source=runner.go:934 msg=system info="CPU : LLAMAFILE = 1 | ROCm : NO_VMM = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU >
llama_model_load_from_file_impl: using device ROCm0 (AMD Radeon RX 7900 XT) - 20346 MiB free
time=2025-03-08T21:22:53.766-05:00 level=INFO source=runner.go:992 msg="Server listening on 127.0.0.1:34459"
llama_model_loader: loaded meta data with 32 key-value pairs and 274 tensors from /var/lib/ollama/blobs/sha256-d6e077ce2bb2d36ad179739ea96d8b8d387f024b0bedede>
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = exaone
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = EXAONE 3.5 2.4B Instruct
llama_model_loader: - kv   3:                           general.finetune str              = Instruct
llama_model_loader: - kv   4:                           general.basename str              = EXAONE-3.5
llama_model_loader: - kv   5:                         general.size_label str              = 2.4B
llama_model_loader: - kv   6:                            general.license str              = other
llama_model_loader: - kv   7:                       general.license.name str              = exaone
llama_model_loader: - kv   8:                       general.license.link str              = LICENSE
llama_model_loader: - kv   9:                               general.tags arr[str,4]       = ["lg-ai", "exaone", "exaone-3.5", "te...
llama_model_loader: - kv  10:                          general.languages arr[str,2]       = ["en", "ko"]
llama_model_loader: - kv  11:                    exaone.embedding_length u32              = 2560
llama_model_loader: - kv  12:                exaone.attention.head_count u32              = 32
llama_model_loader: - kv  13:             exaone.attention.head_count_kv u32              = 8
llama_model_loader: - kv  14:                      exaone.context_length u32              = 32768
llama_model_loader: - kv  15:    exaone.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  16:                 exaone.feed_forward_length u32              = 7168
llama_model_loader: - kv  17:                         exaone.block_count u32              = 30
llama_model_loader: - kv  18:                          general.file_type u32              = 15
llama_model_loader: - kv  19:                      exaone.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  20:                exaone.rope.dimension_count u32              = 80
llama_model_loader: - kv  21:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  22:                         tokenizer.ggml.pre str              = exaone
llama_model_loader: - kv  23:                      tokenizer.ggml.tokens arr[str,102400]  = ["[PAD]", "[BOS]", "[EOS]", "[UNK]", ...
llama_model_loader: - kv  24:                  tokenizer.ggml.token_type arr[i32,102400]  = [3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, ...
llama_model_loader: - kv  25:                      tokenizer.ggml.merges arr[str,101782]  = ["t h", "Ġ a", "Ġ í", "i n", "Ġ t...
llama_model_loader: - kv  26:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  27:                tokenizer.ggml.eos_token_id u32              = 361
llama_model_loader: - kv  28:            tokenizer.ggml.unknown_token_id u32              = 3
llama_model_loader: - kv  29:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  30:                    tokenizer.chat_template str              = {% for message in messages %}{% if lo...
llama_model_loader: - kv  31:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   62 tensors
llama_model_loader: - type q4_K:  183 tensors
llama_model_loader: - type q6_K:   29 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_K - Medium
print_info: file size   = 1.53 GiB (4.92 BPW)
time=2025-03-08T21:22:53.813-05:00 level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server loading model"
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special tokens cache size = 362
load: token to piece cache size = 0.6622 MB
print_info: arch             = exaone
print_info: vocab_only       = 0
print_info: n_ctx_train      = 32768
print_info: n_embd           = 2560
print_info: n_layer          = 30
print_info: n_head           = 32
print_info: n_head_kv        = 8
print_info: n_rot            = 80
print_info: n_swa            = 0
print_info: n_embd_head_k    = 80
print_info: n_embd_head_v    = 80
print_info: n_gqa            = 4
print_info: n_embd_k_gqa     = 640
print_info: n_embd_v_gqa     = 640
print_info: f_norm_eps       = 0.0e+00
print_info: f_norm_rms_eps   = 1.0e-05
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: n_ff             = 7168
print_info: n_expert         = 0
print_info: n_expert_used    = 0
print_info: causal attn      = 1
print_info: pooling type     = 0
print_info: rope type        = 2
print_info: rope scaling     = linear
print_info: freq_base_train  = 1000000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 32768
print_info: rope_finetuned   = unknown
print_info: ssm_d_conv       = 0
print_info: ssm_d_inner      = 0
print_info: ssm_d_state      = 0
print_info: ssm_dt_rank      = 0
print_info: ssm_dt_b_c_rms   = 0
print_info: model type       = ?B
print_info: model params     = 2.67 B
print_info: general.name     = EXAONE 3.5 2.4B Instruct
print_info: vocab type       = BPE
print_info: n_vocab          = 102400
print_info: n_merges         = 101782
print_info: BOS token        = 1 '[BOS]'
print_info: EOS token        = 361 '[|endofturn|]'
print_info: EOT token        = 42 '<|endoftext|>'
print_info: UNK token        = 3 '[UNK]'
print_info: PAD token        = 0 '[PAD]'
print_info: LF token         = 560 'Ċ'
print_info: EOG token        = 42 '<|endoftext|>'
print_info: EOG token        = 361 '[|endofturn|]'
print_info: max token length = 48
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: offloading 30 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 31/31 layers to GPU
load_tensors:        ROCm0 model buffer size =  1424.09 MiB
load_tensors:   CPU_Mapped model buffer size =   140.62 MiB
llama_init_from_model: n_seq_max     = 4
llama_init_from_model: n_ctx         = 8192
llama_init_from_model: n_ctx_per_seq = 2048
llama_init_from_model: n_batch       = 2048
llama_init_from_model: n_ubatch      = 512
llama_init_from_model: flash_attn    = 1
llama_init_from_model: freq_base     = 1000000.0
llama_init_from_model: freq_scale    = 1
llama_init_from_model: n_ctx_per_seq (2048) < n_ctx_train (32768) -- the full capacity of the model will not be utilized
llama.cpp:10094: GGML_ASSERT(hparams.n_embd_head_k % ggml_blck_size(type_k) == 0) failed
ptrace: Operation not permitted.
No stack.
The program is not being run.
SIGABRT: abort
PC=0x799f393ad624 m=7 sigcode=18446744073709551610
signal arrived during cgo execution
goroutine 10 gp=0xc000504700 m=7 mp=0xc000100808 [syscall]:
runtime.cgocall(0x578fcc395310, 0xc000093c28)
        /usr/lib/go/src/runtime/cgocall.go:167 +0x4b fp=0xc000093c00 sp=0xc000093bc8 pc=0x578fcb742bcb
github.com/ollama/ollama/llama._Cfunc_llama_init_from_model(0x799ed4000c50, {0x2000, 0x800, 0x200, 0x4, 0x8, 0x8, 0xffffffff, 0xffffffff, 0xffffffff, ...})
        _cgo_gotypes.go:616 +0x4e fp=0xc000093c28 sp=0xc000093c00 pc=0x578fcbac8cce
github.com/ollama/ollama/llama.NewContextWithModel.func1(...)
        /build/ollama/src/ollama/llama/llama.go:279
github.com/ollama/ollama/llama.NewContextWithModel(0xc0003d0028, {{0x2000, 0x800, 0x200, 0x4, 0x8, 0x8, 0xffffffff, 0xffffffff, 0xffffffff, ...}})
        /build/ollama/src/ollama/llama/llama.go:279 +0x158 fp=0xc000093dc8 sp=0xc000093c28 pc=0x578fcbacc5f8
github.com/ollama/ollama/runner/llamarunner.(*Server).loadModel(0xc0004f22d0, {0x1f, 0x0, 0x1, 0x0, {0x0, 0x0, 0x0}, 0xc000318510, 0x0}, ...)
        /build/ollama/src/ollama/runner/llamarunner/runner.go:855 +0x178 fp=0xc000093f10 sp=0xc000093dc8 pc=0x578fcbae74f8
github.com/ollama/ollama/runner/llamarunner.Execute.gowrap1()
        /build/ollama/src/ollama/runner/llamarunner/runner.go:966 +0xda fp=0xc000093fe0 sp=0xc000093f10 pc=0x578fcbae8cda
runtime.goexit({})
        /usr/lib/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc000093fe8 sp=0xc000093fe0 pc=0x578fcb74d5e1
created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1
        /build/ollama/src/ollama/runner/llamarunner/runner.go:966 +0xcb7
goroutine 1 gp=0xc000002380 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/lib/go/src/runtime/proc.go:435 +0xce fp=0xc00050f5b8 sp=0xc00050f598 pc=0x578fcb745eae
runtime.netpollblock(0xc00050f608?, 0xcb6df7e6?, 0x8f?)
        /usr/lib/go/src/runtime/netpoll.go:575 +0xf7 fp=0xc00050f5f0 sp=0xc00050f5b8 pc=0x578fcb70acb7
internal/poll.runtime_pollWait(0x799ef20c6eb0, 0x72)
        /usr/lib/go/src/runtime/netpoll.go:351 +0x85 fp=0xc00050f610 sp=0xc00050f5f0 pc=0x578fcb7450c5
internal/poll.(*pollDesc).wait(0xc00004f600?, 0x900000036?, 0x0)
        /usr/lib/go/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00050f638 sp=0xc00050f610 pc=0x578fcb7cc547
internal/poll.(*pollDesc).waitRead(...)
        /usr/lib/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc00004f600)
        /usr/lib/go/src/internal/poll/fd_unix.go:620 +0x295 fp=0xc00050f6e0 sp=0xc00050f638 pc=0x578fcb7d1915
net.(*netFD).accept(0xc00004f600)
        /usr/lib/go/src/net/fd_unix.go:172 +0x29 fp=0xc00050f798 sp=0xc00050f6e0 pc=0x578fcb843d89
net.(*TCPListener).accept(0xc0002eeac0)
        /usr/lib/go/src/net/tcpsock_posix.go:159 +0x1b fp=0xc00050f7e8 sp=0xc00050f798 pc=0x578fcb85973b
net.(*TCPListener).Accept(0xc0002eeac0)
        /usr/lib/go/src/net/tcpsock.go:380 +0x30 fp=0xc00050f818 sp=0xc00050f7e8 pc=0x578fcb8585f0
net/http.(*onceCloseListener).Accept(0xc0004f23f0?)
        <autogenerated>:1 +0x24 fp=0xc00050f830 sp=0xc00050f818 pc=0x578fcba6f4a4
net/http.(*Server).Serve(0xc000126f00, {0x578fcca0cbe8, 0xc0002eeac0})
        /usr/lib/go/src/net/http/server.go:3424 +0x30c fp=0xc00050f960 sp=0xc00050f830 pc=0x578fcba46d6c
github.com/ollama/ollama/runner/llamarunner.Execute({0xc000034160, 0x11, 0x12})
        /build/ollama/src/ollama/runner/llamarunner/runner.go:993 +0x116a fp=0xc00050fd08 sp=0xc00050f960 pc=0x578fcbae890a
github.com/ollama/ollama/runner.Execute({0xc000034150?, 0x0?, 0x0?})
        /build/ollama/src/ollama/runner/runner.go:22 +0xd4 fp=0xc00050fd30 sp=0xc00050fd08 pc=0x578fcbd12ff4
github.com/ollama/ollama/cmd.NewCLI.func2(0xc000126d00?, {0x578fcc58e195?, 0x4?, 0x578fcc58e199?})
        /build/ollama/src/ollama/cmd/cmd.go:1281 +0x45 fp=0xc00050fd58 sp=0xc00050fd30 pc=0x578fcc328485
github.com/spf13/cobra.(*Command).execute(0xc0004f6f08, {0xc0004ec900, 0x11, 0x12})
        /build/ollama/src/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:940 +0x85c fp=0xc00050fe78 sp=0xc00050fd58 pc=0x578fcb8bd01c
github.com/spf13/cobra.(*Command).ExecuteC(0xc0004c6908)
        /build/ollama/src/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc00050ff30 sp=0xc00050fe78 pc=0x578fcb8bd865
github.com/spf13/cobra.(*Command).Execute(...)
        /build/ollama/src/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
        /build/ollama/src/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
        /build/ollama/src/ollama/main.go:12 +0x4d fp=0xc00050ff50 sp=0xc00050ff30 pc=0x578fcc3287ed
runtime.main()
        /usr/lib/go/src/runtime/proc.go:283 +0x29d fp=0xc00050ffe0 sp=0xc00050ff50 pc=0x578fcb7122bd
runtime.goexit({})
        /usr/lib/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc00050ffe8 sp=0xc00050ffe0 pc=0x578fcb74d5e1
goroutine 2 gp=0xc000002e00 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/lib/go/src/runtime/proc.go:435 +0xce fp=0xc000084fa8 sp=0xc000084f88 pc=0x578fcb745eae
runtime.goparkunlock(...)
        /usr/lib/go/src/runtime/proc.go:441
runtime.forcegchelper()
        /usr/lib/go/src/runtime/proc.go:348 +0xb8 fp=0xc000084fe0 sp=0xc000084fa8 pc=0x578fcb7125f8
runtime.goexit({})
        /usr/lib/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc000084fe8 sp=0xc000084fe0 pc=0x578fcb74d5e1
created by runtime.init.7 in goroutine 1
        /usr/lib/go/src/runtime/proc.go:336 +0x1a
goroutine 3 gp=0xc000003340 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/lib/go/src/runtime/proc.go:435 +0xce fp=0xc000085780 sp=0xc000085760 pc=0x578fcb745eae
runtime.goparkunlock(...)
        /usr/lib/go/src/runtime/proc.go:441
runtime.bgsweep(0xc0000aa000)
        /usr/lib/go/src/runtime/mgcsweep.go:316 +0xdf fp=0xc0000857c8 sp=0xc000085780 pc=0x578fcb6fce1f
runtime.gcenable.gowrap1()
        /usr/lib/go/src/runtime/mgc.go:204 +0x25 fp=0xc0000857e0 sp=0xc0000857c8 pc=0x578fcb6f1205
runtime.goexit({})
        /usr/lib/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc0000857e8 sp=0xc0000857e0 pc=0x578fcb74d5e1
created by runtime.gcenable in goroutine 1
        /usr/lib/go/src/runtime/mgc.go:204 +0x66
goroutine 4 gp=0xc000003500 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x578fcc740b18?, 0x0?, 0x0?, 0x0?)
        /usr/lib/go/src/runtime/proc.go:435 +0xce fp=0xc000085f78 sp=0xc000085f58 pc=0x578fcb745eae
runtime.goparkunlock(...)
        /usr/lib/go/src/runtime/proc.go:441
runtime.(*scavengerState).park(0x578fcd268240)
        /usr/lib/go/src/runtime/mgcscavenge.go:425 +0x49 fp=0xc000085fa8 sp=0xc000085f78 pc=0x578fcb6fa869
runtime.bgscavenge(0xc0000aa000)
        /usr/lib/go/src/runtime/mgcscavenge.go:658 +0x59 fp=0xc000085fc8 sp=0xc000085fa8 pc=0x578fcb6fadf9
runtime.gcenable.gowrap2()
        /usr/lib/go/src/runtime/mgc.go:205 +0x25 fp=0xc000085fe0 sp=0xc000085fc8 pc=0x578fcb6f11a5
runtime.goexit({})
        /usr/lib/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc000085fe8 sp=0xc000085fe0 pc=0x578fcb74d5e1
created by runtime.gcenable in goroutine 1
        /usr/lib/go/src/runtime/mgc.go:205 +0xa5
goroutine 5 gp=0xc000003dc0 m=nil [finalizer wait]:
runtime.gopark(0x1b8?, 0xc000002380?, 0x1?, 0x23?, 0xc000084688?)
        /usr/lib/go/src/runtime/proc.go:435 +0xce fp=0xc000084630 sp=0xc000084610 pc=0x578fcb745eae
runtime.runfinq()
        /usr/lib/go/src/runtime/mfinal.go:196 +0x107 fp=0xc0000847e0 sp=0xc000084630 pc=0x578fcb6f01c7
runtime.goexit({})
        /usr/lib/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc0000847e8 sp=0xc0000847e0 pc=0x578fcb74d5e1
created by runtime.createfing in goroutine 1
        /usr/lib/go/src/runtime/mfinal.go:166 +0x3d
goroutine 6 gp=0xc0001de8c0 m=nil [chan receive]:
runtime.gopark(0xc000233860?, 0xc000610018?, 0x60?, 0x67?, 0x578fcb82aac8?)
        /usr/lib/go/src/runtime/proc.go:435 +0xce fp=0xc000086718 sp=0xc0000866f8 pc=0x578fcb745eae
runtime.chanrecv(0xc0000b8380, 0x0, 0x1)
        /usr/lib/go/src/runtime/chan.go:664 +0x445 fp=0xc000086790 sp=0xc000086718 pc=0x578fcb6e23c5
runtime.chanrecv1(0x0?, 0x0?)
        /usr/lib/go/src/runtime/chan.go:506 +0x12 fp=0xc0000867b8 sp=0xc000086790 pc=0x578fcb6e1f52
runtime.unique_runtime_registerUniqueMapCleanup.func2(...)
        /usr/lib/go/src/runtime/mgc.go:1796
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
        /usr/lib/go/src/runtime/mgc.go:1799 +0x2f fp=0xc0000867e0 sp=0xc0000867b8 pc=0x578fcb6f43af
runtime.goexit({})
        /usr/lib/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc0000867e8 sp=0xc0000867e0 pc=0x578fcb74d5e1
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
        /usr/lib/go/src/runtime/mgc.go:1794 +0x85
goroutine 7 gp=0xc0001dec40 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/lib/go/src/runtime/proc.go:435 +0xce fp=0xc000086f38 sp=0xc000086f18 pc=0x578fcb745eae
runtime.gcBgMarkWorker(0xc0000b9960)
        /usr/lib/go/src/runtime/mgc.go:1423 +0xe9 fp=0xc000086fc8 sp=0xc000086f38 pc=0x578fcb6f36c9
runtime.gcBgMarkStartWorkers.gowrap1()
        /usr/lib/go/src/runtime/mgc.go:1339 +0x25 fp=0xc000086fe0 sp=0xc000086fc8 pc=0x578fcb6f35a5
runtime.goexit({})
        /usr/lib/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc000086fe8 sp=0xc000086fe0 pc=0x578fcb74d5e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/lib/go/src/runtime/mgc.go:1339 +0x105
goroutine 18 gp=0xc000504000 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/lib/go/src/runtime/proc.go:435 +0xce fp=0xc000080738 sp=0xc000080718 pc=0x578fcb745eae
runtime.gcBgMarkWorker(0xc0000b9960)
        /usr/lib/go/src/runtime/mgc.go:1423 +0xe9 fp=0xc0000807c8 sp=0xc000080738 pc=0x578fcb6f36c9
runtime.gcBgMarkStartWorkers.gowrap1()
        /usr/lib/go/src/runtime/mgc.go:1339 +0x25 fp=0xc0000807e0 sp=0xc0000807c8 pc=0x578fcb6f35a5
runtime.goexit({})
        /usr/lib/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc0000807e8 sp=0xc0000807e0 pc=0x578fcb74d5e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/lib/go/src/runtime/mgc.go:1339 +0x105
goroutine 34 gp=0xc000102380 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/lib/go/src/runtime/proc.go:435 +0xce fp=0xc00011a738 sp=0xc00011a718 pc=0x578fcb745eae
runtime.gcBgMarkWorker(0xc0000b9960)
        /usr/lib/go/src/runtime/mgc.go:1423 +0xe9 fp=0xc00011a7c8 sp=0xc00011a738 pc=0x578fcb6f36c9
runtime.gcBgMarkStartWorkers.gowrap1()
        /usr/lib/go/src/runtime/mgc.go:1339 +0x25 fp=0xc00011a7e0 sp=0xc00011a7c8 pc=0x578fcb6f35a5
runtime.goexit({})
        /usr/lib/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc00011a7e8 sp=0xc00011a7e0 pc=0x578fcb74d5e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/lib/go/src/runtime/mgc.go:1339 +0x105
goroutine 8 gp=0xc0001dee00 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/lib/go/src/runtime/proc.go:435 +0xce fp=0xc000087738 sp=0xc000087718 pc=0x578fcb745eae
runtime.gcBgMarkWorker(0xc0000b9960)
        /usr/lib/go/src/runtime/mgc.go:1423 +0xe9 fp=0xc0000877c8 sp=0xc000087738 pc=0x578fcb6f36c9
runtime.gcBgMarkStartWorkers.gowrap1()
        /usr/lib/go/src/runtime/mgc.go:1339 +0x25 fp=0xc0000877e0 sp=0xc0000877c8 pc=0x578fcb6f35a5
runtime.goexit({})
        /usr/lib/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc0000877e8 sp=0xc0000877e0 pc=0x578fcb74d5e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/lib/go/src/runtime/mgc.go:1339 +0x105
goroutine 19 gp=0xc0005041c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/lib/go/src/runtime/proc.go:435 +0xce fp=0xc000080f38 sp=0xc000080f18 pc=0x578fcb745eae
runtime.gcBgMarkWorker(0xc0000b9960)
        /usr/lib/go/src/runtime/mgc.go:1423 +0xe9 fp=0xc000080fc8 sp=0xc000080f38 pc=0x578fcb6f36c9
runtime.gcBgMarkStartWorkers.gowrap1()
        /usr/lib/go/src/runtime/mgc.go:1339 +0x25 fp=0xc000080fe0 sp=0xc000080fc8 pc=0x578fcb6f35a5
runtime.goexit({})
        /usr/lib/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc000080fe8 sp=0xc000080fe0 pc=0x578fcb74d5e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/lib/go/src/runtime/mgc.go:1339 +0x105
goroutine 35 gp=0xc0001028c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/lib/go/src/runtime/proc.go:435 +0xce fp=0xc00011af38 sp=0xc00011af18 pc=0x578fcb745eae
runtime.gcBgMarkWorker(0xc0000b9960)
        /usr/lib/go/src/runtime/mgc.go:1423 +0xe9 fp=0xc00011afc8 sp=0xc00011af38 pc=0x578fcb6f36c9
runtime.gcBgMarkStartWorkers.gowrap1()
        /usr/lib/go/src/runtime/mgc.go:1339 +0x25 fp=0xc00011afe0 sp=0xc00011afc8 pc=0x578fcb6f35a5
runtime.goexit({})
        /usr/lib/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc00011afe8 sp=0xc00011afe0 pc=0x578fcb74d5e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/lib/go/src/runtime/mgc.go:1339 +0x105
goroutine 36 gp=0xc000102a80 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/lib/go/src/runtime/proc.go:435 +0xce fp=0xc00011b738 sp=0xc00011b718 pc=0x578fcb745eae
runtime.gcBgMarkWorker(0xc0000b9960)
        /usr/lib/go/src/runtime/mgc.go:1423 +0xe9 fp=0xc00011b7c8 sp=0xc00011b738 pc=0x578fcb6f36c9
runtime.gcBgMarkStartWorkers.gowrap1()
        /usr/lib/go/src/runtime/mgc.go:1339 +0x25 fp=0xc00011b7e0 sp=0xc00011b7c8 pc=0x578fcb6f35a5
runtime.goexit({})
        /usr/lib/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc00011b7e8 sp=0xc00011b7e0 pc=0x578fcb74d5e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/lib/go/src/runtime/mgc.go:1339 +0x105
goroutine 37 gp=0xc000102c40 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/lib/go/src/runtime/proc.go:435 +0xce fp=0xc00011bf38 sp=0xc00011bf18 pc=0x578fcb745eae
runtime.gcBgMarkWorker(0xc0000b9960)
        /usr/lib/go/src/runtime/mgc.go:1423 +0xe9 fp=0xc00011bfc8 sp=0xc00011bf38 pc=0x578fcb6f36c9
runtime.gcBgMarkStartWorkers.gowrap1()
        /usr/lib/go/src/runtime/mgc.go:1339 +0x25 fp=0xc00011bfe0 sp=0xc00011bfc8 pc=0x578fcb6f35a5
runtime.goexit({})
        /usr/lib/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc00011bfe8 sp=0xc00011bfe0 pc=0x578fcb74d5e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/lib/go/src/runtime/mgc.go:1339 +0x105
goroutine 38 gp=0xc000102e00 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/lib/go/src/runtime/proc.go:435 +0xce fp=0xc00011c738 sp=0xc00011c718 pc=0x578fcb745eae
runtime.gcBgMarkWorker(0xc0000b9960)
        /usr/lib/go/src/runtime/mgc.go:1423 +0xe9 fp=0xc00011c7c8 sp=0xc00011c738 pc=0x578fcb6f36c9
runtime.gcBgMarkStartWorkers.gowrap1()
        /usr/lib/go/src/runtime/mgc.go:1339 +0x25 fp=0xc00011c7e0 sp=0xc00011c7c8 pc=0x578fcb6f35a5
runtime.goexit({})
        /usr/lib/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc00011c7e8 sp=0xc00011c7e0 pc=0x578fcb74d5e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/lib/go/src/runtime/mgc.go:1339 +0x105
goroutine 39 gp=0xc000102fc0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/lib/go/src/runtime/proc.go:435 +0xce fp=0xc00011cf38 sp=0xc00011cf18 pc=0x578fcb745eae
runtime.gcBgMarkWorker(0xc0000b9960)
        /usr/lib/go/src/runtime/mgc.go:1423 +0xe9 fp=0xc00011cfc8 sp=0xc00011cf38 pc=0x578fcb6f36c9
runtime.gcBgMarkStartWorkers.gowrap1()
        /usr/lib/go/src/runtime/mgc.go:1339 +0x25 fp=0xc00011cfe0 sp=0xc00011cfc8 pc=0x578fcb6f35a5
runtime.goexit({})
        /usr/lib/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc00011cfe8 sp=0xc00011cfe0 pc=0x578fcb74d5e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/lib/go/src/runtime/mgc.go:1339 +0x105
goroutine 40 gp=0xc000103180 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/lib/go/src/runtime/proc.go:435 +0xce fp=0xc00011d738 sp=0xc00011d718 pc=0x578fcb745eae
runtime.gcBgMarkWorker(0xc0000b9960)
        /usr/lib/go/src/runtime/mgc.go:1423 +0xe9 fp=0xc00011d7c8 sp=0xc00011d738 pc=0x578fcb6f36c9
runtime.gcBgMarkStartWorkers.gowrap1()
        /usr/lib/go/src/runtime/mgc.go:1339 +0x25 fp=0xc00011d7e0 sp=0xc00011d7c8 pc=0x578fcb6f35a5
runtime.goexit({})
        /usr/lib/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc00011d7e8 sp=0xc00011d7e0 pc=0x578fcb74d5e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/lib/go/src/runtime/mgc.go:1339 +0x105
goroutine 41 gp=0xc000103340 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/lib/go/src/runtime/proc.go:435 +0xce fp=0xc00011df38 sp=0xc00011df18 pc=0x578fcb745eae
runtime.gcBgMarkWorker(0xc0000b9960)
        /usr/lib/go/src/runtime/mgc.go:1423 +0xe9 fp=0xc00011dfc8 sp=0xc00011df38 pc=0x578fcb6f36c9
runtime.gcBgMarkStartWorkers.gowrap1()
        /usr/lib/go/src/runtime/mgc.go:1339 +0x25 fp=0xc00011dfe0 sp=0xc00011dfc8 pc=0x578fcb6f35a5
runtime.goexit({})
        /usr/lib/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc00011dfe8 sp=0xc00011dfe0 pc=0x578fcb74d5e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/lib/go/src/runtime/mgc.go:1339 +0x105
goroutine 42 gp=0xc000103500 m=nil [GC worker (idle)]:
runtime.gopark(0x15b928f66cba?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/lib/go/src/runtime/proc.go:435 +0xce fp=0xc000116738 sp=0xc000116718 pc=0x578fcb745eae
runtime.gcBgMarkWorker(0xc0000b9960)
        /usr/lib/go/src/runtime/mgc.go:1423 +0xe9 fp=0xc0001167c8 sp=0xc000116738 pc=0x578fcb6f36c9
runtime.gcBgMarkStartWorkers.gowrap1()
        /usr/lib/go/src/runtime/mgc.go:1339 +0x25 fp=0xc0001167e0 sp=0xc0001167c8 pc=0x578fcb6f35a5
runtime.goexit({})
        /usr/lib/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc0001167e8 sp=0xc0001167e0 pc=0x578fcb74d5e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/lib/go/src/runtime/mgc.go:1339 +0x105
goroutine 43 gp=0xc0001036c0 m=nil [GC worker (idle)]:
runtime.gopark(0x15b928f66c92?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/lib/go/src/runtime/proc.go:435 +0xce fp=0xc000116f38 sp=0xc000116f18 pc=0x578fcb745eae
runtime.gcBgMarkWorker(0xc0000b9960)
        /usr/lib/go/src/runtime/mgc.go:1423 +0xe9 fp=0xc000116fc8 sp=0xc000116f38 pc=0x578fcb6f36c9
runtime.gcBgMarkStartWorkers.gowrap1()
        /usr/lib/go/src/runtime/mgc.go:1339 +0x25 fp=0xc000116fe0 sp=0xc000116fc8 pc=0x578fcb6f35a5
runtime.goexit({})
        /usr/lib/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc000116fe8 sp=0xc000116fe0 pc=0x578fcb74d5e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/lib/go/src/runtime/mgc.go:1339 +0x105
goroutine 44 gp=0xc000103880 m=nil [GC worker (idle)]:
runtime.gopark(0x15b928f609aa?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/lib/go/src/runtime/proc.go:435 +0xce fp=0xc000117738 sp=0xc000117718 pc=0x578fcb745eae
runtime.gcBgMarkWorker(0xc0000b9960)
        /usr/lib/go/src/runtime/mgc.go:1423 +0xe9 fp=0xc0001177c8 sp=0xc000117738 pc=0x578fcb6f36c9
runtime.gcBgMarkStartWorkers.gowrap1()
        /usr/lib/go/src/runtime/mgc.go:1339 +0x25 fp=0xc0001177e0 sp=0xc0001177c8 pc=0x578fcb6f35a5
runtime.goexit({})
        /usr/lib/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc0001177e8 sp=0xc0001177e0 pc=0x578fcb74d5e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/lib/go/src/runtime/mgc.go:1339 +0x105
goroutine 9 gp=0xc0001defc0 m=nil [GC worker (idle)]:
runtime.gopark(0x15b928f60bee?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/lib/go/src/runtime/proc.go:435 +0xce fp=0xc000087f38 sp=0xc000087f18 pc=0x578fcb745eae
runtime.gcBgMarkWorker(0xc0000b9960)
        /usr/lib/go/src/runtime/mgc.go:1423 +0xe9 fp=0xc000087fc8 sp=0xc000087f38 pc=0x578fcb6f36c9
runtime.gcBgMarkStartWorkers.gowrap1()
        /usr/lib/go/src/runtime/mgc.go:1339 +0x25 fp=0xc000087fe0 sp=0xc000087fc8 pc=0x578fcb6f35a5
runtime.goexit({})
        /usr/lib/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc000087fe8 sp=0xc000087fe0 pc=0x578fcb74d5e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/lib/go/src/runtime/mgc.go:1339 +0x105
goroutine 11 gp=0xc0005048c0 m=nil [sync.WaitGroup.Wait]:
runtime.gopark(0x0?, 0x0?, 0x60?, 0x0?, 0x0?)
        /usr/lib/go/src/runtime/proc.go:435 +0xce fp=0xc000119e18 sp=0xc000119df8 pc=0x578fcb745eae
runtime.goparkunlock(...)
        /usr/lib/go/src/runtime/proc.go:441
runtime.semacquire1(0xc0004f22d8, 0x0, 0x1, 0x0, 0x18)
        /usr/lib/go/src/runtime/sema.go:188 +0x229 fp=0xc000119e80 sp=0xc000119e18 pc=0x578fcb725889
sync.runtime_SemacquireWaitGroup(0x0?)
        /usr/lib/go/src/runtime/sema.go:110 +0x25 fp=0xc000119eb8 sp=0xc000119e80 pc=0x578fcb7478c5
sync.(*WaitGroup).Wait(0x0?)
        /usr/lib/go/src/sync/waitgroup.go:118 +0x48 fp=0xc000119ee0 sp=0xc000119eb8 pc=0x578fcb759048
github.com/ollama/ollama/runner/llamarunner.(*Server).run(0xc0004f22d0, {0x578fcca0ee60, 0xc0000fdd10})
        /build/ollama/src/ollama/runner/llamarunner/runner.go:316 +0x47 fp=0xc000119fb8 sp=0xc000119ee0 pc=0x578fcbae40a7
github.com/ollama/ollama/runner/llamarunner.Execute.gowrap2()
        /build/ollama/src/ollama/runner/llamarunner/runner.go:973 +0x28 fp=0xc000119fe0 sp=0xc000119fb8 pc=0x578fcbae8bc8
runtime.goexit({})
        /usr/lib/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc000119fe8 sp=0xc000119fe0 pc=0x578fcb74d5e1
created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1
        /build/ollama/src/ollama/runner/llamarunner/runner.go:973 +0xd97
goroutine 12 gp=0xc000504a80 m=nil [IO wait]:
runtime.gopark(0x578fcb7cfb45?, 0xc00004f700?, 0x40?, 0xfa?, 0xb?)
        /usr/lib/go/src/runtime/proc.go:435 +0xce fp=0xc00030f948 sp=0xc00030f928 pc=0x578fcb745eae
runtime.netpollblock(0x578fcb769338?, 0xcb6df7e6?, 0x8f?)
        /usr/lib/go/src/runtime/netpoll.go:575 +0xf7 fp=0xc00030f980 sp=0xc00030f948 pc=0x578fcb70acb7
internal/poll.runtime_pollWait(0x799ef20c6d98, 0x72)
        /usr/lib/go/src/runtime/netpoll.go:351 +0x85 fp=0xc00030f9a0 sp=0xc00030f980 pc=0x578fcb7450c5
internal/poll.(*pollDesc).wait(0xc00004f700?, 0xc000187000?, 0x0)
        /usr/lib/go/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00030f9c8 sp=0xc00030f9a0 pc=0x578fcb7cc547
internal/poll.(*pollDesc).waitRead(...)
        /usr/lib/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc00004f700, {0xc000187000, 0x1000, 0x1000})
        /usr/lib/go/src/internal/poll/fd_unix.go:165 +0x27a fp=0xc00030fa60 sp=0xc00030f9c8 pc=0x578fcb7cd83a
net.(*netFD).Read(0xc00004f700, {0xc000187000?, 0xc00030fad0?, 0x578fcb7cca05?})
        /usr/lib/go/src/net/fd_posix.go:55 +0x25 fp=0xc00030faa8 sp=0xc00030fa60 pc=0x578fcb841de5
net.(*conn).Read(0xc00007a940, {0xc000187000?, 0x0?, 0x0?})
        /usr/lib/go/src/net/net.go:194 +0x45 fp=0xc00030faf0 sp=0xc00030faa8 pc=0x578fcb8501a5
net/http.(*connReader).Read(0xc0004f4780, {0xc000187000, 0x1000, 0x1000})
        /usr/lib/go/src/net/http/server.go:798 +0x159 fp=0xc00030fb40 sp=0xc00030faf0 pc=0x578fcba3bc19
bufio.(*Reader).fill(0xc00043c6c0)
        /usr/lib/go/src/bufio/bufio.go:113 +0x103 fp=0xc00030fb78 sp=0xc00030fb40 pc=0x578fcb867943
bufio.(*Reader).Peek(0xc00043c6c0, 0x4)
        /usr/lib/go/src/bufio/bufio.go:152 +0x53 fp=0xc00030fb98 sp=0xc00030fb78 pc=0x578fcb867a73
net/http.(*conn).serve(0xc0004f23f0, {0x578fcca0ee28, 0xc0004f4660})
        /usr/lib/go/src/net/http/server.go:2137 +0x785 fp=0xc00030ffb8 sp=0xc00030fb98 pc=0x578fcba41a05
net/http.(*Server).Serve.gowrap3()
        /usr/lib/go/src/net/http/server.go:3454 +0x28 fp=0xc00030ffe0 sp=0xc00030ffb8 pc=0x578fcba47168
runtime.goexit({})
        /usr/lib/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc00030ffe8 sp=0xc00030ffe0 pc=0x578fcb74d5e1
created by net/http.(*Server).Serve in goroutine 1
        /usr/lib/go/src/net/http/server.go:3454 +0x485
rax    0x0
rbx    0x8e93
rcx    0x799f393ad624
rdx    0x6
rdi    0x8e8d
rsi    0x8e93
rbp    0x799edbfbe9d0
rsp    0x799edbfbe990
r8     0x0
r9     0x0
r10    0x0
r11    0x246
r12    0x578fcc75e09a
r13    0x578fcc75e6aa
r14    0x6
r15    0x799ed66dab50
rip    0x799f393ad624
rflags 0x246
cs     0x33
fs     0x0
gs     0x0
time=2025-03-08T21:22:54.227-05:00 level=ERROR source=server.go:421 msg="llama runner terminated" error="exit status 2"
time=2025-03-08T21:22:54.314-05:00 level=ERROR source=sched.go:456 msg="error loading llama server" error="llama runner process has terminated: GGML_ASSERT(hp>
[GIN] 2025/03/08 - | 500 |  1.269503968s |       127.0.0.1 | POST     "/api/generate"

OS

Linux

GPU

AMD

CPU

AMD

Ollama version

0.5.13

GiteaMirror added the bug label 2026-04-22 12:54:08 -05:00

@LFd3v commented on GitHub (Mar 12, 2025):

Just to report that the same happens here on Linux with Nvidia/CUDA using either KV cache quantization (Q4 or Q8):

GGML_ASSERT(hparams.n_embd_head_k % ggml_blck_size(type_k) == 0) failed
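
The assertion reads as a divisibility check: the per-head key dimension has to be a whole multiple of the cache type's block size. A minimal sketch of that arithmetic follows, assuming a head size of 80 (the `default=80` the log reports for `exaone.attention.value_length`; the key head size is assumed to match) and ggml's 32-element blocks for q4_0 and q8_0. `BLOCK_SIZE` and `head_dim_fits` are illustrative names, not part of ollama or ggml.

```python
# Elements per quantization block for a few ggml types.
# q4_0 and q8_0 both pack 32 elements per block; f16 is unquantized.
BLOCK_SIZE = {"f16": 1, "q8_0": 32, "q4_0": 32}


def head_dim_fits(head_dim: int, cache_type: str) -> bool:
    """Return True when head_dim elements split into whole blocks."""
    return head_dim % BLOCK_SIZE[cache_type] == 0


# Assumption: head size 80, taken from `default=80` in the issue's log.
for cache_type in ("f16", "q8_0", "q4_0"):
    print(cache_type, head_dim_fits(80, cache_type))
```

Under these assumptions 80 % 32 leaves a remainder of 16, which would explain why both Q4 and Q8 trip the same assert while the default f16 cache does not.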

@LFd3v commented on GitHub (Aug 27, 2025):

Bump. As of ollama v0.11.7, running exaone3.5:2.4b no longer causes an error or breaks the app, but it simply produces an empty response when using a quantized KV cache (Q4 or Q8).
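
For anyone who wants to check for the empty-response symptom without the interactive prompt, here is a rough sketch against the `/api/generate` endpoint seen in the log above. It assumes a server on the default `localhost:11434` address and a non-streaming request; `generate` and `is_empty_reply` are hypothetical helper names.

```python
import json
import urllib.request


def is_empty_reply(raw: bytes) -> bool:
    """True when a non-streaming /api/generate reply carries no text."""
    return json.loads(raw).get("response", "").strip() == ""


def generate(model: str, prompt: str,
             host: str = "http://localhost:11434") -> bytes:
    """POST one non-streaming prompt and return the raw JSON body."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

With the server started under `OLLAMA_FLASH_ATTENTION=1` and a quantized `OLLAMA_KV_CACHE_TYPE`, `is_empty_reply(generate("exaone3.5:2.4b", "hello"))` returning `True` would match the behavior described here.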


@rick-github commented on GitHub (Oct 11, 2025):

Is this still an issue?

```console
$ ollama -v
ollama version is 0.12.5
$ docker compose logs ollama | grep server.config | tail -1 | tr ' ' \\n | egrep "OLLAMA_FLASH_ATTENTION|OLLAMA_KV_CACHE_TYPE"
OLLAMA_FLASH_ATTENTION:true
OLLAMA_KV_CACHE_TYPE:q4_0
$ ollama run exaone3.5:2.4b-instruct-q4_K_M
>>> hello
Hello! How can I assist you today? Feel free to ask about any specific topic or task you need help with. 😊

>>> who are you?
Hello! I'm EXAONe, developed by LG AI Research. My purpose is to assist users like yourself with information, answering questions, and providing helpful responses across a wide range of topics. Whether you need guidance on technology, insights into research, or assistance with tasks—I'm here to help you effectively. How else can I assist you today?
```

@LFd3v commented on GitHub (Oct 12, 2025):

It is working fine for me on v0.12.5 as well. I guess this can be closed for now. Thanks.

Reference: github-starred/ollama#32026