[GH-ISSUE #13224] Error loading mistral-small3.2:24b on AMD GPU #55255

Closed
opened 2026-04-29 08:38:05 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @johanatandromeda on GitHub (Nov 24, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/13224

What is the issue?

I get an error loading mistral-small3.2:24b on an AMD RX 7600 XT. The setup is:

Proxmox 9 on an AMD Ryzen 7 5800X -> VM with CPU type x86-64-v3 and the GPU attached via PCI passthrough -> Docker -> Ollama 0.13.0. The VM has 32 GB of RAM and the GPU has 16 GB of VRAM.
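For context, a typical way to run Ollama with ROCm support inside Docker on a setup like this is shown below. The report does not include the actual `docker run` command used, so this is an assumed invocation based on Ollama's published ROCm container image:

```shell
# Assumed standard Ollama ROCm container invocation (the issue does not
# show the actual command used in this setup).
# /dev/kfd is the ROCm compute interface; /dev/dri exposes the GPU render nodes
# that PCI passthrough makes visible inside the VM.
docker run -d \
  --device /dev/kfd \
  --device /dev/dri \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama:rocm
```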

I also get errors with devstral:24b and deepseek-r1:14b; qwen3:30b-a3b-instruct-2507-q4_k_M works fine.

Relevant log output

ollama             | time=2025-11-24T06:34:43.466Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/bin/ollama runner --model /root/.ollama/models/blobs/sha256-b3a2c9a8fef9be8d2ef951aecca36a36b9ea0b70abe9359eab4315bf4cd9be01 --port 32811"
ollama             | time=2025-11-24T06:34:43.466Z level=INFO source=sched.go:443 msg="system memory" total="31.4 GiB" free="7.9 GiB" free_swap="6.7 GiB"
ollama             | time=2025-11-24T06:34:43.466Z level=INFO source=sched.go:450 msg="gpu memory" id=0 library=ROCm available="15.5 GiB" free="16.0 GiB" minimum="457.0 MiB" overhead="0 B"
ollama             | time=2025-11-24T06:34:43.466Z level=INFO source=server.go:459 msg="loading model" "model layers"=41 requested=-1
ollama             | time=2025-11-24T06:34:43.466Z level=INFO source=device.go:240 msg="model weights" device=ROCm0 size="9.0 GiB"
ollama             | time=2025-11-24T06:34:43.466Z level=INFO source=device.go:245 msg="model weights" device=CPU size="4.0 GiB"
ollama             | time=2025-11-24T06:34:43.466Z level=INFO source=device.go:251 msg="kv cache" device=ROCm0 size="3.5 GiB"
ollama             | time=2025-11-24T06:34:43.466Z level=INFO source=device.go:256 msg="kv cache" device=CPU size="1.3 GiB"
ollama             | time=2025-11-24T06:34:43.466Z level=INFO source=device.go:262 msg="compute graph" device=ROCm0 size="2.2 GiB"
ollama             | time=2025-11-24T06:34:43.466Z level=INFO source=device.go:272 msg="total memory" size="20.1 GiB"
ollama             | time=2025-11-24T06:34:43.473Z level=INFO source=runner.go:963 msg="starting go runner"
ollama             | load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-haswell.so
ollama             | /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
ollama             | ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ollama             | ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ollama             | ggml_cuda_init: found 1 ROCm devices:
ollama             |   Device 0: AMD Radeon Graphics, gfx1102 (0x1102), VMM: no, Wave Size: 32, ID: 0
ollama             | load_backend: loaded ROCm backend from /usr/lib/ollama/rocm/libggml-hip.so
ollama             | time=2025-11-24T06:34:44.126Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 ROCm.0.NO_VMM=1 ROCm.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
ollama             | time=2025-11-24T06:34:44.126Z level=INFO source=runner.go:999 msg="Server listening on 127.0.0.1:32811"
ollama             | time=2025-11-24T06:34:44.131Z level=INFO source=runner.go:893 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:32000 KvCacheType: NumThreads:6 GPULayers:29[ID:0 Layers:29(11..39)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ollama             | ggml_hip_get_device_memory searching for device 0000:00:10.0
ollama             | ggml_backend_cuda_device_get_memory device 0000:00:10.0 utilizing AMD specific memory reporting free: 17135820800 total: 17163091968
ollama             | llama_model_load_from_file_impl: using device ROCm0 (AMD Radeon Graphics) (0000:00:10.0) - 16341 MiB free
ollama             | time=2025-11-24T06:34:44.134Z level=INFO source=server.go:1294 msg="waiting for llama runner to start responding"
ollama             | time=2025-11-24T06:34:44.135Z level=INFO source=server.go:1328 msg="waiting for server to become available" status="llm server loading model"
ollama             | llama_model_loader: loaded meta data with 41 key-value pairs and 363 tensors from /root/.ollama/models/blobs/sha256-b3a2c9a8fef9be8d2ef951aecca36a36b9ea0b70abe9359eab4315bf4cd9be01 (version GGUF V3 (latest))
ollama             | llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
ollama             | llama_model_loader: - kv   0:                       general.architecture str              = llama
ollama             | llama_model_loader: - kv   1:                  general.base_model.0.name str              = Devstrall Small 2505
ollama             | llama_model_loader: - kv   2:          general.base_model.0.organization str              = Mistralai
ollama             | llama_model_loader: - kv   3:              general.base_model.0.repo_url str              = https://huggingface.co/mistralai/Devs...
ollama             | llama_model_loader: - kv   4:               general.base_model.0.version str              = 2505
ollama             | llama_model_loader: - kv   5:                   general.base_model.count u32              = 1
ollama             | llama_model_loader: - kv   6:                           general.basename str              = Devstral
ollama             | llama_model_loader: - kv   7:                          general.file_type u32              = 15
ollama             | llama_model_loader: - kv   8:                          general.languages arr[str,24]      = ["en", "fr", "de", "es", "pt", "it", ...
ollama             | llama_model_loader: - kv   9:                            general.license str              = apache-2.0
ollama             | llama_model_loader: - kv  10:                               general.name str              = Devstral Small 2505
ollama             | llama_model_loader: - kv  11:                    general.parameter_count u64              = 23572403200
ollama             | llama_model_loader: - kv  12:               general.quantization_version u32              = 2
ollama             | llama_model_loader: - kv  13:                         general.size_label str              = Small
ollama             | llama_model_loader: - kv  14:                               general.tags arr[str,1]       = ["text2text-generation"]
ollama             | llama_model_loader: - kv  15:                               general.type str              = model
ollama             | llama_model_loader: - kv  16:                            general.version str              = 2505
ollama             | llama_model_loader: - kv  17:                 llama.attention.head_count u32              = 32
ollama             | llama_model_loader: - kv  18:              llama.attention.head_count_kv u32              = 8
ollama             | llama_model_loader: - kv  19:                 llama.attention.key_length u32              = 128
ollama             | llama_model_loader: - kv  20:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
ollama             | llama_model_loader: - kv  21:               llama.attention.value_length u32              = 128
ollama             | llama_model_loader: - kv  22:                          llama.block_count u32              = 40
ollama             | llama_model_loader: - kv  23:                       llama.context_length u32              = 131072
ollama             | llama_model_loader: - kv  24:                     llama.embedding_length u32              = 5120
ollama             | llama_model_loader: - kv  25:                  llama.feed_forward_length u32              = 32768
ollama             | llama_model_loader: - kv  26:                 llama.rope.dimension_count u32              = 128
ollama             | llama_model_loader: - kv  27:                       llama.rope.freq_base f32              = 1000000000.000000
ollama             | llama_model_loader: - kv  28:                           llama.vocab_size u32              = 131072
ollama             | llama_model_loader: - kv  29:                    tokenizer.chat_template str              = {%- set today = strftime_now("%Y-%m-%...
ollama             | llama_model_loader: - kv  30:               tokenizer.ggml.add_bos_token bool             = true
ollama             | llama_model_loader: - kv  31:               tokenizer.ggml.add_eos_token bool             = false
ollama             | llama_model_loader: - kv  32:            tokenizer.ggml.add_space_prefix bool             = false
ollama             | llama_model_loader: - kv  33:                tokenizer.ggml.bos_token_id u32              = 1
ollama             | llama_model_loader: - kv  34:                tokenizer.ggml.eos_token_id u32              = 2
ollama             | llama_model_loader: - kv  35:                      tokenizer.ggml.merges arr[str,269443]  = ["Ġ Ġ", "Ġ t", "e r", "i n", "Ġ �...
ollama             | llama_model_loader: - kv  36:                       tokenizer.ggml.model str              = gpt2
ollama             | llama_model_loader: - kv  37:                         tokenizer.ggml.pre str              = tekken
ollama             | llama_model_loader: - kv  38:                  tokenizer.ggml.token_type arr[i32,131072]  = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
ollama             | llama_model_loader: - kv  39:                      tokenizer.ggml.tokens arr[str,131072]  = ["<unk>", "<s>", "</s>", "[INST]", "[...
ollama             | llama_model_loader: - kv  40:            tokenizer.ggml.unknown_token_id u32              = 0
ollama             | llama_model_loader: - type  f32:   81 tensors
ollama             | llama_model_loader: - type q4_K:  241 tensors
ollama             | llama_model_loader: - type q6_K:   41 tensors
ollama             | print_info: file format = GGUF V3 (latest)
ollama             | print_info: file type   = Q4_K - Medium
ollama             | print_info: file size   = 13.34 GiB (4.86 BPW) 
ollama             | load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
ollama             | load: printing all EOG tokens:
ollama             | load:   - 2 ('</s>')
ollama             | load: special tokens cache size = 1000
ollama             | load: token to piece cache size = 0.8498 MB
ollama             | print_info: arch             = llama
ollama             | print_info: vocab_only       = 0
ollama             | print_info: n_ctx_train      = 131072
ollama             | print_info: n_embd           = 5120
ollama             | print_info: n_layer          = 40
ollama             | print_info: n_head           = 32
ollama             | print_info: n_head_kv        = 8
ollama             | print_info: n_rot            = 128
ollama             | print_info: n_swa            = 0
ollama             | print_info: is_swa_any       = 0
ollama             | print_info: n_embd_head_k    = 128
ollama             | print_info: n_embd_head_v    = 128
ollama             | print_info: n_gqa            = 4
ollama             | print_info: n_embd_k_gqa     = 1024
ollama             | print_info: n_embd_v_gqa     = 1024
ollama             | print_info: f_norm_eps       = 0.0e+00
ollama             | print_info: f_norm_rms_eps   = 1.0e-05
ollama             | print_info: f_clamp_kqv      = 0.0e+00
ollama             | print_info: f_max_alibi_bias = 0.0e+00
ollama             | print_info: f_logit_scale    = 0.0e+00
ollama             | print_info: f_attn_scale     = 0.0e+00
ollama             | print_info: n_ff             = 32768
ollama             | print_info: n_expert         = 0
ollama             | print_info: n_expert_used    = 0
ollama             | print_info: causal attn      = 1
ollama             | print_info: pooling type     = 0
ollama             | print_info: rope type        = 0
ollama             | print_info: rope scaling     = linear
ollama             | print_info: freq_base_train  = 1000000000.0
ollama             | print_info: freq_scale_train = 1
ollama             | print_info: n_ctx_orig_yarn  = 131072
ollama             | print_info: rope_finetuned   = unknown
ollama             | print_info: model type       = 13B
ollama             | print_info: model params     = 23.57 B
ollama             | print_info: general.name     = Devstral Small 2505
ollama             | print_info: vocab type       = BPE
ollama             | print_info: n_vocab          = 131072
ollama             | print_info: n_merges         = 269443
ollama             | print_info: BOS token        = 1 '<s>'
ollama             | print_info: EOS token        = 2 '</s>'
ollama             | print_info: UNK token        = 0 '<unk>'
ollama             | print_info: LF token         = 1010 'Ċ'
ollama             | print_info: EOG token        = 2 '</s>'
ollama             | print_info: max token length = 150
ollama             | load_tensors: loading model tensors, this can take a while... (mmap = false)
ollama             | load_tensors: offloading 29 repeating layers to GPU
ollama             | load_tensors: offloaded 29/41 layers to GPU
ollama             | load_tensors:    ROCm_Host model buffer size =  4102.60 MiB
ollama             | load_tensors:        ROCm0 model buffer size =  9199.77 MiB
ollama             | load_tensors:          CPU model buffer size =   360.00 MiB
ollama             | llama_context: constructing llama_context
ollama             | llama_context: n_seq_max     = 1
ollama             | llama_context: n_ctx         = 32000
ollama             | llama_context: n_ctx_per_seq = 32000
ollama             | llama_context: n_batch       = 512
ollama             | llama_context: n_ubatch      = 512
ollama             | llama_context: causal_attn   = 1
ollama             | llama_context: flash_attn    = disabled
ollama             | llama_context: kv_unified    = false
ollama             | llama_context: freq_base     = 1000000000.0
ollama             | llama_context: freq_scale    = 1
ollama             | llama_context: n_ctx_per_seq (32000) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
ollama             | llama_context:        CPU  output buffer size =     0.52 MiB
ollama             | llama_kv_cache:      ROCm0 KV buffer size =  3625.00 MiB
ollama             | llama_kv_cache:        CPU KV buffer size =  1375.00 MiB
ollama             | llama_kv_cache: size = 5000.00 MiB ( 32000 cells,  40 layers,  1/1 seqs), K (f16): 2500.00 MiB, V (f16): 2500.00 MiB
ollama             | graph_reserve: failed to allocate compute buffers
ollama             | SIGSEGV: segmentation violation
ollama             | PC=0x7f3e3679c78a m=3 sigcode=1 addr=0x7f413461d0d8
ollama             | signal arrived during cgo execution
ollama             | 
ollama             | goroutine 25 gp=0xc000003dc0 m=3 mp=0xc00006b008 [syscall]:
ollama             | runtime.cgocall(0x55e5e3541b50, 0xc000319c00)
ollama             | 	runtime/cgocall.go:167 +0x4b fp=0xc000319bd8 sp=0xc000319ba0 pc=0x55e5e2824b0b
ollama             | github.com/ollama/ollama/llama._Cfunc_llama_init_from_model(0x7f3e70000dc0, {0x7d00, 0x200, 0x200, 0x1, 0x6, 0x6, 0xffffffff, 0xffffffff, 0xffffffff, ...})
ollama             | 	_cgo_gotypes.go:762 +0x4e fp=0xc000319c00 sp=0xc000319bd8 pc=0x55e5e2bddaae
ollama             | github.com/ollama/ollama/llama.NewContextWithModel.func1(...)
ollama             | 	github.com/ollama/ollama/llama/llama.go:317
ollama             | github.com/ollama/ollama/llama.NewContextWithModel(0xc0001d53a0, {{0x7d00, 0x200, 0x200, 0x1, 0x6, 0x6, 0xffffffff, 0xffffffff, 0xffffffff, ...}})
ollama             | 	github.com/ollama/ollama/llama/llama.go:317 +0x158 fp=0xc000319da0 sp=0xc000319c00 pc=0x55e5e2be1cd8
ollama             | github.com/ollama/ollama/runner/llamarunner.(*Server).loadModel(0xc0000d0dc0, {{0xc0001d4d90, 0x1, 0x1}, 0x1d, 0x0, 0x0, {0xc0001d4d88, 0x1, 0x2}, ...}, ...)
ollama             | 	github.com/ollama/ollama/runner/llamarunner/runner.go:845 +0x178 fp=0xc000319ee8 sp=0xc000319da0 pc=0x55e5e2c9a418
ollama             | github.com/ollama/ollama/runner/llamarunner.(*Server).load.gowrap2()
ollama             | 	github.com/ollama/ollama/runner/llamarunner/runner.go:932 +0x115 fp=0xc000319fe0 sp=0xc000319ee8 pc=0x55e5e2c9b635
ollama             | runtime.goexit({})
ollama             | 	runtime/asm_amd64.s:1700 +0x1 fp=0xc000319fe8 sp=0xc000319fe0 pc=0x55e5e282fe21
ollama             | created by github.com/ollama/ollama/runner/llamarunner.(*Server).load in goroutine 7
ollama             | 	github.com/ollama/ollama/runner/llamarunner/runner.go:932 +0x88a
ollama             | 
ollama             | goroutine 1 gp=0xc000002380 m=nil [IO wait]:
ollama             | runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
ollama             | 	runtime/proc.go:435 +0xce fp=0xc00051d790 sp=0xc00051d770 pc=0x55e5e2827f8e
ollama             | runtime.netpollblock(0xc00051d7e0?, 0xe27c16c6?, 0xe5?)
ollama             | 	runtime/netpoll.go:575 +0xf7 fp=0xc00051d7c8 sp=0xc00051d790 pc=0x55e5e27ed2b7
ollama             | internal/poll.runtime_pollWait(0x7f3ec3e13de0, 0x72)
ollama             | 	runtime/netpoll.go:351 +0x85 fp=0xc00051d7e8 sp=0xc00051d7c8 pc=0x55e5e28271a5
ollama             | internal/poll.(*pollDesc).wait(0xc0000a4d00?, 0x900000036?, 0x0)
ollama             | 	internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00051d810 sp=0xc00051d7e8 pc=0x55e5e28af0e7
ollama             | internal/poll.(*pollDesc).waitRead(...)
ollama             | 	internal/poll/fd_poll_runtime.go:89
ollama             | internal/poll.(*FD).Accept(0xc0000a4d00)
ollama             | 	internal/poll/fd_unix.go:620 +0x295 fp=0xc00051d8b8 sp=0xc00051d810 pc=0x55e5e28b44b5
ollama             | net.(*netFD).accept(0xc0000a4d00)
ollama             | 	net/fd_unix.go:172 +0x29 fp=0xc00051d970 sp=0xc00051d8b8 pc=0x55e5e2927389
ollama             | net.(*TCPListener).accept(0xc000092f40)
ollama             | 	net/tcpsock_posix.go:159 +0x1b fp=0xc00051d9c0 sp=0xc00051d970 pc=0x55e5e293cd3b
ollama             | net.(*TCPListener).Accept(0xc000092f40)
ollama             | 	net/tcpsock.go:380 +0x30 fp=0xc00051d9f0 sp=0xc00051d9c0 pc=0x55e5e293bbf0
ollama             | net/http.(*onceCloseListener).Accept(0xc0000ce3f0?)
ollama             | 	<autogenerated>:1 +0x24 fp=0xc00051da08 sp=0xc00051d9f0 pc=0x55e5e2b533c4
ollama             | net/http.(*Server).Serve(0xc0001f9700, {0x55e5e3d4ff60, 0xc000092f40})
ollama             | 	net/http/server.go:3424 +0x30c fp=0xc00051db38 sp=0xc00051da08 pc=0x55e5e2b2ac8c
ollama             | github.com/ollama/ollama/runner/llamarunner.Execute({0xc00012e140, 0x4, 0x4})
ollama             | 	github.com/ollama/ollama/runner/llamarunner/runner.go:1000 +0x8f5 fp=0xc00051dd08 sp=0xc00051db38 pc=0x55e5e2c9bff5
ollama             | github.com/ollama/ollama/runner.Execute({0xc00012e130?, 0x0?, 0x0?})
ollama             | 	github.com/ollama/ollama/runner/runner.go:22 +0xd4 fp=0xc00051dd30 sp=0xc00051dd08 pc=0x55e5e2d42454
ollama             | github.com/ollama/ollama/cmd.NewCLI.func2(0xc0001f9400?, {0x55e5e38510ab?, 0x4?, 0x55e5e38510af?})
ollama             | 	github.com/ollama/ollama/cmd/cmd.go:1841 +0x45 fp=0xc00051dd58 sp=0xc00051dd30 pc=0x55e5e34d1d85
ollama             | github.com/spf13/cobra.(*Command).execute(0xc0000d5508, {0xc000092d40, 0x4, 0x4})
ollama             | 	github.com/spf13/cobra@v1.7.0/command.go:940 +0x85c fp=0xc00051de78 sp=0xc00051dd58 pc=0x55e5e29a09dc
ollama             | github.com/spf13/cobra.(*Command).ExecuteC(0xc00054cf08)
ollama             | 	github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc00051df30 sp=0xc00051de78 pc=0x55e5e29a1225
ollama             | github.com/spf13/cobra.(*Command).Execute(...)
ollama             | 	github.com/spf13/cobra@v1.7.0/command.go:992
ollama             | github.com/spf13/cobra.(*Command).ExecuteContext(...)
ollama             | 	github.com/spf13/cobra@v1.7.0/command.go:985
ollama             | main.main()
ollama             | 	github.com/ollama/ollama/main.go:12 +0x4d fp=0xc00051df50 sp=0xc00051df30 pc=0x55e5e34d286d
ollama             | runtime.main()
ollama             | 	runtime/proc.go:283 +0x29d fp=0xc00051dfe0 sp=0xc00051df50 pc=0x55e5e27f493d
ollama             | runtime.goexit({})
ollama             | 	runtime/asm_amd64.s:1700 +0x1 fp=0xc00051dfe8 sp=0xc00051dfe0 pc=0x55e5e282fe21
ollama             | 
ollama             | goroutine 2 gp=0xc000002e00 m=nil [force gc (idle)]:
ollama             | runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
ollama             | 	runtime/proc.go:435 +0xce fp=0xc000064fa8 sp=0xc000064f88 pc=0x55e5e2827f8e
ollama             | runtime.goparkunlock(...)
ollama             | 	runtime/proc.go:441
ollama             | runtime.forcegchelper()
ollama             | 	runtime/proc.go:348 +0xb8 fp=0xc000064fe0 sp=0xc000064fa8 pc=0x55e5e27f4c78
ollama             | runtime.goexit({})
ollama             | 	runtime/asm_amd64.s:1700 +0x1 fp=0xc000064fe8 sp=0xc000064fe0 pc=0x55e5e282fe21
ollama             | created by runtime.init.7 in goroutine 1
ollama             | 	runtime/proc.go:336 +0x1a
ollama             | 
ollama             | goroutine 3 gp=0xc000003340 m=nil [GC sweep wait]:
ollama             | runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
ollama             | 	runtime/proc.go:435 +0xce fp=0xc000065780 sp=0xc000065760 pc=0x55e5e2827f8e
ollama             | runtime.goparkunlock(...)
ollama             | 	runtime/proc.go:441
ollama             | runtime.bgsweep(0xc000090000)
ollama             | 	runtime/mgcsweep.go:316 +0xdf fp=0xc0000657c8 sp=0xc000065780 pc=0x55e5e27df41f
ollama             | runtime.gcenable.gowrap1()
ollama             | 	runtime/mgc.go:204 +0x25 fp=0xc0000657e0 sp=0xc0000657c8 pc=0x55e5e27d3805
ollama             | runtime.goexit({})
ollama             | 	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000657e8 sp=0xc0000657e0 pc=0x55e5e282fe21
ollama             | created by runtime.gcenable in goroutine 1
ollama             | 	runtime/mgc.go:204 +0x66
ollama             | 
ollama             | goroutine 4 gp=0xc000003500 m=nil [GC scavenge wait]:
ollama             | runtime.gopark(0x10000?, 0x55e5e3a1b148?, 0x0?, 0x0?, 0x0?)
ollama             | 	runtime/proc.go:435 +0xce fp=0xc000065f78 sp=0xc000065f58 pc=0x55e5e2827f8e
ollama             | runtime.goparkunlock(...)
ollama             | 	runtime/proc.go:441
ollama             | runtime.(*scavengerState).park(0x55e5e4612100)
ollama             | 	runtime/mgcscavenge.go:425 +0x49 fp=0xc000065fa8 sp=0xc000065f78 pc=0x55e5e27dce69
ollama             | runtime.bgscavenge(0xc000090000)
ollama             | 	runtime/mgcscavenge.go:658 +0x59 fp=0xc000065fc8 sp=0xc000065fa8 pc=0x55e5e27dd3f9
ollama             | runtime.gcenable.gowrap2()
ollama             | 	runtime/mgc.go:205 +0x25 fp=0xc000065fe0 sp=0xc000065fc8 pc=0x55e5e27d37a5
ollama             | runtime.goexit({})
ollama             | 	runtime/asm_amd64.s:1700 +0x1 fp=0xc000065fe8 sp=0xc000065fe0 pc=0x55e5e282fe21
ollama             | created by runtime.gcenable in goroutine 1
ollama             | 	runtime/mgc.go:205 +0xa5
ollama             | 
ollama             | goroutine 18 gp=0xc000102700 m=nil [finalizer wait]:
ollama             | runtime.gopark(0x1b8?, 0xc000002380?, 0x1?, 0x23?, 0xc000064688?)
ollama             | 	runtime/proc.go:435 +0xce fp=0xc000064630 sp=0xc000064610 pc=0x55e5e2827f8e
ollama             | runtime.runfinq()
ollama             | 	runtime/mfinal.go:196 +0x107 fp=0xc0000647e0 sp=0xc000064630 pc=0x55e5e27d27c7
ollama             | runtime.goexit({})
ollama             | 	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000647e8 sp=0xc0000647e0 pc=0x55e5e282fe21
ollama             | created by runtime.createfing in goroutine 1
ollama             | 	runtime/mfinal.go:166 +0x3d
ollama             | 
ollama             | goroutine 19 gp=0xc000103180 m=nil [chan receive]:
ollama             | runtime.gopark(0xc0001d3860?, 0xc000590018?, 0x60?, 0x7?, 0x55e5e290dfc8?)
ollama             | 	runtime/proc.go:435 +0xce fp=0xc000060718 sp=0xc0000606f8 pc=0x55e5e2827f8e
ollama             | runtime.chanrecv(0xc000110310, 0x0, 0x1)
ollama             | 	runtime/chan.go:664 +0x445 fp=0xc000060790 sp=0xc000060718 pc=0x55e5e27c42a5
ollama             | runtime.chanrecv1(0x0?, 0x0?)
ollama             | 	runtime/chan.go:506 +0x12 fp=0xc0000607b8 sp=0xc000060790 pc=0x55e5e27c3e32
ollama             | runtime.unique_runtime_registerUniqueMapCleanup.func2(...)
ollama             | 	runtime/mgc.go:1796
ollama             | runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
ollama             | 	runtime/mgc.go:1799 +0x2f fp=0xc0000607e0 sp=0xc0000607b8 pc=0x55e5e27d69af
ollama             | runtime.goexit({})
ollama             | 	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000607e8 sp=0xc0000607e0 pc=0x55e5e282fe21
ollama             | created by unique.runtime_registerUniqueMapCleanup in goroutine 1
ollama             | 	runtime/mgc.go:1794 +0x85
ollama             | 
ollama             | goroutine 20 gp=0xc000103500 m=nil [GC worker (idle)]:
ollama             | runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
ollama             | 	runtime/proc.go:435 +0xce fp=0xc000060f38 sp=0xc000060f18 pc=0x55e5e2827f8e
ollama             | runtime.gcBgMarkWorker(0xc000111730)
ollama             | 	runtime/mgc.go:1423 +0xe9 fp=0xc000060fc8 sp=0xc000060f38 pc=0x55e5e27d5cc9
ollama             | runtime.gcBgMarkStartWorkers.gowrap1()
ollama             | 	runtime/mgc.go:1339 +0x25 fp=0xc000060fe0 sp=0xc000060fc8 pc=0x55e5e27d5ba5
ollama             | runtime.goexit({})
ollama             | 	runtime/asm_amd64.s:1700 +0x1 fp=0xc000060fe8 sp=0xc000060fe0 pc=0x55e5e282fe21
ollama             | created by runtime.gcBgMarkStartWorkers in goroutine 1
ollama             | 	runtime/mgc.go:1339 +0x105
ollama             | 
ollama             | goroutine 21 gp=0xc0001036c0 m=nil [GC worker (idle)]:
ollama             | runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
ollama             | 	runtime/proc.go:435 +0xce fp=0xc000061738 sp=0xc000061718 pc=0x55e5e2827f8e
ollama             | runtime.gcBgMarkWorker(0xc000111730)
ollama             | 	runtime/mgc.go:1423 +0xe9 fp=0xc0000617c8 sp=0xc000061738 pc=0x55e5e27d5cc9
ollama             | runtime.gcBgMarkStartWorkers.gowrap1()
ollama             | 	runtime/mgc.go:1339 +0x25 fp=0xc0000617e0 sp=0xc0000617c8 pc=0x55e5e27d5ba5
ollama             | runtime.goexit({})
ollama             | 	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000617e8 sp=0xc0000617e0 pc=0x55e5e282fe21
ollama             | created by runtime.gcBgMarkStartWorkers in goroutine 1
ollama             | 	runtime/mgc.go:1339 +0x105
ollama             | 
ollama             | goroutine 34 gp=0xc000504000 m=nil [GC worker (idle)]:
ollama             | runtime.gopark(0x2b4209e582fe?, 0x0?, 0x0?, 0x0?, 0x0?)
ollama             | 	runtime/proc.go:435 +0xce fp=0xc00050a738 sp=0xc00050a718 pc=0x55e5e2827f8e
ollama             | runtime.gcBgMarkWorker(0xc000111730)
ollama             | 	runtime/mgc.go:1423 +0xe9 fp=0xc00050a7c8 sp=0xc00050a738 pc=0x55e5e27d5cc9
ollama             | runtime.gcBgMarkStartWorkers.gowrap1()
ollama             | 	runtime/mgc.go:1339 +0x25 fp=0xc00050a7e0 sp=0xc00050a7c8 pc=0x55e5e27d5ba5
ollama             | runtime.goexit({})
ollama             | 	runtime/asm_amd64.s:1700 +0x1 fp=0xc00050a7e8 sp=0xc00050a7e0 pc=0x55e5e282fe21
ollama             | created by runtime.gcBgMarkStartWorkers in goroutine 1
ollama             | 	runtime/mgc.go:1339 +0x105
ollama             | 
ollama             | goroutine 5 gp=0xc000003a40 m=nil [GC worker (idle)]:
ollama             | runtime.gopark(0x2b4209e586dc?, 0x0?, 0x0?, 0x0?, 0x0?)
ollama             | 	runtime/proc.go:435 +0xce fp=0xc000066738 sp=0xc000066718 pc=0x55e5e2827f8e
ollama             | runtime.gcBgMarkWorker(0xc000111730)
ollama             | 	runtime/mgc.go:1423 +0xe9 fp=0xc0000667c8 sp=0xc000066738 pc=0x55e5e27d5cc9
ollama             | runtime.gcBgMarkStartWorkers.gowrap1()
ollama             | 	runtime/mgc.go:1339 +0x25 fp=0xc0000667e0 sp=0xc0000667c8 pc=0x55e5e27d5ba5
ollama             | runtime.goexit({})
ollama             | 	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000667e8 sp=0xc0000667e0 pc=0x55e5e282fe21
ollama             | created by runtime.gcBgMarkStartWorkers in goroutine 1
ollama             | 	runtime/mgc.go:1339 +0x105
ollama             | 
ollama             | goroutine 22 gp=0xc000103880 m=nil [GC worker (idle)]:
ollama             | runtime.gopark(0x2b4209e59078?, 0x0?, 0x0?, 0x0?, 0x0?)
ollama             | 	runtime/proc.go:435 +0xce fp=0xc000061f38 sp=0xc000061f18 pc=0x55e5e2827f8e
ollama             | runtime.gcBgMarkWorker(0xc000111730)
ollama             | 	runtime/mgc.go:1423 +0xe9 fp=0xc000061fc8 sp=0xc000061f38 pc=0x55e5e27d5cc9
ollama             | runtime.gcBgMarkStartWorkers.gowrap1()
ollama             | 	runtime/mgc.go:1339 +0x25 fp=0xc000061fe0 sp=0xc000061fc8 pc=0x55e5e27d5ba5
ollama             | runtime.goexit({})
ollama             | 	runtime/asm_amd64.s:1700 +0x1 fp=0xc000061fe8 sp=0xc000061fe0 pc=0x55e5e282fe21
ollama             | created by runtime.gcBgMarkStartWorkers in goroutine 1
ollama             | 	runtime/mgc.go:1339 +0x105
ollama             | 
ollama             | goroutine 23 gp=0xc000103a40 m=nil [GC worker (idle)]:
ollama             | runtime.gopark(0x2b4209e58254?, 0x0?, 0x0?, 0x0?, 0x0?)
ollama             | 	runtime/proc.go:435 +0xce fp=0xc000062738 sp=0xc000062718 pc=0x55e5e2827f8e
ollama             | runtime.gcBgMarkWorker(0xc000111730)
ollama             | 	runtime/mgc.go:1423 +0xe9 fp=0xc0000627c8 sp=0xc000062738 pc=0x55e5e27d5cc9
ollama             | runtime.gcBgMarkStartWorkers.gowrap1()
ollama             | 	runtime/mgc.go:1339 +0x25 fp=0xc0000627e0 sp=0xc0000627c8 pc=0x55e5e27d5ba5
ollama             | runtime.goexit({})
ollama             | 	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000627e8 sp=0xc0000627e0 pc=0x55e5e282fe21
ollama             | created by runtime.gcBgMarkStartWorkers in goroutine 1
ollama             | 	runtime/mgc.go:1339 +0x105
ollama             | 
ollama             | goroutine 6 gp=0xc000504c40 m=nil [sync.WaitGroup.Wait]:
ollama             | runtime.gopark(0x0?, 0x0?, 0x60?, 0x40?, 0x0?)
ollama             | 	runtime/proc.go:435 +0xce fp=0xc00050d620 sp=0xc00050d600 pc=0x55e5e2827f8e
ollama             | runtime.goparkunlock(...)
ollama             | 	runtime/proc.go:441
ollama             | runtime.semacquire1(0xc0000d0de0, 0x0, 0x1, 0x0, 0x18)
ollama             | 	runtime/sema.go:188 +0x229 fp=0xc00050d688 sp=0xc00050d620 pc=0x55e5e2807f09
ollama             | sync.runtime_SemacquireWaitGroup(0x0?)
ollama             | 	runtime/sema.go:110 +0x25 fp=0xc00050d6c0 sp=0xc00050d688 pc=0x55e5e28298c5
ollama             | sync.(*WaitGroup).Wait(0x0?)
ollama             | 	sync/waitgroup.go:118 +0x48 fp=0xc00050d6e8 sp=0xc00050d6c0 pc=0x55e5e283b768
ollama             | github.com/ollama/ollama/runner/llamarunner.(*Server).run(0xc0000d0dc0, {0x55e5e3d52580, 0xc00009ebe0})
ollama             | 	github.com/ollama/ollama/runner/llamarunner/runner.go:359 +0x4b fp=0xc00050d7b8 sp=0xc00050d6e8 pc=0x55e5e2c96dcb
ollama             | github.com/ollama/ollama/runner/llamarunner.Execute.gowrap1()
ollama             | 	github.com/ollama/ollama/runner/llamarunner/runner.go:979 +0x28 fp=0xc00050d7e0 sp=0xc00050d7b8 pc=0x55e5e2c9c268
ollama             | runtime.goexit({})
ollama             | 	runtime/asm_amd64.s:1700 +0x1 fp=0xc00050d7e8 sp=0xc00050d7e0 pc=0x55e5e282fe21
ollama             | created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1
ollama             | 	github.com/ollama/ollama/runner/llamarunner/runner.go:979 +0x4c5
ollama             | 
ollama             | goroutine 7 gp=0xc000504e00 m=nil [IO wait]:
ollama             | runtime.gopark(0x55e5e28b26e5?, 0xc0000a4d80?, 0x40?, 0x9a?, 0xb?)
ollama             | 	runtime/proc.go:435 +0xce fp=0xc000049948 sp=0xc000049928 pc=0x55e5e2827f8e
ollama             | runtime.netpollblock(0x55e5e284b638?, 0xe27c16c6?, 0xe5?)
ollama             | 	runtime/netpoll.go:575 +0xf7 fp=0xc000049980 sp=0xc000049948 pc=0x55e5e27ed2b7
ollama             | internal/poll.runtime_pollWait(0x7f3ec3e13cc8, 0x72)
ollama             | 	runtime/netpoll.go:351 +0x85 fp=0xc0000499a0 sp=0xc000049980 pc=0x55e5e28271a5
ollama             | internal/poll.(*pollDesc).wait(0xc0000a4d80?, 0xc0000f8000?, 0x0)
ollama             | 	internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0000499c8 sp=0xc0000499a0 pc=0x55e5e28af0e7
ollama             | internal/poll.(*pollDesc).waitRead(...)
ollama             | 	internal/poll/fd_poll_runtime.go:89
ollama             | internal/poll.(*FD).Read(0xc0000a4d80, {0xc0000f8000, 0x1000, 0x1000})
ollama             | 	internal/poll/fd_unix.go:165 +0x27a fp=0xc000049a60 sp=0xc0000499c8 pc=0x55e5e28b03da
ollama             | net.(*netFD).Read(0xc0000a4d80, {0xc0000f8000?, 0xc000049ad0?, 0x55e5e28af5a5?})
ollama             | 	net/fd_posix.go:55 +0x25 fp=0xc000049aa8 sp=0xc000049a60 pc=0x55e5e29253e5
ollama             | net.(*conn).Read(0xc000526580, {0xc0000f8000?, 0x0?, 0x0?})
ollama             | 	net/net.go:194 +0x45 fp=0xc000049af0 sp=0xc000049aa8 pc=0x55e5e29337a5
ollama             | net/http.(*connReader).Read(0xc000608fc0, {0xc0000f8000, 0x1000, 0x1000})
ollama             | 	net/http/server.go:798 +0x159 fp=0xc000049b40 sp=0xc000049af0 pc=0x55e5e2b1fb39
ollama             | bufio.(*Reader).fill(0xc000034840)
ollama             | 	bufio/bufio.go:113 +0x103 fp=0xc000049b78 sp=0xc000049b40 pc=0x55e5e294af43
ollama             | bufio.(*Reader).Peek(0xc000034840, 0x4)
ollama             | 	bufio/bufio.go:152 +0x53 fp=0xc000049b98 sp=0xc000049b78 pc=0x55e5e294b073
ollama             | net/http.(*conn).serve(0xc0000ce3f0, {0x55e5e3d52548, 0xc000608ed0})
ollama             | 	net/http/server.go:2137 +0x785 fp=0xc000049fb8 sp=0xc000049b98 pc=0x55e5e2b25925
ollama             | net/http.(*Server).Serve.gowrap3()
ollama             | 	net/http/server.go:3454 +0x28 fp=0xc000049fe0 sp=0xc000049fb8 pc=0x55e5e2b2b088
ollama             | runtime.goexit({})
ollama             | 	runtime/asm_amd64.s:1700 +0x1 fp=0xc000049fe8 sp=0xc000049fe0 pc=0x55e5e282fe21
ollama             | created by net/http.(*Server).Serve in goroutine 1
ollama             | 	net/http/server.go:3454 +0x485
ollama             | 
ollama             | rax    0x7f3e73f19720
ollama             | rbx    0x1
ollama             | rcx    0x580e06a0
ollama             | rdx    0x2
ollama             | rdi    0x7f39580e7f90
ollama             | rsi    0x0
ollama             | rbp    0x7f3958119450
ollama             | rsp    0x7f3e74bfeaa8
ollama             | r8     0x7f3e73f1b
ollama             | r9     0x7
ollama             | r10    0x7f3e73f1be80
ollama             | r11    0xca517d945fa33498
ollama             | r12    0x7f3958119450
ollama             | r13    0x0
ollama             | r14    0x0
ollama             | r15    0x7f3e70000dc0
ollama             | rip    0x7f3e3679c78a
ollama             | rflags 0x10202
ollama             | cs     0x33
ollama             | fs     0x0
ollama             | gs     0x0
ollama             | time=2025-11-24T06:34:48.710Z level=INFO source=server.go:1328 msg="waiting for server to become available" status="llm server not responding"
ollama             | time=2025-11-24T06:34:48.961Z level=INFO source=sched.go:470 msg="Load failed" model=/root/.ollama/models/blobs/sha256-b3a2c9a8fef9be8d2ef951aecca36a36b9ea0b70abe9359eab4315bf4cd9be01 error="llama runner process has terminated: exit status 2"

OS

Docker

GPU

AMD

CPU

AMD

Ollama version

0.13.0

GiteaMirror added the bug label 2026-04-29 08:38:05 -05:00

@jessegross commented on GitHub (Nov 24, 2025):

By default, Devstral runs under the old llama engine, which is not as good at managing memory allocations as the Ollama engine. However, the model has been implemented in the Ollama engine, so you should be able to turn it on by setting OLLAMA_NEW_ENGINE=1 in the server environment.

In addition, it looks like you might be using the non-Ollama versions of these models, such as from Hugging Face. I would recommend using the versions directly through Ollama.
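For the Docker setup described in this issue, the environment variable can be passed to the container with `-e`. A minimal sketch, assuming a container named `ollama` created from the standard ROCm instructions (container name, volume name, and port are assumptions, not taken from the issue):

```shell
# Recreate the Ollama container with the new engine enabled.
# The --device flags expose the AMD GPU to the ROCm image.
docker rm -f ollama
docker run -d --name ollama \
  --device /dev/kfd --device /dev/dri \
  -e OLLAMA_NEW_ENGINE=1 \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama:rocm
```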

<!-- gh-comment-id:3573165506 -->

@johanatandromeda commented on GitHub (Nov 25, 2025):

Thanks. Setting OLLAMA_NEW_ENGINE=1 solved the problems!

I think I pull the models through Ollama. My pull commands are:

```
ollama pull llama3.2; ollama pull llama3.1:8b; ollama pull qwen3:0.6b; ollama pull qwen3:1.7b; ollama pull qwen3:4b; ollama pull qwen3:8b; ollama pull qwen3:14b; ollama pull deepseek-r1:14b; ollama pull gpt-oss:20b; ollama pull gemma3:4b; ollama pull gemma3:270m; ollama pull gemma3:12b; ollama pull devstral:24b; ollama pull mistral-small3.2:24b; ollama pull mistral-nemo:12b; ollama pull aliafshar/gemma3-it-qat-tools:27b; ollama pull qwen3:30b-a3b-instruct-2507-q4_k_M
```
<!-- gh-comment-id:3574003735 -->
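The long semicolon-separated chain above can also be written as a loop. A sketch with a hypothetical `pull_models` helper (not part of Ollama); the `DRY_RUN` switch is an assumption added here so the commands can be previewed without a running server:

```shell
# Hypothetical helper: pull a list of models one at a time,
# reporting (rather than aborting on) any model that fails.
pull_models() {
  for m in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "ollama pull $m"                     # dry run: print the command only
    else
      ollama pull "$m" || echo "pull failed: $m" >&2
    fi
  done
}

# Dry run over a few of the models from the list above:
DRY_RUN=1 pull_models llama3.2 deepseek-r1:14b devstral:24b mistral-small3.2:24b
```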

Reference: github-starred/ollama#55255