[GH-ISSUE #8262] Segmentation Fault in AMD GPGPU Applications on 780M #5282

Closed
opened 2026-04-12 16:27:26 -05:00 by GiteaMirror · 14 comments

Originally created by @zw963 on GitHub (Dec 28, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/8262

What is the issue?

Hi, starting my ollama model fails again when I try to use the AMD 780M iGPU.

Following is the log for `HSA_OVERRIDE_GFX_VERSION=11.0.0 /usr/bin/ollama serve`:

 ╰──➤ $ HSA_OVERRIDE_GFX_VERSION=11.0.0 /usr/bin/ollama serve
2024/12/28 21:16:53 routes.go:1259: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION:11.0.0 HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/zw963/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2024-12-28T21:16:53.340+08:00 level=INFO source=images.go:757 msg="total blobs: 32"
time=2024-12-28T21:16:53.340+08:00 level=INFO source=images.go:764 msg="total unused blobs removed: 0"
time=2024-12-28T21:16:53.341+08:00 level=INFO source=routes.go:1310 msg="Listening on 127.0.0.1:11434 (version 0.5.4)"
time=2024-12-28T21:16:53.341+08:00 level=INFO source=routes.go:1339 msg="Dynamic LLM libraries" runners="[cpu cpu_avx cpu_avx2 rocm_avx]"
time=2024-12-28T21:16:53.341+08:00 level=INFO source=gpu.go:226 msg="looking for compatible GPUs"
time=2024-12-28T21:16:53.365+08:00 level=WARN source=amd_linux.go:61 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-12-28T21:16:53.366+08:00 level=INFO source=amd_linux.go:391 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=11.0.0
time=2024-12-28T21:16:53.366+08:00 level=INFO source=types.go:131 msg="inference compute" id=0 library=rocm variant="" compute=gfx1103 driver=0.0 name=1002:15bf total="16.0 GiB" available="14.8 GiB"
[GIN] 2024/12/28 - 21:17:00 | 200 |      31.846µs |       127.0.0.1 | HEAD     "/"
[GIN] 2024/12/28 - 21:17:00 | 200 |   19.231074ms |       127.0.0.1 | POST     "/api/show"
time=2024-12-28T21:17:01.006+08:00 level=INFO source=sched.go:714 msg="new model will fit in available VRAM in single GPU, loading" model=/home/zw963/.ollama/models/blobs/sha256-ff1d1fc78170d787ee1201778e2dd65ea211654ca5fb7d69b5a2e7b123a50373 gpu=0 parallel=4 available=15936040960 required="8.8 GiB"
time=2024-12-28T21:17:01.006+08:00 level=INFO source=server.go:104 msg="system memory" total="46.8 GiB" free="42.7 GiB" free_swap="63.0 GiB"
time=2024-12-28T21:17:01.006+08:00 level=INFO source=memory.go:356 msg="offload to rocm" layers.requested=-1 layers.model=43 layers.offload=43 layers.split="" memory.available="[14.8 GiB]" memory.gpu_overhead="0 B" memory.required.full="8.8 GiB" memory.required.partial="8.8 GiB" memory.required.kv="2.6 GiB" memory.required.allocations="[8.8 GiB]" memory.weights.total="7.0 GiB" memory.weights.repeating="6.3 GiB" memory.weights.nonrepeating="717.8 MiB" memory.graph.full="507.0 MiB" memory.graph.partial="1.2 GiB"
time=2024-12-28T21:17:01.007+08:00 level=INFO source=server.go:376 msg="starting llama server" cmd="/usr/lib/ollama/runners/rocm_avx/ollama_llama_server runner --model /home/zw963/.ollama/models/blobs/sha256-ff1d1fc78170d787ee1201778e2dd65ea211654ca5fb7d69b5a2e7b123a50373 --ctx-size 8192 --batch-size 512 --n-gpu-layers 43 --threads 8 --parallel 4 --port 12215"
time=2024-12-28T21:17:01.007+08:00 level=INFO source=sched.go:449 msg="loaded runners" count=1
time=2024-12-28T21:17:01.007+08:00 level=INFO source=server.go:555 msg="waiting for llama runner to start responding"
time=2024-12-28T21:17:01.007+08:00 level=INFO source=server.go:589 msg="waiting for server to become available" status="llm server error"
time=2024-12-28T21:17:01.036+08:00 level=INFO source=runner.go:945 msg="starting go runner"
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon 780M, compute capability 11.0, VMM: no
time=2024-12-28T21:17:02.349+08:00 level=INFO source=runner.go:946 msg=system info="ROCm : PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | LLAMAFILE = 1 | AARCH64_REPACK = 1 | cgo(gcc)" threads=8
llama_load_model_from_file: using device ROCm0 (AMD Radeon 780M) - 23866 MiB free
time=2024-12-28T21:17:02.349+08:00 level=INFO source=.:0 msg="Server listening on 127.0.0.1:12215"
llama_model_loader: loaded meta data with 29 key-value pairs and 464 tensors from /home/zw963/.ollama/models/blobs/sha256-ff1d1fc78170d787ee1201778e2dd65ea211654ca5fb7d69b5a2e7b123a50373 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = gemma2
llama_model_loader: - kv   1:                               general.name str              = gemma-2-9b-it
llama_model_loader: - kv   2:                      gemma2.context_length u32              = 8192
llama_model_loader: - kv   3:                    gemma2.embedding_length u32              = 3584
llama_model_loader: - kv   4:                         gemma2.block_count u32              = 42
llama_model_loader: - kv   5:                 gemma2.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                gemma2.attention.head_count u32              = 16
llama_model_loader: - kv   7:             gemma2.attention.head_count_kv u32              = 8
llama_model_loader: - kv   8:    gemma2.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv   9:                gemma2.attention.key_length u32              = 256
llama_model_loader: - kv  10:              gemma2.attention.value_length u32              = 256
llama_model_loader: - kv  11:                          general.file_type u32              = 2
llama_model_loader: - kv  12:              gemma2.attn_logit_softcapping f32              = 50.000000
llama_model_loader: - kv  13:             gemma2.final_logit_softcapping f32              = 30.000000
llama_model_loader: - kv  14:            gemma2.attention.sliding_window u32              = 4096
llama_model_loader: - kv  15:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  16:                         tokenizer.ggml.pre str              = default
llama_model_loader: - kv  17:                      tokenizer.ggml.tokens arr[str,256000]  = ["<pad>", "<eos>", "<bos>", "<unk>", ...
llama_model_loader: - kv  18:                      tokenizer.ggml.scores arr[f32,256000]  = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  19:                  tokenizer.ggml.token_type arr[i32,256000]  = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
llama_model_loader: - kv  20:                tokenizer.ggml.bos_token_id u32              = 2
llama_model_loader: - kv  21:                tokenizer.ggml.eos_token_id u32              = 1
llama_model_loader: - kv  22:            tokenizer.ggml.unknown_token_id u32              = 3
llama_model_loader: - kv  23:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  24:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  25:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  26:                    tokenizer.chat_template str              = {{ bos_token }}{% if messages[0]['rol...
llama_model_loader: - kv  27:            tokenizer.ggml.add_space_prefix bool             = false
llama_model_loader: - kv  28:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  169 tensors
llama_model_loader: - type q4_0:  294 tensors
llama_model_loader: - type q6_K:    1 tensors
llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
llm_load_vocab: special tokens cache size = 108
llm_load_vocab: token to piece cache size = 1.6014 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = gemma2
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 256000
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 8192
llm_load_print_meta: n_embd           = 3584
llm_load_print_meta: n_layer          = 42
llm_load_print_meta: n_head           = 16
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_rot            = 256
llm_load_print_meta: n_swa            = 4096
llm_load_print_meta: n_embd_head_k    = 256
llm_load_print_meta: n_embd_head_v    = 256
llm_load_print_meta: n_gqa            = 2
llm_load_print_meta: n_embd_k_gqa     = 2048
llm_load_print_meta: n_embd_v_gqa     = 2048
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-06
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 14336
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 2
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 8192
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: ssm_dt_b_c_rms   = 0
llm_load_print_meta: model type       = 9B
llm_load_print_meta: model ftype      = Q4_0
llm_load_print_meta: model params     = 9.24 B
llm_load_print_meta: model size       = 5.06 GiB (4.71 BPW)
llm_load_print_meta: general.name     = gemma-2-9b-it
llm_load_print_meta: BOS token        = 2 '<bos>'
llm_load_print_meta: EOS token        = 1 '<eos>'
llm_load_print_meta: EOT token        = 107 '<end_of_turn>'
llm_load_print_meta: UNK token        = 3 '<unk>'
llm_load_print_meta: PAD token        = 0 '<pad>'
llm_load_print_meta: LF token         = 227 '<0x0A>'
llm_load_print_meta: EOG token        = 1 '<eos>'
llm_load_print_meta: EOG token        = 107 '<end_of_turn>'
llm_load_print_meta: max token length = 93
time=2024-12-28T21:17:02.535+08:00 level=INFO source=server.go:589 msg="waiting for server to become available" status="llm server loading model"
llm_load_tensors: offloading 42 repeating layers to GPU
llm_load_tensors: offloading output layer to GPU
llm_load_tensors: offloaded 43/43 layers to GPU
llm_load_tensors:   CPU_Mapped model buffer size =   717.77 MiB
llm_load_tensors:        ROCm0 model buffer size =  5185.21 MiB

Following are the failure logs when I run `ollama run gemma2` in another terminal.

SIGSEGV: segmentation violation
PC=0x713070f0fe2b m=5 sigcode=1 addr=0x18
signal arrived during cgo execution

goroutine 20 gp=0xc000104a80 m=5 mp=0xc000100008 [syscall]:
runtime.cgocall(0x5693bccf4990, 0xc000204b78)
        runtime/cgocall.go:167 +0x4b fp=0xc000204b50 sp=0xc000204b18 pc=0x5693bcaa896b
github.com/ollama/ollama/llama._Cfunc_llama_load_model_from_file(0x712ed4000be0, {0x0, 0x2b, 0x1, 0x0, 0x0, 0x0, 0x5693bccf41e0, 0xc000208000, 0x0, ...})
        _cgo_gotypes.go:707 +0x50 fp=0xc000204b78 sp=0xc000204b50 pc=0x5693bcb53250
github.com/ollama/ollama/llama.LoadModelFromFile.func1({0x7ffc8d222d0e?, 0x0?}, {0x0, 0x2b, 0x1, 0x0, 0x0, 0x0, 0x5693bccf41e0, 0xc000208000, ...})
        github.com/ollama/ollama/llama/llama.go:311 +0x127 fp=0xc000204c78 sp=0xc000204b78 pc=0x5693bcb55e67
github.com/ollama/ollama/llama.LoadModelFromFile({0x7ffc8d222d0e, 0x68}, {0x2b, 0x0, 0x1, 0x0, {0x0, 0x0, 0x0}, 0xc00011e1b0, ...})
        github.com/ollama/ollama/llama/llama.go:311 +0x2d6 fp=0xc000204dc8 sp=0xc000204c78 pc=0x5693bcb55b56
github.com/ollama/ollama/llama/runner.(*Server).loadModel(0xc0001461b0, {0x2b, 0x0, 0x1, 0x0, {0x0, 0x0, 0x0}, 0xc00011e1b0, 0x0}, ...)
        github.com/ollama/ollama/llama/runner/runner.go:859 +0xc5 fp=0xc000204f10 sp=0xc000204dc8 pc=0x5693bccf1c25
github.com/ollama/ollama/llama/runner.Execute.gowrap1()
        github.com/ollama/ollama/llama/runner/runner.go:979 +0xda fp=0xc000204fe0 sp=0xc000204f10 pc=0x5693bccf357a
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000204fe8 sp=0xc000204fe0 pc=0x5693bcab63a1
created by github.com/ollama/ollama/llama/runner.Execute in goroutine 1
        github.com/ollama/ollama/llama/runner/runner.go:979 +0xd0d

goroutine 1 gp=0xc0000061c0 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc0000637b0 sp=0xc000063790 pc=0x5693bcaae76e
runtime.netpollblock(0xc000063800?, 0xbca46fc6?, 0x93?)
        runtime/netpoll.go:575 +0xf7 fp=0xc0000637e8 sp=0xc0000637b0 pc=0x5693bca734d7
internal/poll.runtime_pollWait(0x712f89fca730, 0x72)
        runtime/netpoll.go:351 +0x85 fp=0xc000063808 sp=0xc0000637e8 pc=0x5693bcaada65
internal/poll.(*pollDesc).wait(0xc000190100?, 0x2c?, 0x0)
        internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000063830 sp=0xc000063808 pc=0x5693bcb038a7
internal/poll.(*pollDesc).waitRead(...)
        internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc000190100)
        internal/poll/fd_unix.go:620 +0x295 fp=0xc0000638d8 sp=0xc000063830 pc=0x5693bcb04e15
net.(*netFD).accept(0xc000190100)
        net/fd_unix.go:172 +0x29 fp=0xc000063990 sp=0xc0000638d8 pc=0x5693bcb7d7a9
net.(*TCPListener).accept(0xc00012e6c0)
        net/tcpsock_posix.go:159 +0x1e fp=0xc0000639e0 sp=0xc000063990 pc=0x5693bcb8ddfe
net.(*TCPListener).Accept(0xc00012e6c0)
        net/tcpsock.go:372 +0x30 fp=0xc000063a10 sp=0xc0000639e0 pc=0x5693bcb8d130
net/http.(*onceCloseListener).Accept(0xc000146240?)
        <autogenerated>:1 +0x24 fp=0xc000063a28 sp=0xc000063a10 pc=0x5693bcccbd04
net/http.(*Server).Serve(0xc00018e4b0, {0x5693bd0cbeb8, 0xc00012e6c0})
        net/http/server.go:3330 +0x30c fp=0xc000063b58 sp=0xc000063a28 pc=0x5693bccbda4c
github.com/ollama/ollama/llama/runner.Execute({0xc000132010?, 0x5693bcab5ffc?, 0x0?})
        github.com/ollama/ollama/llama/runner/runner.go:1005 +0x11a9 fp=0xc000063ef8 sp=0xc000063b58 pc=0x5693bccf3149
main.main()
        github.com/ollama/ollama/cmd/runner/main.go:11 +0x54 fp=0xc000063f50 sp=0xc000063ef8 pc=0x5693bccf40d4
runtime.main()
        runtime/proc.go:272 +0x29d fp=0xc000063fe0 sp=0xc000063f50 pc=0x5693bca7aabd
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000063fe8 sp=0xc000063fe0 pc=0x5693bcab63a1

goroutine 2 gp=0xc000006c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000098fa8 sp=0xc000098f88 pc=0x5693bcaae76e
runtime.goparkunlock(...)
        runtime/proc.go:430
runtime.forcegchelper()
        runtime/proc.go:337 +0xb8 fp=0xc000098fe0 sp=0xc000098fa8 pc=0x5693bca7adf8
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000098fe8 sp=0xc000098fe0 pc=0x5693bcab63a1
created by runtime.init.7 in goroutine 1
        runtime/proc.go:325 +0x1a

goroutine 3 gp=0xc000007180 m=nil [GC sweep wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000099780 sp=0xc000099760 pc=0x5693bcaae76e
runtime.goparkunlock(...)
        runtime/proc.go:430
runtime.bgsweep(0xc000026400)
        runtime/mgcsweep.go:277 +0x94 fp=0xc0000997c8 sp=0xc000099780 pc=0x5693bca65634
runtime.gcenable.gowrap1()
        runtime/mgc.go:204 +0x25 fp=0xc0000997e0 sp=0xc0000997c8 pc=0x5693bca59ee5
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000997e8 sp=0xc0000997e0 pc=0x5693bcab63a1
created by runtime.gcenable in goroutine 1
        runtime/mgc.go:204 +0x66

goroutine 4 gp=0xc000007340 m=nil [GC scavenge wait]:
runtime.gopark(0xc000026400?, 0x5693bcfb8fc0?, 0x1?, 0x0?, 0xc000007340?)
        runtime/proc.go:424 +0xce fp=0xc000099f78 sp=0xc000099f58 pc=0x5693bcaae76e
runtime.goparkunlock(...)
        runtime/proc.go:430
runtime.(*scavengerState).park(0x5693bd2b6380)
        runtime/mgcscavenge.go:425 +0x49 fp=0xc000099fa8 sp=0xc000099f78 pc=0x5693bca63069
runtime.bgscavenge(0xc000026400)
        runtime/mgcscavenge.go:653 +0x3c fp=0xc000099fc8 sp=0xc000099fa8 pc=0x5693bca635dc
runtime.gcenable.gowrap2()
        runtime/mgc.go:205 +0x25 fp=0xc000099fe0 sp=0xc000099fc8 pc=0x5693bca59e85
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000099fe8 sp=0xc000099fe0 pc=0x5693bcab63a1
created by runtime.gcenable in goroutine 1
        runtime/mgc.go:205 +0xa5

goroutine 18 gp=0xc000104700 m=nil [finalizer wait]:
runtime.gopark(0xc000098648?, 0x5693bca503e5?, 0xb0?, 0x1?, 0xc0000061c0?)
        runtime/proc.go:424 +0xce fp=0xc000098620 sp=0xc000098600 pc=0x5693bcaae76e
runtime.runfinq()
        runtime/mfinal.go:193 +0x107 fp=0xc0000987e0 sp=0xc000098620 pc=0x5693bca58f67
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000987e8 sp=0xc0000987e0 pc=0x5693bcab63a1
created by runtime.createfing in goroutine 1
        runtime/mfinal.go:163 +0x3d

goroutine 19 gp=0xc0001048c0 m=nil [chan receive]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000094718 sp=0xc0000946f8 pc=0x5693bcaae76e
runtime.chanrecv(0xc0001120e0, 0x0, 0x1)
        runtime/chan.go:639 +0x41c fp=0xc000094790 sp=0xc000094718 pc=0x5693bca49bbc
runtime.chanrecv1(0x0?, 0x0?)
        runtime/chan.go:489 +0x12 fp=0xc0000947b8 sp=0xc000094790 pc=0x5693bca49792
runtime.unique_runtime_registerUniqueMapCleanup.func1(...)
        runtime/mgc.go:1781
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
        runtime/mgc.go:1784 +0x2f fp=0xc0000947e0 sp=0xc0000947b8 pc=0x5693bca5cd4f
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000947e8 sp=0xc0000947e0 pc=0x5693bcab63a1
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
        runtime/mgc.go:1779 +0x96

goroutine 21 gp=0xc000104c40 m=nil [semacquire]:
runtime.gopark(0x0?, 0x0?, 0x20?, 0x81?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000095618 sp=0xc0000955f8 pc=0x5693bcaae76e
runtime.goparkunlock(...)
        runtime/proc.go:430
runtime.semacquire1(0xc0001461b8, 0x0, 0x1, 0x0, 0x12)
        runtime/sema.go:178 +0x22c fp=0xc000095680 sp=0xc000095618 pc=0x5693bca8da8c
sync.runtime_Semacquire(0x0?)
        runtime/sema.go:71 +0x25 fp=0xc0000956b8 sp=0xc000095680 pc=0x5693bcaaf9a5
sync.(*WaitGroup).Wait(0x0?)
        sync/waitgroup.go:118 +0x48 fp=0xc0000956e0 sp=0xc0000956b8 pc=0x5693bcacbc48
github.com/ollama/ollama/llama/runner.(*Server).run(0xc0001461b0, {0x5693bd0cc4a0, 0xc000196050})
        github.com/ollama/ollama/llama/runner/runner.go:315 +0x47 fp=0xc0000957b8 sp=0xc0000956e0 pc=0x5693bccee2c7
github.com/ollama/ollama/llama/runner.Execute.gowrap2()
        github.com/ollama/ollama/llama/runner/runner.go:984 +0x28 fp=0xc0000957e0 sp=0xc0000957b8 pc=0x5693bccf3468
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000957e8 sp=0xc0000957e0 pc=0x5693bcab63a1
created by github.com/ollama/ollama/llama/runner.Execute in goroutine 1
        github.com/ollama/ollama/llama/runner/runner.go:984 +0xde5

goroutine 22 gp=0xc000105340 m=nil [IO wait]:
runtime.gopark(0xc0002a6000?, 0xc000185958?, 0x3e?, 0x1?, 0xb?)
        runtime/proc.go:424 +0xce fp=0xc000185918 sp=0xc0001858f8 pc=0x5693bcaae76e
runtime.netpollblock(0x5693bcae9f98?, 0xbca46fc6?, 0x93?)
        runtime/netpoll.go:575 +0xf7 fp=0xc000185950 sp=0xc000185918 pc=0x5693bca734d7
internal/poll.runtime_pollWait(0x712f89fca618, 0x72)
        runtime/netpoll.go:351 +0x85 fp=0xc000185970 sp=0xc000185950 pc=0x5693bcaada65
internal/poll.(*pollDesc).wait(0xc000190180?, 0xc0001b8000?, 0x0)
        internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000185998 sp=0xc000185970 pc=0x5693bcb038a7
internal/poll.(*pollDesc).waitRead(...)
        internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc000190180, {0xc0001b8000, 0x1000, 0x1000})
        internal/poll/fd_unix.go:165 +0x27a fp=0xc000185a30 sp=0xc000185998 pc=0x5693bcb043fa
net.(*netFD).Read(0xc000190180, {0xc0001b8000?, 0xc000185aa0?, 0x5693bcb03d65?})
        net/fd_posix.go:55 +0x25 fp=0xc000185a78 sp=0xc000185a30 pc=0x5693bcb7c6c5
net.(*conn).Read(0xc000124098, {0xc0001b8000?, 0x0?, 0xc00012d058?})
        net/net.go:189 +0x45 fp=0xc000185ac0 sp=0xc000185a78 pc=0x5693bcb860c5
net.(*TCPConn).Read(0xc00012d050?, {0xc0001b8000?, 0xc000190180?, 0xc000185af8?})
        <autogenerated>:1 +0x25 fp=0xc000185af0 sp=0xc000185ac0 pc=0x5693bcb93165
net/http.(*connReader).Read(0xc00012d050, {0xc0001b8000, 0x1000, 0x1000})
        net/http/server.go:798 +0x14b fp=0xc000185b40 sp=0xc000185af0 pc=0x5693bccb434b
bufio.(*Reader).fill(0xc000130480)
        bufio/bufio.go:110 +0x103 fp=0xc000185b78 sp=0xc000185b40 pc=0x5693bcc72f63
bufio.(*Reader).Peek(0xc000130480, 0x4)
        bufio/bufio.go:148 +0x53 fp=0xc000185b98 sp=0xc000185b78 pc=0x5693bcc73093
net/http.(*conn).serve(0xc000146240, {0x5693bd0cc468, 0xc00012cf60})
        net/http/server.go:2127 +0x738 fp=0xc000185fb8 sp=0xc000185b98 pc=0x5693bccb9698
net/http.(*Server).Serve.gowrap3()
        net/http/server.go:3360 +0x28 fp=0xc000185fe0 sp=0xc000185fb8 pc=0x5693bccbde48
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000185fe8 sp=0xc000185fe0 pc=0x5693bcab63a1
created by net/http.(*Server).Serve in goroutine 1
        net/http/server.go:3360 +0x485

rax    0x712ed72c8ad0
rbx    0x712ed72ced40
rcx    0x713070da6663
rdx    0x712ed4005130
rdi    0x712ed72ced40
rsi    0x3
rbp    0x712edbff61d0
rsp    0x712edbff61a0
r8     0x0
r9     0x0
r10    0x4
r11    0xa66e143e45c2eb86
r12    0x0
r13    0x18
r14    0xffffffffffffffc0
r15    0x712dc3ef8e80
rip    0x713070f0fe2b
rflags 0x10206
cs     0x33
fs     0x0
gs     0x0
time=2024-12-28T21:17:03.119+08:00 level=INFO source=server.go:589 msg="waiting for server to become available" status="llm server error"
time=2024-12-28T21:17:03.370+08:00 level=ERROR source=sched.go:455 msg="error loading llama server" error="llama runner process has terminated: exit status 2"
[GIN] 2024/12/28 - 21:17:03 | 500 |  2.421283447s |       127.0.0.1 | POST     "/api/generate"

----------------

Following is my package info:

 ╰──➤ $ pacman -Q |grep 'ollama\|rocm'
ollama 0.5.4-1
ollama-rocm 0.5.4-1
python-pytorch-rocm 2.5.1-7
rocm-clang-ocl 6.1.2-1
rocm-cmake 6.2.4-1
rocm-core 6.2.4-2
rocm-device-libs 6.2.4-1
rocm-hip-libraries 6.2.2-1
rocm-hip-runtime 6.2.2-1
rocm-hip-sdk 6.2.2-1
rocm-language-runtime 6.2.2-1
rocm-llvm 6.2.4-1
rocm-opencl-runtime 6.2.4-1
rocm-opencl-sdk 6.2.2-1
rocm-smi-lib 6.2.4-1
rocminfo 6.2.4-1

Thanks

OS

Linux

GPU

AMD

CPU

AMD

Ollama version

0.5.4-1, tested with both the Arch Linux package and the build downloaded from the GitHub releases page.
It worked before and broke after I updated my Arch Linux system, just before creating this issue.

GiteaMirror added the bug label 2026-04-12 16:27:26 -05:00

@copycraft commented on GitHub (Dec 28, 2024):

I see you use Arch btw. Chad.


@zw963 commented on GitHub (Dec 28, 2024):

I tried removing the following Arch packages:

ollama 0.5.4-1
ollama-rocm 0.5.4-1

then installed the same version of ollama downloaded from the GitHub releases page, ran it, and got exactly the same issue.

And following is my upgrade log (before the upgrade, ollama worked well):

[2024-12-28T21:05:50+0800] [ALPM] upgraded python-pytorch-rocm (2.5.1-4 -> 2.5.1-7)
[2024-12-28T21:05:44+0800] [ALPM] upgraded ollama-rocm (0.5.2-1 -> 0.5.4-1)
[2024-12-28T21:05:43+0800] [ALPM] upgraded ollama (0.5.2-1 -> 0.5.4-1)

So I tried downloading 0.5.2 from the GitHub releases page and running it instead, but it still did not work.
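For reference, a minimal sketch of testing a standalone release build in isolation, so the system packages stay untouched (the asset name `ollama-linux-amd64.tgz` and the `bin/ollama` archive layout are assumptions; check the actual release page):

```sh
# Hypothetical reproduction sketch: fetch a specific release tarball and
# run it from a scratch directory, independent of the pacman packages.
VER=v0.5.2
curl -LO "https://github.com/ollama/ollama/releases/download/${VER}/ollama-linux-amd64.tgz"
mkdir -p /tmp/ollama-test
tar -xzf ollama-linux-amd64.tgz -C /tmp/ollama-test
# Same GFX override as above; the crash reproduces either way here.
HSA_OVERRIDE_GFX_VERSION=11.0.0 /tmp/ollama-test/bin/ollama serve
```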

Thanks.


@zw963 commented on GitHub (Jan 5, 2025):

Okay, I can confirm this issue is caused by the 6.12 Linux kernel.

https://gitlab.freedesktop.org/drm/amd/-/issues/3821

For Arch Linux users, using the LTS kernel or an older 6.11 kernel can work around it; a sketch follows below.
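A minimal sketch of that workaround on Arch (the package names are real, but the bootloader step assumes GRUB; adjust for systemd-boot or others):

```sh
# Install the LTS kernel alongside the current one (at the time of this
# comment, linux-lts was still on a pre-6.12 series).
sudo pacman -S linux-lts linux-lts-headers

# Regenerate boot entries so the LTS kernel can be selected at boot
# (GRUB assumed here).
sudo grub-mkconfig -o /boot/grub/grub.cfg

# After rebooting into the LTS kernel, verify:
uname -r
```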

(screenshot: https://github.com/user-attachments/assets/f4d658ae-ca48-4584-abf2-f076d87ade0a)


@dlasky commented on GitHub (Jan 17, 2025):

Have the same issue on Arch with 6.12.


@waltercool commented on GitHub (Jan 30, 2025):

Same issue on Gentoo with 6.13, but using my 7700S dGPU instead.

It works fine when using my 780M iGPU.


@Cyanic commented on GitHub (Jan 31, 2025):

Fedora 41, kernel 6.12.11-200.fc41, 780M iGPU, same error.

Has anyone resolved this?


@zw963 commented on GitHub (Feb 2, 2025):

> Fedora 41 6.12.11-200.fc41 780M iGPU same error.
>
> Anyone resolve this?

Check my post for the answer:

https://github.com/ollama/ollama/issues/8262#issuecomment-2571703020


@kiolygenius commented on GitHub (Feb 6, 2025):

> > Fedora 41 6.12.11-200.fc41 780M iGPU same error.
> > Anyone resolve this?
>
> Check my post for answer.
>
> #8262 (comment)

Now Arch Linux's linux-lts package is 6.12.


@zw963 commented on GitHub (Feb 7, 2025):

> > > Fedora 41 6.12.11-200.fc41 780M iGPU same error.
> > > Anyone resolve this?
> >
> > Check my post for answer.
> >
> > #8262 (comment)
>
> Now archlinux's linux-lts package is 6.12.

I still use the `linux-amd` package on Arch Linux; it still uses kernel 6.11.


@Cyanic commented on GitHub (Feb 10, 2025):

Can confirm that downgrading from 6.12 to kernel 6.11 worked.


@zw963 commented on GitHub (Feb 15, 2025):

After following this reply (https://github.com/ROCm/ROCm/discussions/2631#discussioncomment-12207487), I can confirm the 780M works with kernel 6.12 (6.12.13 here), using 100% GPU.

 ╰──➤ $ ollama ps
NAME               ID              SIZE      PROCESSOR    UNTIL
deepseek-r1:14b    ea35dfe18182    9.9 GB    100% GPU     Forever

 ╭─ 16:35 zw963 ~ ➦ ruby-3.3.5 ➦ crystal system ➦ elixir 1.17.3-otp-27 ➦ elixir-ls 0.23.0
 ╰──➤ $ uname -a
Linux mingfan 6.12.13-x64v2-xanmod1-1 #1 SMP PREEMPT_DYNAMIC Mon, 10 Feb 2025 04:36:57 +0000 x86_64 GNU/Linux

The `KEY POINT` is: you must set `HSA_OVERRIDE_GFX_VERSION=11.0.2` instead of `11.0.0` or `11.0.1`.
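A minimal sketch of applying that override, both one-off and persistently for the systemd-managed service (the `ollama.service` unit name comes from the official Linux installer; adjust if you start ollama differently):

```sh
# One-off: set the override just for this run.
HSA_OVERRIDE_GFX_VERSION=11.0.2 ollama serve

# Persistent: add a systemd drop-in for the ollama service.
sudo mkdir -p /etc/systemd/system/ollama.service.d
cat <<'EOF' | sudo tee /etc/systemd/system/ollama.service.d/override.conf
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=11.0.2"
EOF
sudo systemctl daemon-reload
sudo systemctl restart ollama
```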


@zw963 commented on GitHub (Jun 11, 2025):

> The `KEY POINT` is: you must set `HSA_OVERRIDE_GFX_VERSION=11.0.2` instead of `11.0.0` or `11.0.1`.

This issue is fixed now; you can set 11.0.0 instead.


@Mubelotix commented on GitHub (Nov 13, 2025):

> After following this reply (https://github.com/ROCm/ROCm/discussions/2631#discussioncomment-12207487), I can confirm the 780M works with kernel 6.12 (6.12.13 here), using 100% GPU.
>
>  ╰──➤ $ ollama ps
> NAME               ID              SIZE      PROCESSOR    UNTIL
> deepseek-r1:14b    ea35dfe18182    9.9 GB    100% GPU     Forever
>
>  ╭─ 16:35 zw963 ~ ➦ ruby-3.3.5 ➦ crystal system ➦ elixir 1.17.3-otp-27 ➦ elixir-ls 0.23.0
>  ╰──➤ $ uname -a
> Linux mingfan 6.12.13-x64v2-xanmod1-1 #1 SMP PREEMPT_DYNAMIC Mon, 10 Feb 2025 04:36:57 +0000 x86_64 GNU/Linux
>
> The `KEY POINT` is: you must set `HSA_OVERRIDE_GFX_VERSION=11.0.2` instead of `11.0.0` or `11.0.1`.

Thank you, I owe you.

This wasn't fixed, btw; the workaround is still required.


@zw963 commented on GitHub (Jan 12, 2026):

Just to clarify: `HSA_OVERRIDE_GFX_VERSION` is no longer needed since ROCm 7; the 780M is now supported out of the box and works like a charm!
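A quick sanity check for that (a sketch; `rocminfo` ships with the rocminfo package, and the exact ollama log wording can vary between versions):

```sh
# The GPU's native gfx target should be visible to the ROCm runtime.
rocminfo | grep -i gfx          # expect gfx1103 for the 780M

# Start ollama without any override and look for a ROCm "inference
# compute" line in the startup log, as in the log at the top of this issue.
ollama serve
```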
