[GH-ISSUE #6552] Ollama run codestral gives Error: llama runner process has terminated #4121

Open
opened 2026-04-12 15:01:15 -05:00 by GiteaMirror · 5 comments

Originally created by @anonymux1 on GitHub (Aug 29, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6552

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

Trying to run codestral:22b on a 6800 XT, but I get this error every time:
Error: llama runner process has terminated: signal: segmentation fault (core dumped)

I have 16G RAM and 16G VRAM. What is the issue here? I was able to run other models like starcoder2:3b successfully.

OS

Linux

GPU

AMD

CPU

Intel

Ollama version

0.3.8

GiteaMirror added the bug, amd labels 2026-04-12 15:01:15 -05:00

@rick-github commented on GitHub (Aug 29, 2024):

[Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) will help in debugging.
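For reference, on a systemd-managed Linux install the relevant logs can be captured with something like the following (a minimal sketch based on the troubleshooting doc linked above):

```
# Show the ollama service logs, including output from the crashed runner subprocess
journalctl -u ollama --no-pager | tail -200
```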


@anonymux1 commented on GitHub (Aug 29, 2024):

Aug 29 20:11:29 iMacPro ollama[32080]: [GIN] 2024/08/29 - 20:11:29 | 200 | 13.928751ms | 127.0.0.1 | POST "/api/show"
Aug 29 20:11:29 iMacPro ollama[32080]: [GIN] 2024/08/29 - 20:11:29 | 200 | 13.445927ms | 127.0.0.1 | POST "/api/show"
Aug 29 20:11:29 iMacPro ollama[32080]: [GIN] 2024/08/29 - 20:11:29 | 200 | 16.279468ms | 127.0.0.1 | POST "/api/show"
Aug 29 20:11:29 iMacPro ollama[32080]: [GIN] 2024/08/29 - 20:11:29 | 200 | 13.850537ms | 127.0.0.1 | POST "/api/show"
Aug 29 20:11:29 iMacPro ollama[32080]: [GIN] 2024/08/29 - 20:11:29 | 200 | 17.324191ms | 127.0.0.1 | POST "/api/show"
Aug 29 20:11:29 iMacPro ollama[32080]: [GIN] 2024/08/29 - 20:11:29 | 200 | 14.241153ms | 127.0.0.1 | POST "/api/show"
Aug 29 20:11:30 iMacPro ollama[32080]: time=2024-08-29T20:11:30.222+05:30 level=INFO source=sched.go:715 msg="new model will fit in available VRAM in single GPU, loading" model=/usr/share/ollama/.ollama/models/blobs/sha256-22a849aafe3ded20e9b6551b02684d8fa911537c35895dd2a1bf9eb70da8f69e gpu=0 parallel=1 available=15913861120 required="12.9 GiB"
Aug 29 20:11:30 iMacPro ollama[32080]: time=2024-08-29T20:11:30.224+05:30 level=INFO source=memory.go:309 msg="offload to rocm" layers.requested=-1 layers.model=57 layers.offload=57 layers.split="" memory.available="[14.8 GiB]" memory.required.full="12.9 GiB" memory.required.partial="12.9 GiB" memory.required.kv="448.0 MiB" memory.required.allocations="[12.9 GiB]" memory.weights.total="11.9 GiB" memory.weights.repeating="11.7 GiB" memory.weights.nonrepeating="157.5 MiB" memory.graph.full="244.0 MiB" memory.graph.partial="256.3 MiB"
Aug 29 20:11:30 iMacPro ollama[32080]: time=2024-08-29T20:11:30.226+05:30 level=INFO source=server.go:391 msg="starting llama server" cmd="/tmp/ollama1087687516/runners/rocm_v60102/ollama_llama_server --model /usr/share/ollama/.ollama/models/blobs/sha256-22a849aafe3ded20e9b6551b02684d8fa911537c35895dd2a1bf9eb70da8f69e --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 57 --no-mmap --parallel 1 --port 44581"
Aug 29 20:11:30 iMacPro ollama[32080]: time=2024-08-29T20:11:30.226+05:30 level=INFO source=sched.go:450 msg="loaded runners" count=1
Aug 29 20:11:30 iMacPro ollama[32080]: time=2024-08-29T20:11:30.226+05:30 level=INFO source=server.go:591 msg="waiting for llama runner to start responding"
Aug 29 20:11:30 iMacPro ollama[32080]: time=2024-08-29T20:11:30.226+05:30 level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server error"
Aug 29 20:11:30 iMacPro ollama[44312]: INFO [main] build info | build=1 commit="1e6f655" tid="124110683927360" timestamp=1724942490
Aug 29 20:11:30 iMacPro ollama[44312]: INFO [main] system info | n_threads=6 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="124110683927360" timestamp=1724942490 total_threads=6
Aug 29 20:11:30 iMacPro ollama[44312]: INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="5" port="44581" tid="124110683927360" timestamp=1724942490
Aug 29 20:11:30 iMacPro ollama[32080]: llama_model_loader: loaded meta data with 25 key-value pairs and 507 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-22a849aafe3ded20e9b6551b02684d8fa911537c35895dd2a1bf9eb70da8f69e (version GGUF V3 (latest))
Aug 29 20:11:30 iMacPro ollama[32080]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Aug 29 20:11:30 iMacPro ollama[32080]: llama_model_loader: - kv 0: general.architecture str = llama
Aug 29 20:11:30 iMacPro ollama[32080]: llama_model_loader: - kv 1: general.name str = Codestral-22B-v0.1
Aug 29 20:11:30 iMacPro ollama[32080]: llama_model_loader: - kv 2: llama.block_count u32 = 56
Aug 29 20:11:30 iMacPro ollama[32080]: llama_model_loader: - kv 3: llama.context_length u32 = 32768
Aug 29 20:11:30 iMacPro ollama[32080]: llama_model_loader: - kv 4: llama.embedding_length u32 = 6144
Aug 29 20:11:30 iMacPro ollama[32080]: llama_model_loader: - kv 5: llama.feed_forward_length u32 = 16384
Aug 29 20:11:30 iMacPro ollama[32080]: llama_model_loader: - kv 6: llama.attention.head_count u32 = 48
Aug 29 20:11:30 iMacPro ollama[32080]: llama_model_loader: - kv 7: llama.attention.head_count_kv u32 = 8
Aug 29 20:11:30 iMacPro ollama[32080]: llama_model_loader: - kv 8: llama.rope.freq_base f32 = 1000000.000000
Aug 29 20:11:30 iMacPro ollama[32080]: llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
Aug 29 20:11:30 iMacPro ollama[32080]: llama_model_loader: - kv 10: general.file_type u32 = 2
Aug 29 20:11:30 iMacPro ollama[32080]: llama_model_loader: - kv 11: llama.vocab_size u32 = 32768
Aug 29 20:11:30 iMacPro ollama[32080]: llama_model_loader: - kv 12: llama.rope.dimension_count u32 = 128
Aug 29 20:11:30 iMacPro ollama[32080]: llama_model_loader: - kv 13: tokenizer.ggml.add_space_prefix bool = true
Aug 29 20:11:30 iMacPro ollama[32080]: llama_model_loader: - kv 14: tokenizer.ggml.model str = llama
Aug 29 20:11:30 iMacPro ollama[32080]: llama_model_loader: - kv 15: tokenizer.ggml.pre str = default
Aug 29 20:11:30 iMacPro ollama[32080]: llama_model_loader: - kv 16: tokenizer.ggml.tokens arr[str,32768] = ["<unk>", "<s>", "</s>", "[INST]", "[...
Aug 29 20:11:30 iMacPro ollama[32080]: llama_model_loader: - kv 17: tokenizer.ggml.scores arr[f32,32768] = [0.000000, 0.000000, 0.000000, 0.0000...
Aug 29 20:11:30 iMacPro ollama[32080]: llama_model_loader: - kv 18: tokenizer.ggml.token_type arr[i32,32768] = [2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
Aug 29 20:11:30 iMacPro ollama[32080]: llama_model_loader: - kv 19: tokenizer.ggml.bos_token_id u32 = 1
Aug 29 20:11:30 iMacPro ollama[32080]: llama_model_loader: - kv 20: tokenizer.ggml.eos_token_id u32 = 2
Aug 29 20:11:30 iMacPro ollama[32080]: llama_model_loader: - kv 21: tokenizer.ggml.unknown_token_id u32 = 0
Aug 29 20:11:30 iMacPro ollama[32080]: llama_model_loader: - kv 22: tokenizer.ggml.add_bos_token bool = true
Aug 29 20:11:30 iMacPro ollama[32080]: llama_model_loader: - kv 23: tokenizer.ggml.add_eos_token bool = false
Aug 29 20:11:30 iMacPro ollama[32080]: llama_model_loader: - kv 24: general.quantization_version u32 = 2
Aug 29 20:11:30 iMacPro ollama[32080]: llama_model_loader: - type f32: 113 tensors
Aug 29 20:11:30 iMacPro ollama[32080]: llama_model_loader: - type q4_0: 393 tensors
Aug 29 20:11:30 iMacPro ollama[32080]: llama_model_loader: - type q6_K: 1 tensors
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_vocab: special tokens cache size = 771
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_vocab: token to piece cache size = 0.1731 MB
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: format = GGUF V3 (latest)
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: arch = llama
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: vocab type = SPM
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: n_vocab = 32768
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: n_merges = 0
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: vocab_only = 0
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: n_ctx_train = 32768
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: n_embd = 6144
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: n_layer = 56
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: n_head = 48
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: n_head_kv = 8
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: n_rot = 128
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: n_swa = 0
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: n_embd_head_k = 128
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: n_embd_head_v = 128
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: n_gqa = 6
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: n_embd_k_gqa = 1024
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: n_embd_v_gqa = 1024
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: f_norm_eps = 0.0e+00
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: f_norm_rms_eps = 1.0e-05
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: f_clamp_kqv = 0.0e+00
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: f_logit_scale = 0.0e+00
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: n_ff = 16384
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: n_expert = 0
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: n_expert_used = 0
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: causal attn = 1
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: pooling type = 0
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: rope type = 0
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: rope scaling = linear
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: freq_base_train = 1000000.0
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: freq_scale_train = 1
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: n_ctx_orig_yarn = 32768
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: rope_finetuned = unknown
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: ssm_d_conv = 0
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: ssm_d_inner = 0
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: ssm_d_state = 0
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: ssm_dt_rank = 0
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: model type = ?B
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: model ftype = Q4_0
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: model params = 22.25 B
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: model size = 11.71 GiB (4.52 BPW)
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: general.name = Codestral-22B-v0.1
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: BOS token = 1 '<s>'
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: EOS token = 2 '</s>'
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: UNK token = 0 '<unk>'
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: LF token = 781 '<0x0A>'
Aug 29 20:11:30 iMacPro ollama[32080]: llm_load_print_meta: max token length = 48
Aug 29 20:11:30 iMacPro ollama[32080]: time=2024-08-29T20:11:30.477+05:30 level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server loading model"
Aug 29 20:11:31 iMacPro ollama[32080]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
Aug 29 20:11:31 iMacPro ollama[32080]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Aug 29 20:11:31 iMacPro ollama[32080]: ggml_cuda_init: found 1 ROCm devices:
Aug 29 20:11:31 iMacPro ollama[32080]: Device 0: AMD Radeon RX 6800 XT, compute capability 10.3, VMM: no
Aug 29 20:11:31 iMacPro ollama[32080]: llm_load_tensors: ggml ctx size = 0.47 MiB
Aug 29 20:11:31 iMacPro ollama[32080]: llm_load_tensors: offloading 56 repeating layers to GPU
Aug 29 20:11:31 iMacPro ollama[32080]: llm_load_tensors: offloading non-repeating layers to GPU
Aug 29 20:11:31 iMacPro ollama[32080]: llm_load_tensors: offloaded 57/57 layers to GPU
Aug 29 20:11:31 iMacPro ollama[32080]: llm_load_tensors: ROCm0 buffer size = 11878.15 MiB
Aug 29 20:11:31 iMacPro ollama[32080]: llm_load_tensors: ROCm_Host buffer size = 108.00 MiB
Aug 29 20:11:31 iMacPro ollama[32080]: time=2024-08-29T20:11:31.571+05:30 level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server error"
Aug 29 20:11:31 iMacPro ollama[32080]: time=2024-08-29T20:11:31.822+05:30 level=ERROR source=sched.go:456 msg="error loading llama server" error="llama runner process has terminated: signal: segmentation fault (core dumped)"


@anonymux1 commented on GitHub (Aug 29, 2024):

Successfully ran a small LLM like starcoder2:3b, but got the same error when I tried a slightly smaller model: ollama run codestral:22b-v0.1-q3_K_L
pulling manifest
pulling e12cecf18621... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 11 GB
pulling 36ee4ce5634b... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 83 B
pulling 5b68668f65de... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 11 KB
pulling 5dea4f4d0fff... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 63 B
pulling 9388242c6c41... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 488 B
verifying sha256 digest
writing manifest
success
Error: llama runner process has terminated: signal: segmentation fault (core dumped)

journalctl -u ollama output:

Aug 29 20:47:28 iMacPro ollama[52949]: [GIN] 2024/08/29 - 20:47:28 | 200 | 8m17s | 127.0.0.1 | POST "/api/pull"
Aug 29 20:47:28 iMacPro ollama[52949]: [GIN] 2024/08/29 - 20:47:28 | 200 | 9.630687ms | 127.0.0.1 | POST "/api/show"
Aug 29 20:47:28 iMacPro ollama[52949]: time=2024-08-29T20:47:28.068+05:30 level=INFO source=sched.go:715 msg="new model will fit in available VRAM in single GPU, loading" model=/home/Ollama/blobs/sha256-e12cecf18621cf0e2893065a1dfa8d2960f061b15840d1ef81544bc375ef0eec gpu=0 parallel=4 available=16134053888 required="14.1 GiB"
Aug 29 20:47:28 iMacPro ollama[52949]: time=2024-08-29T20:47:28.070+05:30 level=INFO source=memory.go:309 msg="offload to rocm" layers.requested=-1 layers.model=57 layers.offload=57 layers.split="" memory.available="[15.0 GiB]" memory.required.full="14.1 GiB" memory.required.partial="14.1 GiB" memory.required.kv="1.8 GiB" memory.required.allocations="[14.1 GiB]" memory.weights.total="12.4 GiB" memory.weights.repeating="12.3 GiB" memory.weights.nonrepeating="157.5 MiB" memory.graph.full="832.0 MiB" memory.graph.partial="860.3 MiB"
Aug 29 20:47:28 iMacPro ollama[52949]: time=2024-08-29T20:47:28.071+05:30 level=INFO source=server.go:391 msg="starting llama server" cmd="/tmp/ollama2276286463/runners/rocm_v60102/ollama_llama_server --model /home/Ollama/blobs/sha256-e12cecf18621cf0e2893065a1dfa8d2960f061b15840d1ef81544bc375ef0eec --ctx-size 8192 --batch-size 512 --embedding --log-disable --n-gpu-layers 57 --no-mmap --parallel 4 --port 42705"
Aug 29 20:47:28 iMacPro ollama[52949]: time=2024-08-29T20:47:28.072+05:30 level=INFO source=sched.go:450 msg="loaded runners" count=1
Aug 29 20:47:28 iMacPro ollama[52949]: time=2024-08-29T20:47:28.072+05:30 level=INFO source=server.go:591 msg="waiting for llama runner to start responding"
Aug 29 20:47:28 iMacPro ollama[52949]: time=2024-08-29T20:47:28.072+05:30 level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server error"
Aug 29 20:47:28 iMacPro ollama[55053]: INFO [main] build info | build=1 commit="1e6f655" tid="140525588689728" timestamp=1724944648
Aug 29 20:47:28 iMacPro ollama[55053]: INFO [main] system info | n_threads=6 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="140525588689728" timestamp=1724944648 total_threads=6
Aug 29 20:47:28 iMacPro ollama[55053]: INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="6" port="42705" tid="140525588689728" timestamp=1724944648
Aug 29 20:47:28 iMacPro ollama[52949]: llama_model_loader: loaded meta data with 25 key-value pairs and 507 tensors from /home/Ollama/blobs/sha256-e12cecf18621cf0e2893065a1dfa8d2960f061b15840d1ef81544bc375ef0eec (version GGUF V3 (latest))
Aug 29 20:47:28 iMacPro ollama[52949]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Aug 29 20:47:28 iMacPro ollama[52949]: llama_model_loader: - kv 0: general.architecture str = llama
Aug 29 20:47:28 iMacPro ollama[52949]: llama_model_loader: - kv 1: general.name str = Codestral-22B-v0.1
Aug 29 20:47:28 iMacPro ollama[52949]: llama_model_loader: - kv 2: llama.block_count u32 = 56
Aug 29 20:47:28 iMacPro ollama[52949]: llama_model_loader: - kv 3: llama.context_length u32 = 32768
Aug 29 20:47:28 iMacPro ollama[52949]: llama_model_loader: - kv 4: llama.embedding_length u32 = 6144
Aug 29 20:47:28 iMacPro ollama[52949]: llama_model_loader: - kv 5: llama.feed_forward_length u32 = 16384
Aug 29 20:47:28 iMacPro ollama[52949]: llama_model_loader: - kv 6: llama.attention.head_count u32 = 48
Aug 29 20:47:28 iMacPro ollama[52949]: llama_model_loader: - kv 7: llama.attention.head_count_kv u32 = 8
Aug 29 20:47:28 iMacPro ollama[52949]: llama_model_loader: - kv 8: llama.rope.freq_base f32 = 1000000.000000
Aug 29 20:47:28 iMacPro ollama[52949]: llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
Aug 29 20:47:28 iMacPro ollama[52949]: llama_model_loader: - kv 10: general.file_type u32 = 13
Aug 29 20:47:28 iMacPro ollama[52949]: llama_model_loader: - kv 11: llama.vocab_size u32 = 32768
Aug 29 20:47:28 iMacPro ollama[52949]: llama_model_loader: - kv 12: llama.rope.dimension_count u32 = 128
Aug 29 20:47:28 iMacPro ollama[52949]: llama_model_loader: - kv 13: tokenizer.ggml.add_space_prefix bool = true
Aug 29 20:47:28 iMacPro ollama[52949]: llama_model_loader: - kv 14: tokenizer.ggml.model str = llama
Aug 29 20:47:28 iMacPro ollama[52949]: llama_model_loader: - kv 15: tokenizer.ggml.pre str = default
Aug 29 20:47:28 iMacPro ollama[52949]: llama_model_loader: - kv 16: tokenizer.ggml.tokens arr[str,32768] = ["<unk>", "<s>", "</s>", "[INST]", "[...
Aug 29 20:47:28 iMacPro ollama[52949]: llama_model_loader: - kv 17: tokenizer.ggml.scores arr[f32,32768] = [0.000000, 0.000000, 0.000000, 0.0000...
Aug 29 20:47:28 iMacPro ollama[52949]: llama_model_loader: - kv 18: tokenizer.ggml.token_type arr[i32,32768] = [2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
Aug 29 20:47:28 iMacPro ollama[52949]: llama_model_loader: - kv 19: tokenizer.ggml.bos_token_id u32 = 1
Aug 29 20:47:28 iMacPro ollama[52949]: llama_model_loader: - kv 20: tokenizer.ggml.eos_token_id u32 = 2
Aug 29 20:47:28 iMacPro ollama[52949]: llama_model_loader: - kv 21: tokenizer.ggml.unknown_token_id u32 = 0
Aug 29 20:47:28 iMacPro ollama[52949]: llama_model_loader: - kv 22: tokenizer.ggml.add_bos_token bool = true
Aug 29 20:47:28 iMacPro ollama[52949]: llama_model_loader: - kv 23: tokenizer.ggml.add_eos_token bool = false
Aug 29 20:47:28 iMacPro ollama[52949]: llama_model_loader: - kv 24: general.quantization_version u32 = 2
Aug 29 20:47:28 iMacPro ollama[52949]: llama_model_loader: - type f32: 113 tensors
Aug 29 20:47:28 iMacPro ollama[52949]: llama_model_loader: - type q3_K: 225 tensors
Aug 29 20:47:28 iMacPro ollama[52949]: llama_model_loader: - type q5_K: 168 tensors
Aug 29 20:47:28 iMacPro ollama[52949]: llama_model_loader: - type q6_K: 1 tensors
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_vocab: special tokens cache size = 771
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_vocab: token to piece cache size = 0.1731 MB
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: format = GGUF V3 (latest)
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: arch = llama
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: vocab type = SPM
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: n_vocab = 32768
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: n_merges = 0
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: vocab_only = 0
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: n_ctx_train = 32768
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: n_embd = 6144
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: n_layer = 56
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: n_head = 48
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: n_head_kv = 8
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: n_rot = 128
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: n_swa = 0
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: n_embd_head_k = 128
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: n_embd_head_v = 128
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: n_gqa = 6
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: n_embd_k_gqa = 1024
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: n_embd_v_gqa = 1024
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: f_norm_eps = 0.0e+00
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: f_norm_rms_eps = 1.0e-05
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: f_clamp_kqv = 0.0e+00
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: f_logit_scale = 0.0e+00
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: n_ff = 16384
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: n_expert = 0
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: n_expert_used = 0
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: causal attn = 1
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: pooling type = 0
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: rope type = 0
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: rope scaling = linear
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: freq_base_train = 1000000.0
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: freq_scale_train = 1
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: n_ctx_orig_yarn = 32768
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: rope_finetuned = unknown
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: ssm_d_conv = 0
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: ssm_d_inner = 0
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: ssm_d_state = 0
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: ssm_dt_rank = 0
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: model type = ?B
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: model ftype = Q3_K - Large
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: model params = 22.25 B
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: model size = 10.92 GiB (4.22 BPW)
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: general.name = Codestral-22B-v0.1
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: BOS token = 1 '<s>'
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: EOS token = 2 '</s>'
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: UNK token = 0 '<unk>'
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: LF token = 781 '<0x0A>'
Aug 29 20:47:28 iMacPro ollama[52949]: llm_load_print_meta: max token length = 48
Aug 29 20:47:28 iMacPro ollama[52949]: time=2024-08-29T20:47:28.323+05:30 level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server loading model"
Aug 29 20:47:29 iMacPro ollama[52949]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
Aug 29 20:47:29 iMacPro ollama[52949]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Aug 29 20:47:29 iMacPro ollama[52949]: ggml_cuda_init: found 1 ROCm devices:
Aug 29 20:47:29 iMacPro ollama[52949]: Device 0: AMD Radeon RX 6800 XT, compute capability 10.3, VMM: no
Aug 29 20:47:29 iMacPro ollama[52949]: llm_load_tensors: ggml ctx size = 0.47 MiB
Aug 29 20:47:29 iMacPro ollama[52949]: llm_load_tensors: offloading 56 repeating layers to GPU
Aug 29 20:47:29 iMacPro ollama[52949]: llm_load_tensors: offloading non-repeating layers to GPU
Aug 29 20:47:29 iMacPro ollama[52949]: llm_load_tensors: offloaded 57/57 layers to GPU
Aug 29 20:47:29 iMacPro ollama[52949]: llm_load_tensors: ROCm0 buffer size = 11103.77 MiB
Aug 29 20:47:29 iMacPro ollama[52949]: llm_load_tensors: ROCm_Host buffer size = 82.50 MiB
Aug 29 20:47:30 iMacPro ollama[52949]: time=2024-08-29T20:47:30.186+05:30 level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server error"
Aug 29 20:47:30 iMacPro ollama[52949]: time=2024-08-29T20:47:30.436+05:30 level=ERROR source=sched.go:456 msg="error loading llama server" error="llama runner process has terminated: signal: segmentation fault (core dumped)"


@dhiltgen commented on GitHub (Sep 3, 2024):

I have a 6800 test system and was able to load the model. From the logs, it looks like it should fit on your GPU:

Aug 29 20:11:30 iMacPro ollama[32080]: time=2024-08-29T20:11:30.224+05:30 level=INFO source=memory.go:309 msg="offload to rocm" layers.requested=-1 layers.model=57 layers.offload=57 layers.split="" memory.available="[14.8 GiB]" memory.required.full="12.9 GiB" memory.required.partial="12.9 GiB" memory.required.kv="448.0 MiB" memory.required.allocations="[12.9 GiB]" memory.weights.total="11.9 GiB" memory.weights.repeating="11.7 GiB" memory.weights.nonrepeating="157.5 MiB" memory.graph.full="244.0 MiB" memory.graph.partial="256.3 MiB"

It looks like there are other GPU VRAM consumers on your system though (14.8G available, not 16G). It's possible the reporting may be inaccurate, or our prediction may be off. I'm curious if the crash disappears if you set num_gpu to a smaller value to allocate less on the GPU.
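For anyone following along, num_gpu can be lowered either interactively or per-request; a minimal sketch (48 is just an illustrative value below the full 57 layers, not a tuned recommendation):

```
# Interactively: offload fewer layers to the GPU before prompting
ollama run codestral:22b
>>> /set parameter num_gpu 48

# Or per-request through the HTTP API
curl http://localhost:11434/api/generate -d '{
  "model": "codestral:22b",
  "prompt": "hello",
  "options": { "num_gpu": 48 }
}'
```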


@anonymux1 commented on GitHub (Sep 4, 2024):

> I have a 6800 test system and was able to load the model. From the logs, it looks like it should fit on your GPU:
>
> Aug 29 20:11:30 iMacPro ollama[32080]: time=2024-08-29T20:11:30.224+05:30 level=INFO source=memory.go:309 msg="offload to rocm" layers.requested=-1 layers.model=57 layers.offload=57 layers.split="" memory.available="[14.8 GiB]" memory.required.full="12.9 GiB" memory.required.partial="12.9 GiB" memory.required.kv="448.0 MiB" memory.required.allocations="[12.9 GiB]" memory.weights.total="11.9 GiB" memory.weights.repeating="11.7 GiB" memory.weights.nonrepeating="157.5 MiB" memory.graph.full="244.0 MiB" memory.graph.partial="256.3 MiB"
>
> It looks like there are other GPU VRAM consumers on your system though (14.8G available, not 16G). It's possible the reporting may be inaccurate, or our prediction may be off. I'm curious if the crash disappears if you set num_gpu to a smaller value to allocate less on the GPU.

Yeah, I am on the GNOME desktop on my Ubuntu machine, which is probably consuming the GPU memory. I can't think of anything else that could be.
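One way to verify that assumption is ROCm's own CLI (a rough sketch; flag availability varies across rocm-smi versions):

```
# Total VRAM used/free on the card
rocm-smi --showmeminfo vram

# Processes currently holding GPU memory, if your rocm-smi build supports it
rocm-smi --showpids
```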
