ollama run qwen2:72b-instruct-q2_K but Error: llama runner process has terminated: signal: aborted (core dumped) #3139

Open
opened 2025-11-12 11:26:27 -06:00 by GiteaMirror · 5 comments

Originally created by @mikestut on GitHub (Jun 10, 2024).

What is the issue?

6月 11 01:17:54 Venue-vPro ollama[2760]: time=2024-06-11T01:17:54.332+08:00 level=INFO source=server.go:567 msg="waiting for server to become available" status="ll>
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_vocab: special tokens cache size = 421
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_vocab: token to piece cache size = 1.8703 MB
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: format = GGUF V3 (latest)
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: arch = qwen2
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: vocab type = BPE
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: n_vocab = 152064
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: n_merges = 151387
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: n_ctx_train = 32768
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: n_embd = 8192
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: n_head = 64
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: n_head_kv = 8
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: n_layer = 80
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: n_rot = 128
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: n_embd_head_k = 128
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: n_embd_head_v = 128
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: n_gqa = 8
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: n_embd_k_gqa = 1024
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: n_embd_v_gqa = 1024
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: f_norm_eps = 0.0e+00
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: f_norm_rms_eps = 1.0e-06
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: f_clamp_kqv = 0.0e+00
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: f_logit_scale = 0.0e+00
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: n_ff = 29568
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: n_expert = 0
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: n_expert_used = 0
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: causal attn = 1
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: pooling type = 0
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: rope type = 2
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: rope scaling = linear
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: freq_base_train = 1000000.0
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: freq_scale_train = 1
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: n_yarn_orig_ctx = 32768
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: rope_finetuned = unknown
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: ssm_d_conv = 0
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: ssm_d_inner = 0
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: ssm_d_state = 0
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: ssm_dt_rank = 0
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: model type = 70B
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: model ftype = Q2_K - Medium
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: model params = 72.71 B
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: model size = 27.76 GiB (3.28 BPW)
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: general.name = Qwen2-72B-Instruct
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: BOS token = 151643 '<|endoftext|>'
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: EOS token = 151645 '<|im_end|>'
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: PAD token = 151643 '<|endoftext|>'
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: LF token = 148848 'ÄĬ'
6月 11 01:17:54 Venue-vPro ollama[2760]: llm_load_print_meta: EOT token = 151645 '<|im_end|>'
6月 11 01:17:54 Venue-vPro ollama[2760]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ: yes
6月 11 01:17:54 Venue-vPro ollama[2760]: ggml_cuda_init: CUDA_USE_TENSOR_CORES: no
6月 11 01:17:54 Venue-vPro ollama[2760]: ggml_cuda_init: found 2 CUDA devices:
6月 11 01:17:54 Venue-vPro ollama[2760]: Device 0: Tesla M40 24GB, compute capability 5.2, VMM: yes
6月 11 01:17:54 Venue-vPro ollama[2760]: Device 1: Tesla M40 24GB, compute capability 5.2, VMM: yes
6月 11 01:17:55 Venue-vPro ollama[2760]: llm_load_tensors: ggml ctx size = 1.38 MiB
6月 11 01:17:55 Venue-vPro ollama[2760]: time=2024-06-11T01:17:55.789+08:00 level=INFO source=server.go:567 msg="waiting for server to become available" status="ll>
6月 11 01:17:56 Venue-vPro ollama[2760]: time=2024-06-11T01:17:56.152+08:00 level=INFO source=server.go:567 msg="waiting for server to become available" status="ll>
6月 11 01:17:56 Venue-vPro ollama[2760]: llm_load_tensors: offloading 80 repeating layers to GPU
6月 11 01:17:56 Venue-vPro ollama[2760]: llm_load_tensors: offloading non-repeating layers to GPU
6月 11 01:17:56 Venue-vPro ollama[2760]: llm_load_tensors: offloaded 81/81 layers to GPU
6月 11 01:17:56 Venue-vPro ollama[2760]: llm_load_tensors: CPU buffer size = 389.81 MiB
6月 11 01:17:56 Venue-vPro ollama[2760]: llm_load_tensors: CUDA0 buffer size = 13868.58 MiB
6月 11 01:17:56 Venue-vPro ollama[2760]: llm_load_tensors: CUDA1 buffer size = 14166.62 MiB
6月 11 01:18:00 Venue-vPro ollama[2760]: time=2024-06-11T01:18:00.123+08:00 level=INFO source=server.go:567 msg="waiting for server to become available" status="ll>
6月 11 01:18:00 Venue-vPro ollama[2760]: llama_new_context_with_model: n_ctx = 2048
6月 11 01:18:00 Venue-vPro ollama[2760]: llama_new_context_with_model: n_batch = 512
6月 11 01:18:00 Venue-vPro ollama[2760]: llama_new_context_with_model: n_ubatch = 512
6月 11 01:18:00 Venue-vPro ollama[2760]: llama_new_context_with_model: flash_attn = 0
6月 11 01:18:00 Venue-vPro ollama[2760]: llama_new_context_with_model: freq_base = 1000000.0
6月 11 01:18:00 Venue-vPro ollama[2760]: llama_new_context_with_model: freq_scale = 1
6月 11 01:18:00 Venue-vPro ollama[2760]: llama_kv_cache_init: CUDA0 KV buffer size = 328.00 MiB
6月 11 01:18:00 Venue-vPro ollama[2760]: llama_kv_cache_init: CUDA1 KV buffer size = 312.00 MiB
6月 11 01:18:00 Venue-vPro ollama[2760]: llama_new_context_with_model: KV self size = 640.00 MiB, K (f16): 320.00 MiB, V (f16): 320.00 MiB
6月 11 01:18:00 Venue-vPro ollama[2760]: llama_new_context_with_model: CUDA_Host output buffer size = 0.61 MiB
6月 11 01:18:00 Venue-vPro ollama[2760]: llama_new_context_with_model: pipeline parallelism enabled (n_copies=4)
6月 11 01:18:00 Venue-vPro ollama[2760]: llama_new_context_with_model: CUDA0 compute buffer size = 400.01 MiB
6月 11 01:18:00 Venue-vPro ollama[2760]: llama_new_context_with_model: CUDA1 compute buffer size = 400.02 MiB
6月 11 01:18:00 Venue-vPro ollama[2760]: llama_new_context_with_model: CUDA_Host compute buffer size = 32.02 MiB
6月 11 01:18:00 Venue-vPro ollama[2760]: llama_new_context_with_model: graph nodes = 2806
6月 11 01:18:00 Venue-vPro ollama[2760]: llama_new_context_with_model: graph splits = 3
6月 11 01:18:00 Venue-vPro ollama[2760]: time=2024-06-11T01:18:00.374+08:00 level=INFO source=server.go:567 msg="waiting for server to become available" status="ll>
6月 11 01:18:04 Venue-vPro ollama[2760]: GGML_ASSERT: /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml-cuda/dmmv.cu:653: false
6月 11 01:18:04 Venue-vPro ollama[2849]: [New LWP 2814]
6月 11 01:18:04 Venue-vPro ollama[2849]: [New LWP 2815]
6月 11 01:18:04 Venue-vPro ollama[2849]: [New LWP 2816]
6月 11 01:18:04 Venue-vPro ollama[2849]: [New LWP 2817]
6月 11 01:18:04 Venue-vPro ollama[2849]: [New LWP 2818]
6月 11 01:18:04 Venue-vPro ollama[2849]: [New LWP 2819]
6月 11 01:18:04 Venue-vPro ollama[2849]: [New LWP 2820]
6月 11 01:18:04 Venue-vPro ollama[2849]: [New LWP 2821]
6月 11 01:18:04 Venue-vPro ollama[2849]: [New LWP 2822]
6月 11 01:18:04 Venue-vPro ollama[2849]: [New LWP 2823]
6月 11 01:18:04 Venue-vPro ollama[2849]: [New LWP 2824]
6月 11 01:18:04 Venue-vPro ollama[2849]: [New LWP 2825]
6月 11 01:18:04 Venue-vPro ollama[2849]: [New LWP 2826]
6月 11 01:18:04 Venue-vPro ollama[2849]: [New LWP 2827]
6月 11 01:18:04 Venue-vPro ollama[2849]: [New LWP 2828]
6月 11 01:18:04 Venue-vPro ollama[2849]: [New LWP 2829]
6月 11 01:18:04 Venue-vPro ollama[2849]: [New LWP 2830]
6月 11 01:18:04 Venue-vPro ollama[2849]: [New LWP 2831]
6月 11 01:18:04 Venue-vPro ollama[2849]: [New LWP 2832]
6月 11 01:18:04 Venue-vPro ollama[2849]: [New LWP 2833]
6月 11 01:18:04 Venue-vPro ollama[2849]: [New LWP 2834]
6月 11 01:18:04 Venue-vPro ollama[2849]: [New LWP 2835]
6月 11 01:18:04 Venue-vPro ollama[2849]: [New LWP 2836]
6月 11 01:18:04 Venue-vPro ollama[2849]: [New LWP 2837]
6月 11 01:18:04 Venue-vPro ollama[2849]: [New LWP 2838]
6月 11 01:18:04 Venue-vPro ollama[2849]: [New LWP 2839]
6月 11 01:18:04 Venue-vPro ollama[2849]: [New LWP 2840]
6月 11 01:18:04 Venue-vPro ollama[2849]: [New LWP 2841]
6月 11 01:18:04 Venue-vPro ollama[2849]: [New LWP 2842]
6月 11 01:18:04 Venue-vPro ollama[2849]: [New LWP 2843]
6月 11 01:18:04 Venue-vPro ollama[2849]: [New LWP 2844]
6月 11 01:18:04 Venue-vPro ollama[2849]: [Thread debugging using libthread_db enabled]
6月 11 01:18:04 Venue-vPro ollama[2849]: Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
6月 11 01:18:05 Venue-vPro ollama[2760]: time=2024-06-11T01:18:05.095+08:00 level=INFO source=server.go:567 msg="waiting for server to become available" status="ll>
6月 11 01:18:05 Venue-vPro ollama[2849]: 0x00007f4b23780c7f in __GI___wait4 (pid=2849, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:27
6月 11 01:18:05 Venue-vPro ollama[2760]: 27 ../sysdeps/unix/sysv/linux/wait4.c: No such file or directory.
6月 11 01:18:05 Venue-vPro ollama[2849]: #0 0x00007f4b23780c7f in __GI___wait4 (pid=2849, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.>
6月 11 01:18:05 Venue-vPro ollama[2849]: 27 in ../sysdeps/unix/sysv/linux/wait4.c
6月 11 01:18:05 Venue-vPro ollama[2849]: #1 0x00000000005febbb in ggml_print_backtrace ()
6月 11 01:18:05 Venue-vPro ollama[2849]: #2 0x00000000006b5dbc in ggml_cuda_op_dequantize_mul_mat_vec(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor >
6月 11 01:18:05 Venue-vPro ollama[2849]: #3 0x000000000068356a in ggml_cuda_op_mul_mat(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_te>
6月 11 01:18:05 Venue-vPro ollama[2849]: #4 0x00000000006866db in ggml_backend_cuda_graph_compute(ggml_backend*, ggml_cgraph*) ()
6月 11 01:18:05 Venue-vPro ollama[2849]: #5 0x000000000064a42b in ggml_backend_sched_graph_compute_async ()
6月 11 01:18:05 Venue-vPro ollama[2849]: #6 0x000000000055c91f in llama_decode ()
6月 11 01:18:05 Venue-vPro ollama[2849]: #7 0x00000000004ffbe4 in llama_init_from_gpt_params(gpt_params&) ()
6月 11 01:18:05 Venue-vPro ollama[2849]: #8 0x00000000004a158d in llama_server_context::load_model(gpt_params const&) ()
6月 11 01:18:05 Venue-vPro ollama[2849]: #9 0x0000000000432ed6 in main ()
6月 11 01:18:05 Venue-vPro ollama[2849]: [Inferior 1 (process 2813) detached]
6月 11 01:18:05 Venue-vPro ollama[2760]: time=2024-06-11T01:18:05.395+08:00 level=INFO source=server.go:567 msg="waiting for server to become available" status="ll>
6月 11 01:18:05 Venue-vPro ollama[2760]: time=2024-06-11T01:18:05.646+08:00 level=ERROR source=sched.go:344 msg="error loading llama server" error="llama runner pr>
6月 11 01:18:05 Venue-vPro ollama[2760]: [GIN] 2024/06/11 - 01:18:05 | 500 | 13.241382363s | 127.0.0.1 | POST "/api/chat"
6月 11 01:18:10 Venue-vPro ollama[2760]: time=2024-06-11T01:18:10.885+08:00 level=WARN source=sched.go:511 msg="gpu VRAM usage didn't recover within timeout" secon>
6月 11 01:18:11 Venue-vPro ollama[2760]: time=2024-06-11T01:18:11.134+08:00 level=WARN source=sched.go:511 msg="gpu VRAM usage didn't recover within timeout" secon>
6月 11 01:18:11 Venue-vPro ollama[2760]: time=2024-06-11T01:18:11.385+08:00 level=WARN source=sched.go:511 msg="gpu VRAM usage didn't recover within timeout" secon>
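The assertion fires in `ggml-cuda/dmmv.cu` while both GPUs are Tesla M40s (compute capability 5.2, Maxwell), so one plausible cause is a quantized mul-mat-vec kernel path that these older architectures do not handle. Below is a minimal sketch for flagging such cards; the `compute_cap` query field and the 5.2 cutoff are assumptions on my part, not something these logs confirm:

```shell
# Hypothetical helper: flag compute capabilities at or below Maxwell's 5.2,
# on the assumption that the failing dmmv.cu path misbehaves on such cards.
cc_is_maxwell_or_older() {
  awk -v cc="$1" 'BEGIN { exit !(cc + 0 <= 5.2) }'
}

# On a live system with a recent driver, the capability can be read with:
#   nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader
if cc_is_maxwell_or_older 5.2; then
  echo "Maxwell-or-older GPU: quantized CUDA kernels may be unsupported"
fi
```

If the Maxwell cards really are the limiting factor, forcing CPU inference or trying a different quantization would be one way to test the theory; that is a guess, not a confirmed workaround.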

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.1.42

GiteaMirror added the bug label 2025-11-12 11:26:27 -06:00

@mikestut commented on GitHub (Jun 11, 2024):

I can run the model qwen7b:16bf. Maybe you can try it.

(Replying by email to a comment that said:)

> I have the same problem with qwen2:7b


@wgong commented on GitHub (Jun 14, 2024):

Got the same error when running the CLI:

```
$ ollama run qwen2
Error: llama runner process has terminated: signal: aborted (core dumped)
```

Ollama version is 0.1.36, running on Ubuntu with an NVIDIA GeForce RTX 4060 (8 GB) GPU.

No issues with llama3, codegemma, gemma, phi3, mistral, or aya.


@YogaNovvaindra commented on GitHub (Jun 14, 2024):

Got the same error running phi3 in the Ollama Docker image, CPU only:

```
root@0fe89d3262cb:/# ollama run phi3
Error: llama runner process has terminated: signal: aborted (core dumped)
root@0fe89d3262cb:/# ollama -v
ollama version is 0.1.44
```

Log:

```
llama_model_load: error loading model: mmap failed: No such device
llama_load_model_from_file: exception loading model
terminate called after throwing an instance of 'std::runtime_error'
  what():  mmap failed: No such device
time=2024-06-14T02:16:51.468Z level=INFO source=server.go:567 msg="waiting for server to become available" status="llm server error"
time=2024-06-14T02:16:51.719Z level=ERROR source=sched.go:344 msg="error loading llama server" error="llama runner process has terminated: signal: aborted (core dumped) "
```
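`mmap failed: No such device` usually means the model blob sits on a filesystem that cannot be memory-mapped (some overlay or network mounts seen in containers). A small probe of the filesystem backing the model directory can check this; the `/root/.ollama/models` path is the default inside the official Docker image and is an assumption here:

```shell
# Report the filesystem type backing the Ollama model directory. An overlay
# or network filesystem type would explain a failing mmap.
dir=/root/.ollama/models
[ -d "$dir" ] || dir=/tmp   # fall back so the probe still prints something
fstype=$(stat -f -c %T "$dir")
echo "filesystem backing $dir: $fstype"
```

If the type looks exotic, bind-mounting a host directory over `/root/.ollama` (for example `docker run -v /opt/ollama:/root/.ollama ...`) is one way to test this theory.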

@lyx8995 commented on GitHub (Jul 1, 2024):

Same problem here. Have you found a solution?


@nicho2 commented on GitHub (Jul 1, 2024):

Hello, I have the same problem with the camembert-large embedding model (latest or q8). Another model (nomic-embed-text) works without problems:

```
POST /api/embeddings HTTP/1.1
host: 172.17.0.1:11434
connection: keep-alive
content-type: text/plain;charset=UTF-8
accept: */*
accept-language: *
sec-fetch-mode: cors
user-agent: node
accept-encoding: gzip, deflate
content-length: 623

{"model":"royalpha/sentence-camembert-large:latest","prompt":"<document_metadata>\nsourceDocument: Guide technique AREA MANAGER - partie 1.md\npublished: 7/1/2024, 6:44:11 AM\n</document_metadata>\n\n# Guide Technique Note - Guide Area Manager.Docx Alain Sabourdy 20 Juin 2024\r\n\r\n## Contexte\r\n\r\n![0_Image_0.Png](0_Image_0.Png)\r\n\r\nCe document est rédigé dans l'unique objectif d'être une source d'information pour notre développement autour d'une Intelligence artificielle d'assistance au commissionnement d'un BMS IoT. Il doit être traduit au format \"markdown\" pour être communiqué à l'IA Attention,"}

HTTP/1.1 500 Internal Server Error
Content-Type: application/json; charset=utf-8
Date: Mon, 01 Jul 2024 08:00:04 GMT
Content-Length: 79

{"error":"llama runner process has terminated: signal: aborted (core dumped) "}
```

I use version 0.1.48, on Ubuntu with 2 GPUs (A4000).
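The failing request carries a document of roughly 600 bytes while the model metadata in the logs shows `bert.context_length = 514`, so a prompt-length-triggered crash is worth ruling out. A sketch of that check follows; the 200-character cutoff and the idea that length matters here are assumptions:

```shell
# Truncate the prompt and retry the embedding request with the short version;
# if the short prompt succeeds, the crash is likely length-related.
prompt="Ce document est rédigé dans l'unique objectif d'être une source d'information..."
short=$(printf '%s' "$prompt" | cut -c1-200)
echo "retrying with the truncated prompt: $short"
# curl -s http://172.17.0.1:11434/api/embeddings \
#   -d "{\"model\":\"royalpha/sentence-camembert-large:latest\",\"prompt\":\"$short\"}"
```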

Logs:

```
juil. 01 10:30:30 system-Precision-Tower-5810 ollama[1517977]: [GIN] 2024/07/01 - 10:30:30 | 500 | 43.218860116s |      172.17.0.2 | POST     "/api/embeddings"
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: time=2024-07-01T10:30:31.379+02:00 level=INFO source=memory.go:309 msg="offload to cuda" layers.requested=-1 layers.model=25 layers.of>
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: time=2024-07-01T10:30:31.380+02:00 level=INFO source=server.go:368 msg="starting llama server" cmd="/tmp/ollama3128957023/runners/cuda>
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1564266]: INFO [main] build info | build=1 commit="7c26775" tid="126336302825472" timestamp=1719822631
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1564266]: INFO [main] system info | n_threads=4 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0>
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1564266]: INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="3" port="34659" tid="126336302825472" timestamp=1719822631
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llama_model_loader: loaded meta data with 19 key-value pairs and 389 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-3f6b8a>
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llama_model_loader: - kv   0:                       general.architecture str              = bert
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llama_model_loader: - kv   1:                               general.name str              = sentence-camembert-large
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llama_model_loader: - kv   2:                           bert.block_count u32              = 24
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llama_model_loader: - kv   3:                        bert.context_length u32              = 514
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llama_model_loader: - kv   4:                      bert.embedding_length u32              = 1024
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llama_model_loader: - kv   5:                   bert.feed_forward_length u32              = 4096
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llama_model_loader: - kv   6:                  bert.attention.head_count u32              = 16
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llama_model_loader: - kv   7:          bert.attention.layer_norm_epsilon f32              = 0.000010
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llama_model_loader: - kv   8:                          general.file_type u32              = 7
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llama_model_loader: - kv   9:                      bert.attention.causal bool             = false
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llama_model_loader: - kv  10:            tokenizer.ggml.token_type_count u32              = 2
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llama_model_loader: - kv  11:                tokenizer.ggml.bos_token_id u32              = 0
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llama_model_loader: - kv  12:                tokenizer.ggml.eos_token_id u32              = 2
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llama_model_loader: - kv  13:                       tokenizer.ggml.model str              = bert
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llama_model_loader: - kv  14:                      tokenizer.ggml.tokens arr[str,32005]   = ["▁<s>NOTUSED", "▁<pad>", "▁</s...
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llama_model_loader: - kv  15:                      tokenizer.ggml.scores arr[f32,32005]   = [-1000.000000, -1000.000000, -1000.00...
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llama_model_loader: - kv  16:                  tokenizer.ggml.token_type arr[i32,32005]   = [3, 3, 3, 3, 1, 3, 3, 1, 1, 1, 1, 1, ...
```
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llama_model_loader: - kv  17:            tokenizer.ggml.padding_token_id u32              = 1
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llama_model_loader: - kv  18:               general.quantization_version u32              = 2
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llama_model_loader: - type  f32:  244 tensors
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llama_model_loader: - type q8_0:  145 tensors
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_vocab: special tokens cache size = 7
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_vocab: token to piece cache size = 0.2572 MB
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: format           = GGUF V3 (latest)
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: arch             = bert
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: vocab type       = WPM
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: n_vocab          = 32005
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: n_merges         = 0
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: n_ctx_train      = 514
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: n_embd           = 1024
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: n_head           = 16
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: n_head_kv        = 16
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: n_layer          = 24
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: n_rot            = 64
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: n_embd_head_k    = 64
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: n_embd_head_v    = 64
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: n_gqa            = 1
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: n_embd_k_gqa     = 1024
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: n_embd_v_gqa     = 1024
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: f_norm_eps       = 1.0e-05
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: f_norm_rms_eps   = 0.0e+00
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: f_clamp_kqv      = 0.0e+00
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: f_logit_scale    = 0.0e+00
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: n_ff             = 4096
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: n_expert         = 0
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: n_expert_used    = 0
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: causal attn      = 0
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: pooling type     = -1
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: rope type        = 2
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: rope scaling     = linear
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: freq_base_train  = 10000.0
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: freq_scale_train = 1
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: n_ctx_orig_yarn  = 514
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: rope_finetuned   = unknown
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: ssm_d_conv       = 0
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: ssm_d_inner      = 0
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: ssm_d_state      = 0
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: ssm_dt_rank      = 0
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: model type       = 335M
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: model ftype      = Q8_0
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: model params     = 335.61 M
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: model size       = 342.45 MiB (8.56 BPW)
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: general.name     = sentence-camembert-large
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: BOS token        = 0 '▁<s>NOTUSED'
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: EOS token        = 2 '▁</s>NOTUSED'
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: UNK token        = 100 '▁▁on'
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: SEP token        = 102 '▁h'
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: PAD token        = 1 '▁<pad>'
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: CLS token        = 101 '▁▁–'
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: MASK token       = 103 '▁y'
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_print_meta: LF token         = 0 '▁<s>NOTUSED'
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ:   yes
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: ggml_cuda_init: CUDA_USE_TENSOR_CORES: no
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: ggml_cuda_init: found 1 CUDA devices:
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]:   Device 0: NVIDIA RTX A4000, compute capability 8.6, VMM: yes
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llm_load_tensors: ggml ctx size =    0.35 MiB
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llama_model_load: error loading model: check_tensor_dims: tensor 'token_types.weight' has wrong shape; expected  1024,     2, got  102>
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: llama_load_model_from_file: exception loading model
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: terminate called after throwing an instance of 'std::runtime_error'
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]:   what():  check_tensor_dims: tensor 'token_types.weight' has wrong shape; expected  1024,     2, got  1024,     1,     1,     1
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: time=2024-07-01T10:30:31.905+02:00 level=INFO source=sched.go:382 msg="loaded runners" count=1
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: time=2024-07-01T10:30:31.905+02:00 level=INFO source=server.go:556 msg="waiting for llama runner to start responding"
juil. 01 10:30:31 system-Precision-Tower-5810 ollama[1517977]: time=2024-07-01T10:30:31.961+02:00 level=INFO source=server.go:594 msg="waiting for server to become available" status="llm server err>
juil. 01 10:30:32 system-Precision-Tower-5810 ollama[1517977]: time=2024-07-01T10:30:32.212+02:00 level=ERROR source=sched.go:388 msg="error loading llama server" error="llama runner process has te>
juil. 01 10:30:32 system-Precision-Tower-5810 ollama[1517977]: time=2024-07-01T10:30:32.212+02:00 level=WARN source=server.go:475 msg="llama runner process no longer running" sys=134 string="signal>
juil. 01 10:30:32 system-Precision-Tower-5810 ollama[1517977]: time=2024-07-01T10:30:32.212+02:00 level=WARN source=server.go:475 msg="llama runner process no longer running" sys=134 string="signal>
juil. 01 10:30:32 system-Precision-Tower-5810 ollama[1517977]: time=2024-07-01T10:30:32.212+02:00 level=WARN source=server.go:475 msg="llama runner process no longer running" sys=134 string="signal>
juil. 01 10:30:32 system-Precision-Tower-5810 ollama[1517977]: time=2024-07-01T10:30:32.212+02:00 level=WARN source=server.go:475 msg="llama runner process no longer running" sys=134 string="signal>
juil. 01 10:30:32 system-Precision-Tower-5810 ollama[1517977]: [GIN] 2024/07/01 - 10:30:32 | 500 | 44.450877468s |      172.17.0.2 | POST     "/api/embeddings"
juil. 01 10:30:33 system-Precision-Tower-5810 systemd[1]: Stopping Ollama Service...
juil. 01 10:30:33 system-Precision-Tower-5810 systemd[1]: ollama.service: Deactivated successfully.
juil. 01 10:30:33 system-Precision-Tower-5810 systemd[1]: Stopped Ollama Service.
juil. 01 10:30:33 system-Precision-Tower-5810 systemd[1]: ollama.service: Consumed 51.896s CPU time.
juil. 01 10:30:33 system-Precision-Tower-5810 systemd[1]: Started Ollama Service.
juil. 01 10:30:33 system-Precision-Tower-5810 ollama[1564664]: 2024/07/01 10:30:33 routes.go:1064: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVE>
juil. 01 10:30:33 system-Precision-Tower-5810 ollama[1564664]: time=2024-07-01T10:30:33.807+02:00 level=INFO source=images.go:730 msg="total blobs: 39"
juil. 01 10:30:33 system-Precision-Tower-5810 ollama[1564664]: time=2024-07-01T10:30:33.808+02:00 level=INFO source=images.go:737 msg="total unused blobs removed: 0"
juil. 01 10:30:33 system-Precision-Tower-5810 ollama[1564664]: time=2024-07-01T10:30:33.809+02:00 level=INFO source=routes.go:1111 msg="Listening on [::]:11434 (version 0.1.48)"
juil. 01 10:30:33 system-Precision-Tower-5810 ollama[1564664]: time=2024-07-01T10:30:33.809+02:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama2775426305/runners
juil. 01 10:30:37 system-Precision-Tower-5810 ollama[1564664]: time=2024-07-01T10:30:37.668+02:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [rocm_v60101 cpu cpu_avx cpu_avx2 cuda_v>
juil. 01 10:30:38 system-Precision-Tower-5810 ollama[1564664]: time=2024-07-01T10:30:38.176+02:00 level=INFO source=types.go:98 msg="inference compute" id=GPU-ab19754d-e86c-547d-899d-945082a2536f l>
juil. 01 10:30:38 system-Precision-Tower-5810 ollama[1564664]: time=2024-07-01T10:30:38.177+02:00 level=INFO source=types.go:98 msg="inference compute" id=GPU-de5ead24-ca41-5a85-3046-6e331171011d l>
@nicho2 commented on GitHub (Jul 1, 2024):

Hello, I have the same problem with the camembert-large embedding model (latest or q8). Another model (nomic-embed-text) works without any problem.

Request:

POST /api/embeddings HTTP/1.1
host: 172.17.0.1:11434
connection: keep-alive
content-type: text/plain;charset=UTF-8
accept: */*
accept-language: *
sec-fetch-mode: cors
user-agent: node
accept-encoding: gzip, deflate
content-length: 623

{"model":"royalpha/sentence-camembert-large:latest","prompt":"<document_metadata>\nsourceDocument: Guide technique AREA MANAGER - partie 1.md\npublished: 7/1/2024, 6:44:11 AM\n</document_metadata>\n\n# Guide Technique Note - Guide Area Manager.Docx Alain Sabourdy 20 Juin 2024\r\n\r\n## Contexte\r\n\r\n![0_Image_0.Png](0_Image_0.Png)\r\n\r\nCe document est rédigé dans l'unique objectif d'être une source d'information pour notre développement autour d'une Intelligence artificielle d'assistance au commissionnement d'un BMS IoT. Il doit être traduit au format \"markdown\" pour être communiqué à l'IA Attention,"}

Response:

HTTP/1.1 500 Internal Server Error
Content-Type: application/json; charset=utf-8
Date: Mon, 01 Jul 2024 08:00:04 GMT
Content-Length: 79

{"error":"llama runner process has terminated: signal: aborted (core dumped) "}

I am using version 0.1.48, on Ubuntu with 2 GPUs (A4000).

Logs: see the journalctl output above (juil. 01 10:30:30 through 10:30:38). The request before it fails:

juil. 01 10:30:30 system-Precision-Tower-5810 ollama[1517977]: [GIN] 2024/07/01 - 10:30:30 | 500 | 43.218860116s |      172.17.0.2 | POST     "/api/embeddings"

The key error in those logs is:

check_tensor_dims: tensor 'token_types.weight' has wrong shape; expected  1024,     2, got  1024,     1,     1,     1
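Since the failing call is a plain POST to Ollama's /api/embeddings endpoint, it can be reproduced without the Node client. A minimal sketch in Python (the model tag is taken from the request above; the URL assumes Ollama's default listen address):

```python
import json
import urllib.error
import urllib.request


def build_embeddings_request(model: str, prompt: str,
                             url: str = "http://127.0.0.1:11434/api/embeddings") -> urllib.request.Request:
    """Build a POST request for Ollama's /api/embeddings endpoint."""
    payload = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(url, data=payload,
                                  headers={"Content-Type": "application/json"})


if __name__ == "__main__":
    # Model tag from the report above. With the broken GGUF, the server
    # answers HTTP 500 ("llama runner process has terminated"); a model such
    # as nomic-embed-text succeeds on the same call.
    req = build_embeddings_request("royalpha/sentence-camembert-large:latest",
                                   "Bonjour le monde")
    try:
        with urllib.request.urlopen(req) as resp:
            print(len(json.load(resp)["embedding"]))
    except urllib.error.HTTPError as e:
        print(e.code, e.read().decode())
```

This only changes the transport, not the outcome: the crash happens while the runner loads the GGUF (the token_types.weight shape check), before the prompt is ever embedded.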
Reference: github-starred/ollama-ollama#3139