[GH-ISSUE #2747] SIGFPE: floating-point exception during model initialization #48166

Closed
opened 2026-04-28 06:57:14 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @mitar on GitHub (Feb 25, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2747

I get SIGFPE: floating-point exception during model initialization:

llama_model_loader: loaded meta data with 23 key-value pairs and 291 tensors from .../.ollama/models/blobs/sha256:8934d96d3f08982e95922b2b7a2c626a1fe873d7c3b06e8e56d7bc0a1fef9246 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = LLaMA v2
llama_model_loader: - kv   2:                       llama.context_length u32              = 4096
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 11008
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 32
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                          general.file_type u32              = 2
llama_model_loader: - kv  11:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  12:                      tokenizer.ggml.tokens arr[str,32000]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv  13:                      tokenizer.ggml.scores arr[f32,32000]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  14:                  tokenizer.ggml.token_type arr[i32,32000]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv  15:                      tokenizer.ggml.merges arr[str,61249]   = ["▁ t", "e r", "i n", "▁ a", "e n...
llama_model_loader: - kv  16:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  17:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  18:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  19:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  20:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  21:                    tokenizer.chat_template str              = {% if messages[0]['role'] == 'system'...
llama_model_loader: - kv  22:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type q4_0:  225 tensors
llama_model_loader: - type q6_K:    1 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32000
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 4096
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 32
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 1
llm_load_print_meta: n_embd_k_gqa     = 4096
llm_load_print_meta: n_embd_v_gqa     = 4096
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff             = 11008
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 4096
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: model type       = 7B
llm_load_print_meta: model ftype      = Q4_0
llm_load_print_meta: model params     = 6.74 B
llm_load_print_meta: model size       = 3.56 GiB (4.54 BPW) 
llm_load_print_meta: general.name     = LLaMA v2
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: LF token         = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.11 MiB
llm_load_tensors:        CPU buffer size =  3647.87 MiB
..................................................................................................
llama_new_context_with_model: n_ctx      = 4
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:        CPU KV buffer size =     4.00 MiB
llama_new_context_with_model: KV self size  =    4.00 MiB, K (f32):    2.00 MiB, V (f32):    2.00 MiB
llama_new_context_with_model:        CPU input buffer size   =     0.00 MiB
SIGFPE: floating-point exception
PC=0x7f787963a686 m=9 sigcode=1
signal arrived during cgo execution
instruction bytes: 0x49 0xf7 0x7e 0x18 0x48 0x85 0xd2 0x75 0xa5 0x49 0x8b 0x45 0x20 0x48 0x99 0x49

goroutine 1 [syscall]:
runtime.cgocall(0x98b4e0, 0xc0004c51e0)
	/usr/local/go/src/runtime/cgocall.go:157 +0x4b fp=0xc0004c51b8 sp=0xc0004c5180 pc=0x40ab6b
github.com/jmorganca/ollama/llm._Cfunc_dyn_llama_server_init({0x7f7868001380, 0x7f78795182f0, 0x7f787950f760, 0x7f7879511160, 0x7f7879520be0, 0x7f7879513a00, 0x7f7879514040, 0x7f787950f810, 0x7f7879519360, 0x7f7879519eb0, ...}, ...)
	_cgo_gotypes.go:286 +0x45 fp=0xc0004c51e0 sp=0xc0004c51b8 pc=0x746525
github.com/jmorganca/ollama/llm.newDynExtServer.func7(0xac1b7a?, 0xc?)
	.../ollama/llm/dyn_ext_server.go:153 +0xef fp=0xc0004c52d0 sp=0xc0004c51e0 pc=0x747a6f
github.com/jmorganca/ollama/llm.newDynExtServer({0xc0003af940, 0x3f}, {0xc000112ea0, _}, {_, _, _}, {0x0, 0x0, 0x0}, ...)
	.../ollama/llm/dyn_ext_server.go:153 +0xa65 fp=0xc0004c5570 sp=0xc0004c52d0 pc=0x747705
github.com/jmorganca/ollama/llm.newLlmServer({{_, _, _}, {_, _}, {_, _}}, {_, _}, {0xc000112ea0, ...}, ...)
	.../ollama/llm/llm.go:158 +0x425 fp=0xc0004c5730 sp=0xc0004c5570 pc=0x743e65
github.com/jmorganca/ollama/llm.New({0xacba2e, 0x25}, {0xc000112ea0, _}, {_, _, _}, {0x0, 0x0, 0x0}, ...)
	.../ollama/llm/llm.go:123 +0x713 fp=0xc0004c59b0 sp=0xc0004c5730 pc=0x7437d3

I suspect the issue is related to CPU input buffer size = 0.00 MiB. I am not sure why the input buffer size is 0; probably some code fails after that.
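
The trace itself gives two further hints that can be checked mechanically. On Linux, a SIGFPE with sigcode=1 corresponds to FPE_INTDIV (integer divide by zero), and the reported instruction bytes begin with 0x49 0xf7 0x7e, which on x86-64 is a REX prefix followed by opcode 0xF7 with ModRM reg field 7, i.e. a signed integer divide (IDIV). The following is a minimal sketch, assuming Linux on x86-64 and assuming the instruction bytes start at the faulting PC; `fpeCodeName` and `isIntegerDivide` are hypothetical helpers written for illustration and are not part of ollama or llama.cpp.

```go
package main

import "fmt"

// fpeCodeName maps a Linux SIGFPE si_code value to its symbolic name.
// (Hypothetical helper; the values are the Linux FPE_* constants.)
func fpeCodeName(code int) string {
	names := map[int]string{
		1: "FPE_INTDIV", // integer divide by zero
		2: "FPE_INTOVF", // integer overflow
		3: "FPE_FLTDIV", // floating-point divide by zero
		4: "FPE_FLTOVF", // floating-point overflow
		5: "FPE_FLTUND", // floating-point underflow
		6: "FPE_FLTRES", // floating-point inexact result
		7: "FPE_FLTINV", // invalid floating-point operation
		8: "FPE_FLTSUB", // subscript out of range
	}
	if name, ok := names[code]; ok {
		return name
	}
	return "unknown"
}

// isIntegerDivide reports whether the bytes begin with an x86-64 DIV or IDIV
// in its 0xF7 encoding (ModRM reg field /6 or /7), optionally preceded by a
// REX prefix. It deliberately ignores other prefixes and the 8-bit 0xF6 form.
func isIntegerDivide(b []byte) bool {
	i := 0
	if i < len(b) && b[i]&0xF0 == 0x40 { // REX prefix is 0x40..0x4F
		i++
	}
	if i+1 >= len(b) || b[i] != 0xF7 {
		return false
	}
	reg := (b[i+1] >> 3) & 0x7 // ModRM bits 5..3 select the operation
	return reg == 6 || reg == 7
}

func main() {
	// Values copied from the crash report above.
	sigcode := 1
	instr := []byte{0x49, 0xf7, 0x7e, 0x18, 0x48, 0x85, 0xd2, 0x75}

	fmt.Printf("sigcode=%d -> %s\n", sigcode, fpeCodeName(sigcode))
	fmt.Printf("faulting instruction is an integer divide: %v\n", isIntegerDivide(instr))
}
```

Both checks point at an integer division or modulo with a zero divisor inside the cgo call rather than at floating-point arithmetic. This does not show which value was zero, nor whether the 0.00 MiB input buffer is involved; the log also reports n_ctx = 4, which may be worth ruling out as a contributing factor.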

Author
Owner

@jmorganca commented on GitHub (May 10, 2024):

This should be fixed now, but let me know if that's not the case.


Reference: github-starred/ollama#48166