[GH-ISSUE #4803] Run chat api with Llama3 8B Model converted by llama.cpp had infinity response time #3030

Closed
opened 2026-04-12 13:26:36 -05:00 by GiteaMirror · 2 comments
Originally created by @cuongnguyengit on GitHub (Jun 4, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4803

What is the issue?

Hi team,

I used your guide (https://github.com/ollama/ollama/blob/main/docs/import.md) to convert https://huggingface.co/hiieu/Meta-Llama-3-8B-Instruct-function-calling-json-mode to a GGUF file.

All of the conversion steps succeeded, but when I run the model with ollama I get the following error:

llama_new_context_with_model: graph splits = 5
{"function":"initialize","level":"INFO","line":448,"msg":"initializing slots","n_slots":1,"tid":"139847421562880","timestamp":1717470350}
{"function":"initialize","level":"INFO","line":457,"msg":"new slot","n_ctx_slot":2048,"slot_id":0,"tid":"139847421562880","timestamp":1717470350}
{"function":"main","level":"INFO","line":3064,"msg":"model loaded","tid":"139847421562880","timestamp":1717470350}
{"function":"validate_model_chat_template","level":"ERR","line":437,"msg":"The chat template comes with this model is not yet supported, falling back to chatml. This may cause the model to output suboptimal responses","tid":"139847421562880","timestamp":1717470350}
{"function":"main","hostname":"127.0.0.1","level":"INFO","line":3267,"msg":"HTTP server listening","n_threads_http":"71","port":"24108","tid":"139847421562880","timestamp":1717470350}
{"function":"update_slots","level":"INFO","line":1578,"msg":"all slots are idle and system prompt is empty, clear the KV cache","tid":"139847421562880","timestamp":1717470350}
{"function":"process_single_task","level":"INFO","line":1506,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":0,"tid":"139847421562880","timestamp":1717470350}
{"function":"process_single_task","level":"INFO","line":1506,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":1,"tid":"139847421562880","timestamp":1717470350}
{"function":"log_server_request","level":"INFO","line":2734,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"127.0.0.1","remote_port":6688,"status":200,"tid":"139843732500480","timestamp":1717470350}
{"function":"process_single_task","level":"INFO","line":1506,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":2,"tid":"139847421562880","timestamp":1717470350}
{"function":"log_server_request","level":"INFO","line":2734,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"127.0.0.1","remote_port":6690,"status":200,"tid":"139843724107776","timestamp":1717470350}
{"function":"process_single_task","level":"INFO","line":1506,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":3,"tid":"139847421562880","timestamp":1717470350}
{"function":"log_server_request","level":"INFO","line":2734,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"127.0.0.1","remote_port":6692,"status":200,"tid":"139843707322368","timestamp":1717470350}
{"function":"process_single_task","level":"INFO","line":1506,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":4,"tid":"139847421562880","timestamp":1717470350}
{"function":"log_server_request","level":"INFO","line":2734,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"127.0.0.1","remote_port":6694,"status":200,"tid":"139843715715072","timestamp":1717470350}
{"function":"process_single_task","level":"INFO","line":1506,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":5,"tid":"139847421562880","timestamp":1717470350}
{"function":"log_server_request","level":"INFO","line":2734,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"127.0.0.1","remote_port":6696,"status":200,"tid":"139843698929664","timestamp":1717470350}
{"function":"log_server_request","level":"INFO","line":2734,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"127.0.0.1","remote_port":6698,"status":200,"tid":"139843690536960","timestamp":1717470350}
{"function":"process_single_task","level":"INFO","line":1506,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":6,"tid":"139847421562880","timestamp":1717470350}
{"function":"log_server_request","level":"INFO","line":2734,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"127.0.0.1","remote_port":6810,"status":200,"tid":"139843598282752","timestamp":1717470350}
{"function":"process_single_task","level":"INFO","line":1506,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":7,"tid":"139847421562880","timestamp":1717470350}
{"function":"log_server_request","level":"INFO","line":2734,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"127.0.0.1","remote_port":6810,"status":200,"tid":"139843598282752","timestamp":1717470350}
{"function":"process_single_task","level":"INFO","line":1506,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":8,"tid":"139847421562880","timestamp":1717470350}
{"function":"log_server_request","level":"INFO","line":2734,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"127.0.0.1","remote_port":6802,"status":200,"tid":"139843589890048","timestamp":1717470350}
{"function":"log_server_request","level":"INFO","line":2734,"method":"POST","msg":"request","params":{},"path":"/tokenize","remote_addr":"127.0.0.1","remote_port":6810,"status":200,"tid":"139843598282752","timestamp":1717470350}
{"function":"process_single_task","level":"INFO","line":1506,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":9,"tid":"139847421562880","timestamp":1717470350}
{"function":"log_server_request","level":"INFO","line":2734,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"127.0.0.1","remote_port":6776,"status":200,"tid":"139843581497344","timestamp":1717470350}
{"function":"process_single_task","level":"INFO","line":1506,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":10,"tid":"139847421562880","timestamp":1717470350}
{"function":"log_server_request","level":"INFO","line":2734,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"127.0.0.1","remote_port":6810,"status":200,"tid":"139843598282752","timestamp":1717470350}
{"function":"launch_slot_with_data","level":"INFO","line":830,"msg":"slot is processing task","slot_id":0,"task_id":11,"tid":"139847421562880","timestamp":1717470350}
{"function":"update_slots","ga_i":0,"level":"INFO","line":1809,"msg":"slot progression","n_past":0,"n_past_se":0,"n_prompt_tokens_processed":188,"slot_id":0,"task_id":11,"tid":"139847421562880","timestamp":1717470350}
{"function":"update_slots","level":"INFO","line":1836,"msg":"kv cache rm [p0, end)","p0":0,"slot_id":0,"task_id":11,"tid":"139847421562880","timestamp":1717470350}
{"function":"process_single_task","level":"INFO","line":1506,"msg":"slot data","n_idle_slots":0,"n_processing_slots":1,"task_id":13,"tid":"139847421562880","timestamp":1717470350}
{"function":"log_server_request","level":"INFO","line":2734,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"127.0.0.1","remote_port":6804,"status":200,"tid":"139843573104640","timestamp":1717470350}
{"function":"process_single_task","level":"INFO","line":1506,"msg":"slot data","n_idle_slots":0,"n_processing_slots":1,"task_id":34,"tid":"139847421562880","timestamp":1717470351}
{"function":"log_server_request","level":"INFO","line":2734,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"127.0.0.1","remote_port":6752,"status":200,"tid":"139843564711936","timestamp":1717470351}
{"function":"process_single_task","level":"INFO","line":1506,"msg":"slot data","n_idle_slots":0,"n_processing_slots":1,"task_id":44,"tid":"139847421562880","timestamp":1717470351}
{"function":"log_server_request","level":"INFO","line":2734,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"127.0.0.1","remote_port":6788,"status":200,"tid":"139843556319232","timestamp":1717470351}
{"function":"process_single_task","level":"INFO","line":1506,"msg":"slot data","n_idle_slots":0,"n_processing_slots":1,"task_id":47,"tid":"139847421562880","timestamp":1717470351}
{"function":"log_server_request","level":"INFO","line":2734,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"127.0.0.1","remote_port":6756,"status":200,"tid":"139843547926528","timestamp":1717470351}
{"function":"process_single_task","level":"INFO","line":1506,"msg":"slot data","n_idle_slots":0,"n_processing_slots":1,"task_id":52,"tid":"139847421562880","timestamp":1717470351}
{"function":"log_server_request","level":"INFO","line":2734,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"127.0.0.1","remote_port":6790,"status":200,"tid":"139843539533824","timestamp":1717470351}
{"function":"process_single_task","level":"INFO","line":1506,"msg":"slot data","n_idle_slots":0,"n_processing_slots":1,"task_id":72,"tid":"139847421562880","timestamp":1717470352}
{"function":"log_server_request","level":"INFO","line":2734,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"127.0.0.1","remote_port":6764,"status":200,"tid":"139843531141120","timestamp":1717470352}
{"function":"process_single_task","level":"INFO","line":1506,"msg":"slot data","n_idle_slots":0,"n_processing_slots":1,"task_id":97,"tid":"139847421562880","timestamp":1717470353}
{"function":"log_server_request","level":"INFO","line":2734,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"127.0.0.1","remote_port":6772,"status":200,"tid":"139843682144256","timestamp":1717470353}

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.1.32

GiteaMirror added the bug label 2026-04-12 13:26:36 -05:00

@cuongnguyengit commented on GitHub (Jun 4, 2024):

My template:
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>"""
PARAMETER num_keep 24
PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "\"<|eot_id|>\""
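To make the template easier to inspect, here is a small sketch that reproduces its structure with plain string concatenation, so the resulting prompt text can be printed and compared against what the model expects. The function `render_llama3_prompt` is a hypothetical helper, not part of Ollama, and the way Ollama itself evaluates the Go template (for example, how it treats the text after `{{ .Response }}`) may differ from this literal rendering.

```python
def render_llama3_prompt(system="", prompt="", response=""):
    """Mimic the TEMPLATE above with plain string concatenation.

    This is a literal rendering of the template text; it is an
    illustration only and does not call Ollama.
    """
    parts = []
    if system:  # {{ if .System }} ... {{ end }}
        parts.append(
            "<|start_header_id|>system<|end_header_id|>\n\n" + system + "<|eot_id|>"
        )
    if prompt:  # {{ if .Prompt }} ... {{ end }}
        parts.append(
            "<|start_header_id|>user<|end_header_id|>\n\n" + prompt + "<|eot_id|>"
        )
    # The assistant header is always emitted, followed by the response slot.
    parts.append(
        "<|start_header_id|>assistant<|end_header_id|>\n\n" + response + "<|eot_id|>"
    )
    return "".join(parts)


rendered = render_llama3_prompt(system="You are helpful.", prompt="Hi")
print(rendered)
```

Printing the rendered string shows where each `<|eot_id|>` marker lands, which helps when checking that the `PARAMETER stop` values match the tokens the template actually emits.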


@pdevine commented on GitHub (Jun 5, 2024):

@cuongnguyengit can you upgrade to the latest version of ollama? There were numerous fixes that had to be done to llama3 due to its vocabulary and how it uses stop tokens.

I'm going to go ahead and close the issue, but we can reopen this if it's still a problem on newer versions.
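To check whether the hang persists after upgrading, one option is to send a single non-streaming chat request with a client-side timeout, so that a stuck generation fails fast instead of blocking indefinitely. This is a minimal sketch under stated assumptions: it uses Ollama's default address `127.0.0.1:11434` and the `/api/chat` endpoint, while the model name `my-llama3-function-calling` and the helper names are hypothetical placeholders.

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434/api/chat"  # default address; adjust if yours differs
MODEL_NAME = "my-llama3-function-calling"  # hypothetical name chosen at `ollama create`


def build_chat_request(model, user_message, url=OLLAMA_URL):
    """Build a non-streaming POST request for the chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": False,  # ask for one JSON reply instead of a stream
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat_with_timeout(model, user_message, timeout_s=60):
    """Send the request; raises an OSError subclass on timeout or connection failure."""
    request = build_chat_request(model, user_message)
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        return json.loads(resp.read())["message"]["content"]


# Usage (requires a running Ollama server):
#     try:
#         print(chat_with_timeout(MODEL_NAME, "Hello"))
#     except OSError as exc:
#         print("no reply within the timeout:", exc)
```

If the request still times out on a current version, that result, together with the server log, would be the useful detail to include when asking for the issue to be reopened.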

Reference: github-starred/ollama#3030