[GH-ISSUE #5775] Assistant doesn't continue from its last message #29357

Closed
opened 2026-04-22 08:08:38 -05:00 by GiteaMirror · 4 comments

Originally created by @yilmaz08 on GitHub (Jul 18, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5775

Originally assigned to: @jmorganca on GitHub.

What is the issue?

I love using llama3:8b with Open WebUI's text generation, and recently I've noticed that whatever I write there, llama3:8b just says random stuff.

After that I tried editing the assistant message in Open WebUI, but even when I edit it, generation continues as if no assistant message was provided.

Finally, I tested the API with the same messages and it still happened, so I'm posting it here.

Here is the body of my POST request to http://localhost:11434/api/chat/:

```
{
    "model": "llama3",
    "messages": [
        {"role": "user", "content": "hi"},
        {"role": "assistant", "content": "Hello this message is edited, "}
    ],
    "stream": false
}
```

Response:

```
{
    "model": "llama3",
    "created_at": "2024-07-18T16:56:59.425466236Z",
    "message": {
        "role": "assistant",
        "content": "Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?"
    },
    "done_reason": "stop",
    "done": true,
    "total_duration": 1085833912,
    "load_duration": 14700744,
    "prompt_eval_count": 11,
    "prompt_eval_duration": 56151000,
    "eval_count": 26,
    "eval_duration": 882579000
}
```

I would normally expect the response to start with the "Hello this message is edited, " part I provided; instead, the last assistant message is ignored entirely.

I'm not sure why exactly, but the same thing happens with the phi3 model too.

Has this feature been removed, or is it a bug on my side?
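
For reference, the same request as a short Python script (a minimal sketch, assuming the `requests` package and a local Ollama on the default port):

```
# Minimal reproduction sketch (assumes the `requests` package and Ollama
# listening on the default port). The reply should continue the
# "Hello this message is edited, " prefill, but instead starts a new answer.
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3",
        "messages": [
            {"role": "user", "content": "hi"},
            {"role": "assistant", "content": "Hello this message is edited, "},
        ],
        "stream": False,
    },
    timeout=120,
)
print(resp.json()["message"]["content"])
```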

System:
OS: Arch Linux 6.9.9-arch1-1
GPU: NVIDIA 3060 Mobile
CPU: Intel i7-12700H

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.2.5

GiteaMirror added the bug label 2026-04-22 08:08:38 -05:00

@josegtmonteiro commented on GitHub (Jul 19, 2024):

Also seeing this same issue.

@mxyng or @jmorganca, do you think it could be related to something in #5126 or #5440?

Edit: I can confirm that rolling back to Ollama 0.2.1 fixes the issue and the model is able to continue the last message.


@josegtmonteiro commented on GitHub (Jul 19, 2024):

Adding a bit more information here. Using $env:OLLAMA_DEBUG="1", I was able to compare the final prompts between versions 0.2.1 and 0.2.7.

The messages I'm passing to the API:

```
history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, how are you today?"},
    {"role": "assistant", "content": "Thanks for asking! I'm "},
]
```

0.2.1 prompt:

```
prompt="\n<|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nHello, how are you today?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nThanks for asking! I'm "
```

0.2.7 prompt:

```
prompt="<|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nHello, how are you today?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nThanks for asking! I'm <|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
```


@jmorganca commented on GitHub (Jul 20, 2024):

Sorry folks, a fix is on the way with https://github.com/ollama/ollama/pull/5802


@josegtmonteiro commented on GitHub (Jul 21, 2024):

@jmorganca, thanks for the quick fix.

Testing here with 0.2.8-rc1. Still not able to continue the message.

With the same example I mentioned before, using OLLAMA_DEBUG I can see the final prompt on the console; it is:

```
prompt="<|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nHello, how are you today?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nThanks for asking! I'm <|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
```

The messages I'm passing to the chat endpoint are:

```
history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, how are you today?"},
    {"role": "assistant", "content": "Thanks for asking! I'm "},
]
```

Not sure if it makes any difference, but I'm testing with the "llama3-groq-tool-use:8b-q8_0" model.
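
For comparison, the tail of the 0.2.1 prompt (from the earlier debug output) versus the 0.2.8-rc1 prompt above, for the same history:

```
0.2.1      ...assistant<|end_header_id|>\n\nThanks for asking! I'm 
0.2.8-rc1  ...assistant<|end_header_id|>\n\nThanks for asking! I'm <|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n
```

That is, the release candidate still closes the prefilled assistant turn and opens an empty one.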
