[GH-ISSUE #5877] Ollama API not seeing messages provided in conversation_history #3666

Closed
opened 2026-04-12 14:27:45 -05:00 by GiteaMirror · 4 comments

Originally created by @barclaybrown on GitHub (Jul 23, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5877

What is the issue?

When I pass a list of dictionaries (messages) to ollama.chat, it seems that the model does not see anything other than the latest message. For example, I want the model to get a bunch of text and then answer a question about it. I send something like:

role: system content: You are a helpful assistant
role: user content: a bunch of reference text
role: user content: a question related to the reference text

then I get back

role: assistant content: an answer unrelated to the reference text, as if it doesn't see it

Is this a bug, or maybe I'm doing something wrong?
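
Roughly, the call looks like this (a minimal sketch using the ollama Python package; the model name and message contents are placeholders, not my actual text):

```python
import ollama  # the official ollama Python package

# The conversation history as a list of role/content dictionaries.
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "a bunch of reference text"},
    {"role": "user", "content": "a question related to the reference text"},
]

# "phi3" is a placeholder model name for illustration.
response = ollama.chat(model="phi3", messages=messages)
print(response["message"]["content"])
```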

OS

Windows

GPU

Nvidia

CPU

No response

Ollama version

0.2.7

GiteaMirror added the bug label 2026-04-12 14:27:45 -05:00

@rick-github commented on GitHub (Jul 23, 2024):

Can you provide an example? It seems to work here:

```sh
$ curl -s localhost:11434/api/chat -d '{
  "model": "qwen2:0.5b",
  "messages": [
    {
      "role": "user",
      "content": "an apple is $1.50"
    },
    {
      "role": "user",
      "content": "how much is an apple?"
    }
  ],
  "stream": false
}' | jq .message
```

```json
{
  "role": "assistant",
  "content": "An apple costs $1.50."
}
```


@barclaybrown commented on GitHub (Jul 23, 2024):

Thanks very much! This might have given me the hint I needed. The text I'm supplying in the first user message is about 45,000 tokens, which I thought Phi3 128k should handle fine. Is there a chance that Ollama is throwing it out because it's too long? The same example worked, like yours, when I chopped the text down to about 4k tokens. If there's a limit on user message length, should I split up my text?
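
For what it's worth, here's how I'm ballparking the token count (a rough sketch using the common ~4 characters per token heuristic; the exact number depends on the model's tokenizer):

```python
# Rough token estimate for the reference text. ~4 characters per token is a
# common rule of thumb; the exact count depends on the model's tokenizer.
# "reference.txt" is a placeholder path for the text I'm sending.
with open("reference.txt", encoding="utf-8") as f:
    reference_text = f.read()

approx_tokens = len(reference_text) // 4
print(f"~{approx_tokens} tokens in the reference text")
```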


@rick-github commented on GitHub (Jul 23, 2024):

The default context length for Ollama is 2048 tokens, so the beginning of your text is likely being discarded (check the server logs for "input truncated"). You can pass an option in the API call to increase the window:

```sh
$ curl -s localhost:11434/api/chat -d '{
  "model": "qwen2:0.5b",
  "messages": [
    {
      "role": "user",
      "content": "an apple is $1.50"
    },
    {
      "role": "user",
      "content": "how much is an apple?"
    }
  ],
  "options": { "num_ctx": 45000 },
  "stream": false
}' | jq .message
```

Be aware that this can significantly increase the VRAM/RAM requirements for the model, since memory usage is quadratic in the length of the context window.
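
The same option works from the Python client via the options parameter (a minimal sketch; "phi3" is a placeholder model name):

```python
import ollama  # the official ollama Python package

# Raise the context window for this request via the options field; this is
# the Python-client equivalent of the curl call above.
response = ollama.chat(
    model="phi3",
    messages=[
        {"role": "user", "content": "an apple is $1.50"},
        {"role": "user", "content": "how much is an apple?"},
    ],
    options={"num_ctx": 45000},
)
print(response["message"]["content"])
```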


@pdevine commented on GitHub (Sep 12, 2024):

@rick-github's answer is correct. I'll go ahead and close this out.

Reference: github-starred/ollama#3666