[GH-ISSUE #6838] Old Context Information fetched #4318

Closed
opened 2026-04-12 15:14:40 -05:00 by GiteaMirror · 14 comments

Originally created by @atul-siriusai on GitHub (Sep 17, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6838

Hello,

I am currently working on a Retrieval-Augmented Generation (RAG) application using LLaMA 3.1 70B. The workflow involves a set of documents in markdown format and an Excel sheet containing specific information that needs to be extracted from these documents. The process iterates over each row, dynamically generating a prompt and retrieving the relevant record along with its citation.

However, I am encountering an issue when processing subsequent documents. It appears that the model is retaining context from the previous document and using it to answer queries for the new document. This is causing inconsistencies in the responses and affecting the accuracy of the extraction.

Any insights on how to ensure the model resets context between documents would be appreciated.

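A minimal sketch of the kind of per-document loop described above, where each spreadsheet row gets a completely fresh request so nothing can carry over between documents; the helpers `load_rows`, `retrieve_chunks`, and `build_prompt` are hypothetical placeholders, not code from this issue:

```python
import ollama

client = ollama.Client()

for row in load_rows("records.xlsx"):      # hypothetical: one Excel row per record
    chunks = retrieve_chunks(row)          # hypothetical: retrieval for this document only
    prompt = build_prompt(row, chunks)     # hypothetical: prompt built from this row only
    response = client.generate(
        model="llama3.1:70b",
        prompt=prompt,
        # no `context` from a previous response is passed in,
        # so each document starts from a clean slate
    )
    print(response["response"])
```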
OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.3.6

GiteaMirror added the bug label 2026-04-12 15:14:40 -05:00

@YonTracks commented on GitHub (Sep 17, 2024):

Hello, I ran into this too. To reset it, send an empty context [] or no context at all (I can't remember which, I'm on my phone lol). I sent an email because I thought it might be a security issue, which had me code diving to sort it out lol. But if it's all local you're fine; I'm just trying to make sure this isn't also a feature. Good luck.


@rick-github commented on GitHub (Sep 18, 2024):

What client are you using? The model itself is stateless; if the responses from the model include context from previous prompts, it means the client must be sending that information with the prompt.

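A small illustrative sketch (not from the original thread) of that statelessness: if the client never feeds the previous response's `context` back in, two consecutive generate calls share no state.

```python
import ollama

client = ollama.Client()

# First request states a fact.
first = client.generate(model="llama3.1", prompt="An apple is $1.50")

# Second request omits `context`, so the model has no memory of the first exchange.
second = client.generate(model="llama3.1", prompt="How much is an apple?")
print(second["response"])  # typically a generic answer, not "$1.50"
```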

@YonTracks commented on GitHub (Sep 18, 2024):

Yes, making sure the default context [] is reset with each new chat (check for caching etc.) solved/fixed it for me; it was a React/Next.js issue on my side (skill issue lol, sorry). Native Windows Ollama local install with my own custom Next.js UI, using api/generate.
api/chat was not affected.
Cheers.


@YonTracks commented on GitHub (Sep 18, 2024):

Thank you.


@atul-siriusai commented on GitHub (Sep 18, 2024):

I am using ChatOllama from LangChain.


@rick-github commented on GitHub (Sep 18, 2024):

Your langchain app may be doing something like

```python
from langchain_ollama import ChatOllama
from langchain_core.messages import SystemMessage, HumanMessage

messages = [
  SystemMessage("You are an AI expert in RAG"),
]

while not finished_all_instructions():
  messages.append(HumanMessage(get_instruction_from_excel()))
  response = ChatOllama(model="llama3.1").invoke(messages)
  messages.append(response)
  do_something_with_response(response)
```

so it's accumulating context with each `invoke()`.

If you can provide a code snippet, or better, a standalone script that demonstrates the problem, debugging would be easier.


@YonTracks commented on GitHub (Sep 18, 2024):

Good info here: https://python.langchain.com/docs/tutorials/local_rag/ . It seems the embeddings/context need to be reset each time, i.e. make sure the context is reset to an empty [] on every iteration (things like input["context"]). Another good one to check is the cache param: it may need to be None, or cache: false; check the docs. Good luck, sorry about the edit.

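If caching does turn out to be the culprit, here is a sketch of how it could be ruled out using LangChain's public cache settings (an assumption for illustration, not confirmed as the fix for this issue):

```python
from langchain_core.globals import set_llm_cache
from langchain_ollama import ChatOllama

# Clear any globally configured LLM cache (a no-op if none was set).
set_llm_cache(None)

# Or opt a single model out of caching via the `cache` field.
llm = ChatOllama(model="llama3.1", cache=False)
```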

@atul-siriusai commented on GitHub (Sep 18, 2024):

Thanks for your comments.

If it is accumulating with each invoke(), is there a way to clear it?
When you say send an empty context, would that even clear it, since only an empty entry would be appended?


@YonTracks commented on GitHub (Sep 18, 2024):

It actually shows Ollama is getting even better; love it, cheers, Ollama.

It's best to understand exactly what is happening here, but try passing no context at all (your use case might not need it, if that's even possible; not sure you need a code example).
Also, sorry if this isn't the issue, but I'm thinking there is the "prompt"/messages on one hand and the context on the other, and langchain might not let you separate them (auto context or something). Check your updated/latest docs; they should say. The cache is a big gotcha; guessing that will be it. Good luck, I tried. Cheers.


@rick-github commented on GitHub (Sep 18, 2024):

If we can see the actual code, we can offer better suggestions. But the simplest way to clear the context is to just create a new message structure each time.

```python
from langchain_ollama import ChatOllama
from langchain_core.messages import SystemMessage, HumanMessage

while not finished_all_instructions():
  messages = [
    SystemMessage("You are an AI expert in RAG"),
    HumanMessage(get_instruction_from_excel())
  ]
  response = ChatOllama(model="llama3.1").invoke(messages)
  do_something_with_response(response)
```

@mdhuzaifapatel commented on GitHub (Nov 20, 2024):

Can you please tell me (with a code example if possible) how to use the context (list of numbers) we get from the current response in the next request?

Every time I hit the Ollama API it generates general answers, not related to the conversation history; I think it's not using the previous responses' context.
Please help me out.
Thanks


@rick-github commented on GitHub (Nov 20, 2024):

```python
#!/usr/bin/env python3

import ollama

prompts = [
  'An apple is $1.50',
  'How much is an apple?'
]

context = []
llm = ollama.Client()

for p in prompts:
  response = llm.generate(
      model="llama3.2:3b-instruct-q4_K_M",
      prompt=p,
      context=context
      )
  context = context + response["context"]
  print(response["response"]+"\n==")
```

```console
$ ./6838.py
That sounds like a pretty standard price for an apple! Would you like to know the nutritional information or some fun facts about apples?
==
You mentioned earlier that an apple costs $1.50.
==
```

@mdhuzaifapatel commented on GitHub (Nov 21, 2024):

Thanks!


@rick-github commented on GitHub (Nov 29, 2024):

@mdhuzaifapatel Note that the `context` field is being deprecated: https://github.com/ollama/ollama/pull/7878

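For readers landing here after the deprecation: the usual replacement is the chat endpoint, where history is carried explicitly as a list of messages rather than as an opaque `context` token list. A minimal sketch with the Python client (not from the original thread):

```python
import ollama

client = ollama.Client()
model = "llama3.2:3b-instruct-q4_K_M"

# Carry the conversation as an explicit message list.
messages = [{"role": "user", "content": "An apple is $1.50"}]
reply = client.chat(model=model, messages=messages)
messages.append({"role": "assistant", "content": reply["message"]["content"]})

messages.append({"role": "user", "content": "How much is an apple?"})
reply = client.chat(model=model, messages=messages)
print(reply["message"]["content"])  # the price is recalled because the history is in `messages`
```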
Reference: github-starred/ollama#4318