[GH-ISSUE #427] Handle Chat History using API #197

Closed
opened 2026-04-12 09:43:25 -05:00 by GiteaMirror · 5 comments

Originally created by @unclecode on GitHub (Aug 27, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/427

Hello! I have a question about using the API. Should I add the chat history by augmenting the prompt in the conversation myself, or will the API handle it for me? I'm wondering if it's similar to OpenAI chat completion, where I can provide a list of messages as history, or if this is a stateless call and I need to handle the history by augmenting the prompt. Thank you!

GiteaMirror added the question label 2026-04-12 09:43:25 -05:00

@mxyng commented on GitHub (Aug 29, 2023):

There are two approaches to chat history.

The first approach is to use the built-in method. The final message of a generate response contains a context field. This field holds the chat history for that particular request as a list of tokens (ints): the request itself, the LLM's response, and any context that was passed into the request. To continue the conversation, pass this field back as the context field of the next request.

The pseudocode looks something like this:

import sys
import json
import requests

context = []
for line in sys.stdin:
    # post to the local Ollama server (default port 11434)
    resp = requests.post('http://localhost:11434/api/generate',
                         json={'model': 'llama2', 'prompt': line.strip(), 'context': context},
                         stream=True)
    for resp_line in resp.iter_lines():
        body = json.loads(resp_line)
        # do something with body.get('response')
        if body.get('context'):
            context = body.get('context')
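
As a small aside, the context list only shows up on the final object of the stream (the one whose done field is true), so the last check in the loop above could equivalently key off that flag. A minimal sketch:

# the final streamed object has done == True and carries the full token context
if body.get('done'):
    context = body.get('context', context)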

The second approach manages chat history directly. It does not use the context field and requires the user to track both requests and responses. This approach should use template instead of prompt, otherwise the request may not match what the LLM actually sees.

Here's the pseudocode:

import sys
import json
import requests

history = []
for line in sys.stdin:
    history.append(line.strip())        # add the user's new turn before templating
    templated = template(history)       # render the whole history into the model's prompt format
    resp = requests.post('http://localhost:11434/api/generate',
                         json={'model': 'llama2', 'template': templated},
                         stream=True)
    resp_lines = []
    for resp_line in resp.iter_lines():
        body = json.loads(resp_line)
        # do something with body.get('response')
        resp_lines.append(body.get('response', ''))
    history.append(''.join(resp_lines)) # add the model's turn to the history

template in the example above should structure history into the format expected by the LLM. For llama2 without a system prompt, the templated string looks something like this:

[INST] history[0] [/INST] history[1] [INST] history[2] [/INST] history[3] [INST] history[4] [/INST]
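
To make this concrete, here is a rough sketch of what such a template function could look like for the llama2 chat format above. The function itself is illustrative rather than part of the API; the authoritative prompt format is whatever the model's Modelfile template specifies.

def template(history):
    # history alternates user turns (even indices) and model turns (odd indices),
    # ending with the latest user turn
    parts = []
    for i, turn in enumerate(history):
        if i % 2 == 0:
            parts.append(f'[INST] {turn} [/INST]')  # user message
        else:
            parts.append(f' {turn} ')               # model response
    return ''.join(parts)

For a five-turn history this produces exactly the [INST] ... [/INST] string shown above, ending right after the latest user turn so the model continues with its reply.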

@mchiang0610 commented on GitHub (Aug 30, 2023):

@unclecode Hey! I wanted to check in to see whether the comment above by @mxyng answers your question.

I'll close this issue for now. If it didn't answer it, please re-open the issue.

Thank you!


@unclecode commented on GitHub (Sep 1, 2023):


Thank you very much. The idea of retaining the context and then reintroducing it to the model is a brilliant concept. I was wondering whether you convert it back to text and inject it into the prompt, or feed it to the network directly as input. Thanks also for mentioning the second approach, where I can insert the history manually; I wasn't aware of that option.

Great job, really. The work you're all doing is truly remarkable, and it makes maintaining and using large language models locally much more straightforward. I'm not sure how best to express my gratitude. I'd like to know how I might contribute to the project; please share more information. Thank you, and please keep me informed.


@RealHacker commented on GitHub (Feb 22, 2025):

I see the context field is deprecated in the API documentation. So Approach 1 doesn't work anymore, right?


@Mengxh001 commented on GitHub (Feb 25, 2025):

> I see the context field is deprecated in the API documentation. So Approach 1 doesn't work anymore, right?

Yes. Now you should use the chat endpoint and pass the whole conversation as ChatRequest.Messages, putting every Message into it.

See the API documentation:
https://github.com/ollama/ollama/blob/main/docs/api.md#chat-request-with-history
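
For completeness, a minimal sketch of the messages-based flow against the /api/chat endpoint could look like the following (the model name, URL, and prompts are placeholders; see the linked documentation for the exact request and response fields):

import requests

messages = []  # the whole conversation so far, oldest message first

def chat(user_text):
    messages.append({'role': 'user', 'content': user_text})
    resp = requests.post('http://localhost:11434/api/chat',
                         json={'model': 'llama2', 'messages': messages, 'stream': False})
    reply = resp.json()['message']  # {'role': 'assistant', 'content': '...'}
    messages.append(reply)          # keep the assistant turn so the next call has the history
    return reply['content']

print(chat('Why is the sky blue?'))
print(chat('How is that different at sunset?'))

Because the call itself is stateless, the client re-sends the full messages list on every request; the server does not remember earlier turns.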
