[GH-ISSUE #4593] decoded context and raw response should coincide #64921

Open
opened 2026-05-03 19:16:47 -05:00 by GiteaMirror · 0 comments

Originally created by @fcalabrow on GitHub (May 23, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4593

I made two requests.
The first one had the following params:

```
$payload = @{
    model = "llama3"
    prompt = "<|start_header_id|>user<|end_header_id|>Hey!<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
    stream = $false
    seed = 123
    temperature = 0
    raw = $true
} | ConvertTo-Json
```
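
For reference, a minimal sketch of how such a payload can be posted to Ollama's `/api/generate` endpoint (shown in Python rather than PowerShell purely for illustration; the localhost URL assumes a default local install):

```python
# Minimal sketch, assuming a default local Ollama install listening on
# localhost:11434; /api/generate is Ollama's completion endpoint.
import requests

payload = {
    "model": "llama3",
    "prompt": "<|start_header_id|>user<|end_header_id|>Hey!<|eot_id|>"
              "<|start_header_id|>assistant<|end_header_id|>",
    "stream": False,
    "seed": 123,
    "temperature": 0,
    "raw": True,  # raw mode: Ollama applies no prompt template
}
resp = requests.post("http://localhost:11434/api/generate", json=payload)
print(resp.json()["response"])
```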

And I got this:

```
{
  "model": "llama3",
  "created_at": "2024-05-23T15:48:39.9473937Z",
  "response": "\n\nHey! It's nice to chat with you. Is there something on your mind that you'd like to talk about, or are you just looking for some casual conversation? I'm here to listen and help if I can!",
  "done": true,
  "done_reason": "stop",
  "total_duration": 14993710800,
  "load_duration": 2539700,
  "prompt_eval_duration": 297100000,
  "eval_count": 47,
  "eval_duration": 14693012000
}
```

And the second one's parameters:

```
$payload = @{
    model = "llama3"
    prompt = "Hey!"
    stream = $false
    seed = 123
    temperature = 0
} | ConvertTo-Json
```

And the response:

```
{
  "model": "llama3",
  "created_at": "2024-05-23T15:49:20.9390008Z",
  "response": "Hey! It's nice to meet you. Is there something I can help you with, or would you like to chat?",
  "done": true,
  "done_reason": "stop",
  "context": [128006,882,128007,271,19182,0,128009,128006,78191,128007,271,19182,0,1102,596,6555,311,3449,499,13,2209,1070,2555,358,649,1520,499,449,11,477,1053,499,1093,311,6369,30,128009],
  "total_duration": 9436567800,
  "load_duration": 2744600,
  "prompt_eval_count": 8,
  "prompt_eval_duration": 1562642000,
  "eval_count": 26,
  "eval_duration": 7870476000
}
```

Now if I decode my context list with the llama3 tokenizer, I get:

```
<|start_header_id|>user<|end_header_id|>\n\nHey!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nHey! It's nice to meet you. Is there something I can help you with or would you like to chat?<|eot_id|>
```
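
For reproducibility, here is a minimal sketch of that decode step, assuming the Hugging Face `transformers` tokenizer for Llama 3 (the gated `meta-llama/Meta-Llama-3-8B-Instruct` repo name is an assumption; any tokenizer with the same vocabulary gives the same text):

```python
# Minimal decode sketch; the model repo name is an assumption and the
# repo is gated, but only its vocabulary matters here.
from transformers import AutoTokenizer

context = [128006, 882, 128007, 271, 19182, 0, 128009, 128006, 78191,
           128007, 271, 19182, 0, 1102, 596, 6555, 311, 3449, 499, 13,
           2209, 1070, 2555, 358, 649, 1520, 499, 449, 11, 477, 1053,
           499, 1093, 311, 6369, 30, 128009]

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
# skip_special_tokens=False keeps <|start_header_id|>, <|eot_id|>, etc. visible.
print(tok.decode(context, skip_special_tokens=False))
```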

So, the question is: **why doesn't the response with `raw = true` look like my decoded list?** Wouldn't that be more reasonable? If I'm going to work with raw prompts, I want raw responses, especially if I need to keep a chat history.
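
To make that last point concrete, here is a hypothetical sketch of what continuing a raw-mode chat requires today: because the raw response comes back without `<|eot_id|>` or any header tokens, the caller has to re-append the template markers by hand.

```python
# Hypothetical helper illustrating the bookkeeping raw mode currently
# forces on the caller: the special tokens visible in the decoded
# context must be re-added manually around each raw response.
def next_raw_prompt(history: str, assistant_reply: str, user_msg: str) -> str:
    return (
        history
        + assistant_reply
        + "<|eot_id|>"  # not included in the raw response; added by hand
        + "<|start_header_id|>user<|end_header_id|>\n\n" + user_msg + "<|eot_id|>"
        + "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )
```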

GiteaMirror added the feature request label 2026-05-03 19:16:47 -05:00