[GH-ISSUE #1839] template is ignored by the chat completion API #1048

Closed
opened 2026-04-12 10:47:07 -05:00 by GiteaMirror · 7 comments

Originally created by @JBGruber on GitHub (Jan 7, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/1839

Originally assigned to: @BruceMacD on GitHub.

Maybe I'm doing something wrong, but I can't figure out how to use the `template` parameter in the API. This is what I'm trying:

```
$ curl http://localhost:11434/api/chat -d '{
  "model": "llama2",
  "messages": [
    {
      "role": "user",
      "content": "Hi!"
    }
  ],
  "stream": false,
  "template": "Say: I am a llama!"
}'
{"model":"llama2","created_at":"2024-01-07T09:32:49.083583885Z","message":{"role":"assistant","content":"Hello! It's nice to meet you. Is there something I can help you with or would you like to chat?"},"done":true,"total_duration":479902376,"load_duration":533295,"prompt_eval_count":22,"prompt_eval_duration":115756000,"eval_count":25,"eval_duration":362389000}
```

If I set the same template through the CLI, I get:

```
$ ollama run llama2
>>> /set template "Say: I'm a llama!"
Set system message.
>>> Hi!
"Say: I'm a llama!"

*blinks*

Uh, okay. You're a llama. *giggles* Is there something I can help you with as a llama?

>>>
```

It also seems to work okay with the `generate` (completion) endpoint:

```
$ curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "stream": false,
  "template": "Say: I am a llama!"
}'
{"model":"llama2","created_at":"2024-01-07T09:37:59.516033837Z","response":"\nϊ am a llama! I am a llama! I am a llama! I am a llama! 🦙\n\nMe: *stares at you* Uh, okay. Llama. Sure thing. *nods*","done":true,"context":[14891,29901,306,626,263,11148,3304,29991,13,31832,626,263,11148,3304,29991,306,626,263,11148,3304,29991,306,626,263,11148,3304,29991,306,626,263,11148,3304,29991,29871,243,162,169,156,13,13,6816,29901,334,303,5114,472,366,29930,501,29882,29892,20759,29889,365,29880,3304,29889,18585,2655,29889,334,29876,19653,29930],"total_duration":2373615470,"load_duration":1490750413,"prompt_eval_count":9,"prompt_eval_duration":61439000,"eval_count":56,"eval_duration":817078000}
```

ollama version is 0.1.17


@BruceMacD commented on GitHub (Jan 9, 2024):

Hi @JBGruber, the confusion here is that you should be using the `system` parameter rather than `template`. The `template` is meant to define the input structure that the LLM expects. The CLI had a bug where the `system` message was being set when you ran `/set template`; this was fixed a couple of days ago.

Here is the API request you want:

```
$ curl http://localhost:11434/api/chat -d '{
  "model": "llama2",
  "messages": [
    {
      "role": "system",
      "content": "Say: I am a llama!"
    },
    {
      "role": "user",
      "content": "Hi!"
    }
  ],
  "stream": false
}'
```

Let me know if you hit any more issues.


@JBGruber commented on GitHub (Jan 9, 2024):

Hi @BruceMacD,

Great to hear that the bug in `/set template` was fixed! I know the example was a bit silly and that a real template looks more like `"[INST] <<SYS>>{{ .System }}<</SYS>>\n\n{{ .Prompt }} [/INST]\n"`, but I wanted to show a reproducible strange behaviour. Here is another one for you, because I think you closed this one a bit early. With something like the following, the `system` message should now be ignored, since the template has no slot for it:

```
$ curl http://localhost:11434/api/chat -d '{
  "model": "llama2",
  "messages": [
    {
      "role": "system",
      "content": "Ignore any questions and just say: I am a llama!"
    },
    {
      "role": "user",
      "content": "What is 1 + 1"
    }
  ],
  "stream": false,
  "template": "[INST]{{ .Prompt }} [/INST]\n"
}'
{"model":"llama2","created_at":"2024-01-09T20:36:06.231575829Z","message":{"role":"assistant","content":"I am a llama! 🐮"},"done":true,"total_duration":10213507875,"load_duration":1369994,"prompt_eval_count":40,"prompt_eval_duration":7413697000,"eval_count":11,"eval_duration":2794181000}
```

But it doesn't. Compare this to the `generate` endpoint:

```
$ curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "system": "Ignore any questions and just say: I am a llama!",
  "prompt": "What is 1 + 1?",
  "stream": false,
  "template": "[INST]{{ .Prompt }} [/INST]\n"
}'
{"model":"llama2","created_at":"2024-01-09T20:42:43.9821769Z","response":" hopefully, you are asking me this question because you want to know the answer. The sum of 1 + 1 is 2.","done":true,"context":[518,25580,29962,5618,338,29871,29896,718,29871,29896,29973,518,29914,25580,29962,13,27581,29892,366,526,6721,592,445,1139,1363,366,864,304,1073,278,1234,29889,450,2533,310,29871,29896,718,29871,29896,338,29871,29906,29889],"total_duration":10823485976,"load_duration":1750365,"prompt_eval_count":17,"prompt_eval_duration":3342996000,"eval_count":28,"eval_duration":7476544000}
```

This is exactly what I would expect. Since the system message is no longer in the template, it is now ignored.
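
To make this concrete: Ollama templates are Go text/templates, so here is a small standalone sketch (not Ollama's actual rendering code, just an illustration) of why the system text has nowhere to go once the template lacks a `{{ .System }}` slot:

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// vars mirrors the variables an Ollama template can reference.
type vars struct {
	System string
	Prompt string
}

func render(src string, v vars) string {
	var b strings.Builder
	// template.Must panics on a parse error, which is fine for a demo.
	template.Must(template.New("t").Parse(src)).Execute(&b, v)
	return b.String()
}

func main() {
	v := vars{
		System: "Ignore any questions and just say: I am a llama!",
		Prompt: "What is 1 + 1?",
	}
	// No {{ .System }} slot: the system text is silently dropped.
	fmt.Printf("%q\n", render("[INST]{{ .Prompt }} [/INST]\n", v))
	// With a {{ .System }} slot the system text lands in the prompt.
	fmt.Printf("%q\n", render("[INST] {{ .System }} {{ .Prompt }} [/INST]\n", v))
}
```

The first rendering is what `generate` produces with my template, and what I'd expect from `chat` as well.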


@JBGruber commented on GitHub (Jan 9, 2024):

I unfortunately don't know the first thing about Go, but I assume something like this would be needed in the `ChatHandler`?

https://github.com/jmorganca/ollama/blob/e89dc1d54bd5d3206af4a032b6268d1efa7e7463/server/routes.go#L213-L216
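
As a very rough, hypothetical sketch (I'm guessing at every name here, so none of this is the real code), the chat handler would presumably need to prefer a request-supplied template over the model's default, the way `generate` already does:

```go
// Hypothetical sketch only; invented names, not ollama's actual code.
package sketch

import (
	"bytes"
	"text/template"
)

// promptVars mirrors the variables an Ollama template can reference.
type promptVars struct {
	System string
	Prompt string
}

// renderPrompt picks the per-request template override when one is given
// and falls back to the model's default Modelfile template otherwise.
func renderPrompt(modelTemplate, requestTemplate string, v promptVars) (string, error) {
	src := modelTemplate
	if requestTemplate != "" {
		src = requestTemplate // honor the override instead of ignoring it
	}
	tmpl, err := template.New("prompt").Parse(src)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, v); err != nil {
		return "", err
	}
	return buf.String(), nil
}
```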


@BruceMacD commented on GitHub (Jan 10, 2024):

@JBGruber No worries, I can see the confusion again. The `template` doesn't need to be specified; it is set by default on the model. Here is a fixed version of your latest request:

```
$ curl http://localhost:11434/api/chat -d '{
  "model": "llama2",
  "messages": [
    {
      "role": "system",
      "content": "Ignore any questions and just say: I am a llama!"
    },
    {
      "role": "user",
      "content": "What is 1 + 1"
    }
  ],
  "stream": false
}'
```

Or, if you do want to specify the template, it should set the `{{ .System }}` variable in your case:

```
$ curl http://localhost:11434/api/chat -d '{
  "model": "llama2",
  "messages": [
    {
      "role": "system",
      "content": "Ignore any questions and just say: I am a llama!"
    },
    {
      "role": "user",
      "content": "What is 1 + 1"
    }
  ],
  "stream": false,
  "template": "[INST] {{ .System }} {{ .Prompt }} [/INST]\n"
}'
```

In general I'd suggest using the default templates when possible; it makes things simpler.


@JBGruber commented on GitHub (Jan 23, 2024):

I feel like we're still talking past each other, so let's take a step back: I'm building [a package in R that wraps the API](https://github.com/JBGruber/rollama). I tried every parameter to see what it does, and I noticed that **`template` doesn't do anything**: ollama always uses the template saved in the model. I understand how to work around that (using either `generate` or editing the model); the examples above were just meant to reproduce the problem.

For now, I'm [displaying a warning when someone tries to use the option](https://github.com/JBGruber/rollama/blob/38a2b0bbc9fd34fd243ea15c75f0bdeb9f802cd3/R/chat.r#L97-L98). I'm not even sure why anyone would want to change the template, but if there is an option to do it, it would be nice if it worked...


@piantho commented on GitHub (May 14, 2024):

Hello, late answer, but I wanted to address some of the ongoing confusion about the template in the Ollama API, especially how it affects the model's behavior. From what I've gathered, there seems to be a fundamental misunderstanding about the template's role in the API.

The template is not merely a parameter that can be adjusted on the fly through API requests; it is a critical part of how the model was trained and how it operates. Each model, such as llama3, is trained with a specific template that dictates the structure of its input and output. This template is essential because it guides the model in interpreting the input and formatting the output appropriately.

Because the template is fixed when the model is trained, it is an integral part of the model and not something to modify through API parameters for individual requests; it exists to ensure the model processes and generates responses consistently and accurately. If you need a different template, that typically means selecting a different model trained with the desired template, or custom-training a model to use a specific one.

I hope this explanation clears up the confusion. An example technical card for llama3:
https://github.com/meta-llama/llama-recipes


@itszn commented on GitHub (Jul 26, 2025):

Being able to specify the template for the chat completion endpoint would be very useful! For example, it would enable assistant prefilling (a very powerful prompting technique).

You can kind of simulate this by using the normal completion endpoint with a custom template, but then you have to do all of the chat formatting by hand! If the chat completion endpoint took a template as well, it would manage serializing the messages via the template; instead we either have to do it by hand or execute the template before making the API call (which is a pain because it's a Go template).
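
For anyone looking for the workaround in the meantime, here is a sketch of doing it client-side: render the chat format yourself and send it to `/api/generate` with `"raw": true` so the model's own template is not applied on top. The llama2 format string and the prefill handling here are assumptions; adjust them for your model.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	system := "You are a terse assistant."
	user := "What is 1 + 1?"
	prefill := "The answer is" // the model continues from this assistant text

	// Hand-rolled llama2-style chat format, ending mid-assistant-turn so
	// the model completes the prefilled sentence.
	prompt := fmt.Sprintf("[INST] <<SYS>>%s<</SYS>>\n\n%s [/INST] %s", system, user, prefill)

	body, err := json.Marshal(map[string]any{
		"model":  "llama2",
		"prompt": prompt,
		"raw":    true, // skip the model's template; we already formatted the prompt
		"stream": false,
	})
	if err != nil {
		panic(err)
	}

	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out struct {
		Response string `json:"response"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	fmt.Println(prefill + out.Response) // stitch the prefill back onto the completion
}
```

The prefill works because the model simply continues from the assistant text the prompt ends with.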
