[GH-ISSUE #3650] Default command R Modelfile template does not respect specification #28010

Closed
opened 2026-04-22 05:43:47 -05:00 by GiteaMirror · 4 comments

Originally created by @GiovanniGatti on GitHub (Apr 15, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3650

What is the issue?

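After reading the [Command R documentation](https://docs.cohere.com/docs/prompting-command-r#components-of-a-structured-prompt), I found it strange that the (mandatory) `<BOS_TOKEN>` wasn't specified in the default Modelfile.template [here](https://ollama.com/library/command-r:latest/blobs/42499e38acdf).

I'm not sure where the best place to report this sort of issue is.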

What did you expect to see?

No response

Steps to reproduce

No response

Are there any recent changes that introduced the issue?

No response

OS

No response

Architecture

No response

Platform

No response

Ollama version

No response

GPU

No response

GPU info

No response

CPU

No response

Other software

No response

GiteaMirror added the bug label 2026-04-22 05:43:47 -05:00

@jukofyork commented on GitHub (Apr 15, 2024):

The wrapped llama.cpp server adds the <BOS> token it sees in the GGUF file right at the start:

https://github.com/ggerganov/llama.cpp/tree/master/examples/server

A BOS token is inserted at the start, if all of the following conditions are true:

- The prompt is a string or an array with the first element given as a string
- The model's `tokenizer.ggml.add_bos_token` metadata is `true`
- The system prompt is empty

From the GGUF:

llm_load_print_meta: BOS token        = 5 '<BOS_TOKEN>'

But it won't add it to following prompts if that is needed (just about to read the prompt template in detail myself now).
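The three conditions quoted above can be read as a single boolean check. The following is only a sketch of that rule as the README words it, not llama.cpp's actual code; the function name `should_insert_bos` and its parameters are hypothetical.

```python
from typing import Sequence, Union

# A prompt is either a plain string or a mixed array of strings and token ids.
Prompt = Union[str, Sequence[Union[str, int]]]


def should_insert_bos(prompt: Prompt, add_bos_token: bool, system_prompt: str = "") -> bool:
    """Hypothetical helper mirroring the three quoted README conditions."""
    # Condition 1: the prompt is a string, or an array whose first element is a string.
    if isinstance(prompt, str):
        starts_with_string = True
    else:
        starts_with_string = len(prompt) > 0 and isinstance(prompt[0], str)

    # Condition 2: tokenizer.ggml.add_bos_token is true in the model metadata.
    # Condition 3: the system prompt is empty.
    return bool(starts_with_string and add_bos_token and system_prompt == "")


print(should_insert_bos("Hello", add_bos_token=True))
print(should_insert_bos("Hello", add_bos_token=True, system_prompt="Be brief"))
```

Under this reading, a non-empty system prompt or a prompt array that starts with a token id would both suppress the automatic BOS.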


@jukofyork commented on GitHub (Apr 15, 2024):

Probably the bigger problem is that the [ollama.com](https://ollama.com/library/command-r) prompt adds `<|END_OF_TURN_TOKEN|>` after the response, when it too is defined in the GGUF:

llm_load_print_meta: EOS token        = 255001 '<|END_OF_TURN_TOKEN|>'

whereas this (for all other models) should get added automatically... Sadly Ollama is so opaque compared to llama.cpp's server; I don't know if it will add this or not... 😕 We really need some clear way to debug this.


From reading the specs, using command-r in 'Chat History' mode should use this template:

TEMPLATE """{{if .System}}<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>{{.System}}<|END_OF_TURN_TOKEN|>{{end}}<|START_OF_TURN_TOKEN|><|USER_TOKEN|>{{.Prompt}}<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>{{.Response}}"""

and leave it to the server to add the `<|END_OF_TURN_TOKEN|>` back on its own.
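To make the expected prompt string easier to inspect, here is a small Python sketch that expands the proposed template by hand for a single turn. It is not how Ollama renders templates (Ollama uses Go templates); `render_turn` and the constant names are hypothetical, and only the token strings come from the template above.

```python
# Token strings copied from the proposed TEMPLATE; the names are illustrative only.
START = "<|START_OF_TURN_TOKEN|>"
END = "<|END_OF_TURN_TOKEN|>"
SYSTEM = "<|SYSTEM_TOKEN|>"
USER = "<|USER_TOKEN|>"
CHATBOT = "<|CHATBOT_TOKEN|>"


def render_turn(prompt: str, system: str = "", response: str = "") -> str:
    """Expand the proposed template for one turn (hypothetical helper)."""
    out = ""
    # {{if .System}} ... {{end}}: the system block is emitted only when non-empty.
    if system:
        out += f"{START}{SYSTEM}{system}{END}"
    # The user turn is closed with END; the chatbot turn is left open so that
    # the end-of-turn token is not hard-coded after {{.Response}}.
    out += f"{START}{USER}{prompt}{END}{START}{CHATBOT}{response}"
    return out


print(render_turn("Hi", system="Be brief"))
```

Printing the result for a test prompt gives a string that can be compared token by token against the Command R documentation.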


@jukofyork commented on GitHub (Apr 15, 2024):

Yeah, that seems to be the correct template. I just tested it with a back-and-forth conversation of 20+ turns and it was completely coherent.

This model is horrible at coding and writes pretty bad 1-shot stories, but oh boy can it self-critique and refine... Hugely impressed now! 😮


@jmorganca commented on GitHub (Apr 15, 2024):

Will merge this with https://github.com/ollama/ollama/issues/1977


Reference: github-starred/ollama#28010