[GH-ISSUE #1977] Mistakes in template definitions on models available to download from https://ollama.ai #1139

Closed
opened 2026-04-12 10:52:46 -05:00 by GiteaMirror · 18 comments

Originally created by @jukofyork on GitHub (Jan 13, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/1977

Originally assigned to: @jmorganca on GitHub.

Hi,

Some of the mistakes in the TEMPLATE definitions for the models you can download from https://ollama.ai are hurting the models to varying degrees. I only found this by accident when experimenting with the API to use some of the code completion / code editing prompts used by the Continue project (https://github.com/continuedev/continue/tree/main/core/llm/templates).

I've sourced all these primarily by looking at the original tokenizer config and, failing that, by looking through the official descriptions and/or their respective official GitHub discussions. I've concentrated on the original/official models (other than phind-codellama) as it's hard to find any concrete info on a lot of the "bootleg" fine-tuned models.

The ones that are particularly affected are:

  • codellama missing the space before the response severely hurts the performance when presented with a large section of code. There are a lot of 'cargo cult' prompt templates for codellama going around, but this one can be confirmed from their official release page and the tokenizer config.
  • deepseek-llm having the system message prepended to every message seems to increase the chance of responding in Chinese Unicode characters (Deepseek say specifically it wasn't trained to use a system message).
  • deepseek-coder quickly fills its context when discussing large sections of code and will start to repeat the system message back at you before completely descending into gibberish (this happens very quickly if using a detailed / long custom system message).

llama2 doesn't seem too affected by the missing space before the response, but again this template can be confirmed from their official release page and the tokenizer config.

deepseek-llm, mixtral and mistral absolutely should NOT have a space or newline before the response or they will often respond with gibberish and/or Chinese Unicode characters.

The official mixtral Hugging Face page actually shows a slightly wrong template format, but the original tokenizer config is the same as mistral's.

The suggestion to add "Response" to phind-codellama comes from the Hugging Face discussion, so I can't confirm whether it's correct.

codellama:34b-instruct:

TEMPLATE """<s>[INST] {{ if and .First .System }}<<SYS>>
{{ .System }}
<</SYS>>

{{ end }}{{ .Prompt }} [/INST] {{ .Response }}"""

deepseek-coder:33b-instruct:

TEMPLATE """{{ if and .First .System }}{{ .System }}
{{ end }}### Instruction:
{{ .Prompt }}
### Response:
{{ .Response }}"""

deepseek-llm:67b-chat:

TEMPLATE """User: {{ if and .First .System }}{{ .System }} {{ end }}{{ .Prompt }}

Assistant:{{ .Response }}"""

llama2:70b-chat:

TEMPLATE """<s>[INST] {{ if and .First .System }}<<SYS>>
{{ .System }}
<</SYS>>

{{ end }}{{ .Prompt }} [/INST] {{ .Response }}"""

mixtral:8x7b-instruct-v0.1 & mistral:7b-instruct-v0.2:

TEMPLATE """{{ if .First }}<s>{{ end }}[INST] {{ if and .First .System }}{{ .System }} {{ end }}{{ .Prompt }} [/INST]{{ .Response }}"""

phind-codellama:34b-v2:

TEMPLATE """{{ if and .First .System }}### System Prompt
{{ .System }}

{{ end }}### User Message
{{ .Prompt }}

### Assistant Response
{{ .Response }}"""

yi:34b-chat:

TEMPLATE """{{ if and .First .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
{{ .Response }}"""

These two aren't listed on https://ollama.ai but also use the same "ChatML" template as yi:

mpt:30B-chat:

TEMPLATE """{{ if and .First .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
{{ .Response }}"""

qwen:72b-chat:

TEMPLATE """{{ if and .First .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
{{ .Response }}"""

Are there any other "non-bootleg" models I should look at? I might as well do them too if there are any.

GiteaMirror added the bug and model labels 2026-04-12 10:52:46 -05:00

@jukofyork commented on GitHub (Jan 13, 2024):

By the way if anybody else wants to learn more about the template syntax then this is the reference page:

https://pkg.go.dev/text/template

I was pretty confused to start with when I tried to grep the whole project and could find no reference to "if" or "and" anywhere!


@scpedicini commented on GitHub (Jan 13, 2024):

I think being able to see the final transformed input -> template -> output chain in the logs would help catch these kinds of issues - linking this enhancement request:

https://github.com/jmorganca/ollama/issues/1533


@jukofyork commented on GitHub (Jan 13, 2024):

I think a lot of the other models, even if concrete template formats can't be sourced, should probably have their templates changed to use the {{ if and .First .System }}...{{ .System }}...{{ end }} statement.

As it is, the system message is often getting added to every message. This might sometimes be a good idea if you don't want to lose the system message, but it shouldn't happen by default, and particular care should be taken as to where the system message is added if intentionally including it each time.


@jmorganca commented on GitHub (Jan 13, 2024):

Thank you so much for the work to go through all of the templates @jukofyork (both in the models on ollama.ai but also in their respective repos on HF and GitHub). Will get this fixed


@jukofyork commented on GitHub (Jan 13, 2024):

> Thank you so much for the work to go through all of the templates @jukofyork (both in the models on ollama.ai but also in their respective repos on HF and GitHub). Will get this fixed

No problem and if there are any other original/official models you know of then I can try to find the correct prompt for them too.

I don't think it's really possible to find the prompt format for a lot of the fine-tuned models though. Most seem to be trained on a mix of several different/merged datasets, and I don't think even the creators always know the correct format.


@nathanpbell commented on GitHub (Jan 15, 2024):

I've noticed a couple other errors in the models available from the library:

  1. mistral models have numCtx defaulting to 2048 instead of 4096 (actually 32568 is probably the correct value). I can't tell fully, but I think Ollama is truncating down to numCtx before loading the prompt into the model?

  2. mistrallite's tokenizer appears broken. Mistrallite is a long-context fine-tune of Mistral from the Amazon team, and its prompt format is different from Mistral's and introduces 3 new tokens. When passing the prompt through api/generate, it doesn't appear that those new strings are being properly parsed into the new token values. Full disclosure: I'm new to this and I'm using Mistrallite through LangChain -> Ollama, so the bug may be somewhere in between - forgive me if my hunch is wrong that this is a bug in the model uploaded to the Ollama library.


@jukofyork commented on GitHub (Jan 15, 2024):

> I've noticed a couple other errors in the models available from the library:
>
> 1. `mistral` models have numCtx defaulting to 2048 instead of 4096 (actually 32568 is probably the correct value). I can't tell fully, but I think Ollama is truncating down to numCtx before loading the prompt into the model?

Yeah, I'm still none the wiser what the Mistral and Mixtral models' context actually is. The official pages say they were both trained on 8k context, but other info says it's 32k. Then yet more info says Mistral uses a sliding window and is really just 8k (or even 4k), and that Mixtral was trained to use 32k straight off, the sliding window for it being a bug on release.


@nathanpbell commented on GitHub (Jan 15, 2024):

I believe the right value is 32K. The sliding window is 4K which affects performance of prompts that are outside that window, but as far as I can tell, we shouldn't be truncating anything less than 32K before passing it to the model. But that's my novice understanding.

Anecdotally, I've tested the model's ability to recall text in long contexts using the default settings in "ollama pull mistral" and it can't remember anything past 2K. When I modify the call to use an 8K context window it is able to recall tokens outside of the 2K window that seems to be the ollama default.

I think the fix is that the Modelfile for mistral and its variants should specify a num_ctx of 32K.


@jukofyork commented on GitHub (Jan 15, 2024):

> I believe the right value is 32K. The sliding window is 4K which affects performance of prompts that are outside that window, but as far as I can tell, we shouldn't be truncating anything less than 32K before passing it to the model. But that's my novice understanding.

Is this for Mistral or Mixtral? I only ask because a lot of people on the SillyTavern reddit report that Mistral runs into problems around 8k context (or possibly even 6.5k IIRC?).


@nathanpbell commented on GitHub (Jan 15, 2024):

The original Mistral (7B and its variants, including instruct-v0.1, v0.2, etc.).

The way the sliding window works, you'll see degradation after the 4K sliding window (so its best performance is within the first 4K), but that performance should trail off gradually as the context grows (in increments of 4K), all the way to 32K, where it will stop "remembering" anything beyond that.

My experience with Mistral in Ollama using the default Modelfile is that rather than the gradual performance degradation you'd expect after 4k, it actually is only sending 2K of tokens and has a steep cliff drop off in performance (it can't remember anything after 2k). Passing in a num_ctx > 2K at runtime fixes that.

I propose that should be the default in the Modelfile, but I don't think the Ollama model library is in a GitHub repo anywhere that we can open pull requests against. Please correct me if I'm wrong.


@jukofyork commented on GitHub (Jan 15, 2024):

Ah, thanks. I'm actually just running everything but the coding models at 4k context for now, as the num_batch bug makes it too fiddly to find the right value.


@nathanpbell commented on GitHub (Jan 15, 2024):

I should add one other thing, it sounds like Mistral's sliding window attention (SWA) is not actually implemented in llama.cpp (which Ollama uses) and so almost assuredly doesn't work the way described in their paper. But it does "work" in that it can generate coherent responses.

Llama.cpp discussion: https://github.com/ggerganov/llama.cpp/issues/3867#issuecomment-1787815958


@cognitivetech commented on GitHub (Feb 7, 2024):

In fact, according to the Mistral paper (https://arxiv.org/pdf/2310.06825.pdf), it's trained on an 8k context:

| Parameter | Value |
| -- | -- |
| dim | 4096 |
| n_layers | 32 |
| head_dim | 128 |
| hidden_dim | 14336 |
| n_heads | 32 |
| n_kv_heads | 8 |
| window_size | 4096 |
| context_len | 8192 |
| vocab_size | 32000 |

The 32k context was a misinterpretation from the beginning; see more info in this discussion:
https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/discussions/43


@jukofyork commented on GitHub (Feb 13, 2024):

I spent all afternoon running different experiments and am actually shocked at how much finding the proper prompt has improved all 3 models:

It's made Mistral about as good as the other 2 were before, and the other 2 are now MUCH better; all the weirdness (i.e. where they claimed to make changes to code when they didn't, etc.) is gone now.

I've marked the spaces with '■' so they stand out, but you will need to change them back to spaces. Also remember that if you aren't using Ollama or llama.cpp you might need to add back the <s> prefix:


Mistral and Miqu:

TEMPLATE """{{ if and .First .System }}[INST]■{{ .System }}

Please await further instructions and simply respond with 'Understood'.■[/INST]
Understood</s>■
{{ end }}[INST]■{{ .Prompt }}■[/INST]
{{ .Response }}"""

This agrees with the example on the Mistral page:

text = "<s>[INST] What is your favourite condiment? [/INST]"
"Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!</s> "
"[INST] Do you have mayonnaise recipes? [/INST]"

https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2


Mixtral:

TEMPLATE """{{ if and .First .System }}■[INST]■{{ .System }}

Please await further instructions and simply respond with 'Understood'.■[/INST]■
Understood</s>
{{ end }}■[INST]■{{ .Prompt }}■[/INST]■
{{ .Response }}"""

This sort of agrees with the example on the Mixtral page:

<s> [INST] Instruction [/INST] Model answer</s> [INST] Follow-up instruction [/INST]

https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1

But it seems that using the newlines before the response, as in the Mistral example, is essential.


@jukofyork commented on GitHub (Feb 15, 2024):

I actually got both miqu and phind-codellama to give up their real training prompts. Explanation here:

https://huggingface.co/miqudev/miqu-1-70b/discussions/25

TEMPLATE """{{ if and .First .System }}{{ .System }}

{{ end }}[INST] {{ .Prompt }}
[/INST]{{ .Response }}"""

https://huggingface.co/Phind/Phind-CodeLlama-34B-v2/discussions/31

TEMPLATE """{{ if and .First .System }}{{ .System }}

{{ end }}### Instruction:
{{ .Prompt }}

### Response:
{{ .Response }}"""

miqu is MUCH better with the correct prompt; like unbelievably better!!! 😱


@cognitivetech commented on GitHub (Feb 22, 2024):

May as well throw my two cents into the mix. I have tested a lot of things, but this works really well for mistral models:

TEMPLATE """
{{ if .First  }}<s>{{ if .System  }}[INST]{{ .System }}[/INST]{{ end }}</s>{{ end }}[INST] {{ .Prompt }} [/INST]
"""
PARAMETER num_ctx 8000
PARAMETER num_gpu -1
PARAMETER num_predict 4000

Unless you have a special personality to set, don't use a system prompt; it works better without one.

Even if you don't have a few-shot prompt or chat history, still include the <s></s>.


@jukofyork commented on GitHub (Apr 16, 2024):

wizardlm2

{{ if .System }}{{ .System }} {{ end }}{{ if .Prompt }}USER: {{ .Prompt }} {{ end }}ASSISTANT: {{ .Response }}

Pretty sure this shouldn't have that extra space between ASSISTANT: and {{ .Response }}.

command-r-plus

{{ if .System }}<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>{{ .System }}<|END_OF_TURN_TOKEN|>{{ end }}{{ if .Prompt }}<|START_OF_TURN_TOKEN|><|USER_TOKEN|>{{ .Prompt }}<|END_OF_TURN_TOKEN|>{{ end }}<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>{{ .Response }}<|END_OF_TURN_TOKEN|>

Shouldn't have the <|END_OF_TURN_TOKEN|> part after the response as it's in the GGUF file as an EOS token already.

command-r has the same problem in its template too (as was pointed out in another thread linked above).


@jmorganca commented on GitHub (Jan 14, 2025):

This is now possible (e.g. https://ollama.com/library/llama3.3/blobs/948af2743fc7). @jukofyork noted, thanks for spotting these. If you find any more don't hesitate to open an issue!
