[GH-ISSUE #1977] Mistakes in template definitions on models available to download from https://ollama.ai #1139

Closed
opened 2026-04-12 10:52:46 -05:00 by GiteaMirror · 18 comments

Originally created by @jukofyork on GitHub (Jan 13, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/1977

Originally assigned to: @jmorganca on GitHub.

Hi,

Some of the mistakes in the TEMPLATE definitions for the models you can download from https://ollama.ai are hurting the models to varying degrees. I only found this by accident when experimenting with the API to use some of the code completion / code editing prompts used by the Continue project (https://github.com/continuedev/continue/tree/main/core/llm/templates).

I've sourced all these primarily by looking at the original tokenizer config and, failing that, by looking through the official descriptions and/or their respective official GitHub discussions. I've concentrated on the original/official models (other than phind-codellama) as it's hard to find any concrete info on a lot of the "bootleg" fine-tuned models.

The ones that are particularly affected are:

  • codellama missing the space before the response severely hurts the performance when presented with a large section of code. There are a lot of 'cargo cult' prompt templates for codellama going around, but this one can be confirmed from their official release page and the tokenizer config.
  • deepseek-llm having the system message prepended to every message seems to increase the chance of responding in Chinese Unicode characters (Deepseek say specifically it wasn't trained to use a system message).
  • deepseek-coder quickly fills its context when discussing large sections of code and will start to repeat the system message back at you before completely descending into gibberish (this happens very quickly if using a detailed / long custom system message).

llama2 doesn't seem too affected by the missing space before the response, but again this template can be confirmed from their official release page and the tokenizer config.

deepseek-llm, mixtral and mistral absolutely should NOT have a space or newline before the response or they will often respond with gibberish and/or Chinese Unicode characters.

The official mixtral Hugging Face page actually shows a slightly wrong template format, but the original tokenizer config is the same as mistral's.

The suggestion to add "Response" to phind-codellama comes from the Hugging Face discussion, so I can't confirm whether it's correct.

codellama:34b-instruct:

TEMPLATE """<s>[INST] {{ if and .First .System }}<<SYS>>
{{ .System }}
<</SYS>>

{{ end }}{{ .Prompt }} [/INST] {{ .Response }}"""

deepseek-coder:33b-instruct:

TEMPLATE """{{ if and .First .System }}{{ .System }}
{{ end }}### Instruction:
{{ .Prompt }}
### Response:
{{ .Response }}"""

deepseek-llm:67b-chat:

TEMPLATE """User: {{ if and .First .System }}{{ .System }} {{ end }}{{ .Prompt }}

Assistant:{{ .Response }}"""

llama2:70b-chat:

TEMPLATE """<s>[INST] {{ if and .First .System }}<<SYS>>
{{ .System }}
<</SYS>>

{{ end }}{{ .Prompt }} [/INST] {{ .Response }}"""

mixtral:8x7b-instruct-v0.1 & mistral:7b-instruct-v0.2:

TEMPLATE """{{ if .First }}<s>{{ end }}[INST] {{ if and .First .System }}{{ .System }} {{ end }}{{ .Prompt }} [/INST]{{ .Response }}"""

phind-codellama:34b-v2:

TEMPLATE """{{ if and .First .System }}### System Prompt
{{ .System }}

{{ end }}### User Message
{{ .Prompt }}

### Assistant Response
{{ .Response }}"""

yi:34b-chat:

TEMPLATE """{{ if and .First .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
{{ .Response }}"""

These two aren't listed on https://ollama.ai but also use the same "ChatML" template as yi:

mpt:30B-chat:

TEMPLATE """{{ if and .First .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
{{ .Response }}"""

qwen:72b-chat:

TEMPLATE """{{ if and .First .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
{{ .Response }}"""

Are there any other "non-bootleg" models I should look at? I might as well do them too if there are any.

GiteaMirror added the bug and model labels 2026-04-12 10:52:46 -05:00

@jukofyork commented on GitHub (Jan 13, 2024):

By the way if anybody else wants to learn more about the template syntax then this is the reference page:

https://pkg.go.dev/text/template

I was pretty confused to start with when I tried to grep the whole project and could find no reference to "if" or "and" anywhere!


@scpedicini commented on GitHub (Jan 13, 2024):

I think being able to see the final transformed input -> template -> output chain in the logs would help catch these kinds of issues - linking this enhancement request:

https://github.com/jmorganca/ollama/issues/1533


@jukofyork commented on GitHub (Jan 13, 2024):

I think a lot of the other models, even if concrete template formats can't be sourced, should probably have their templates changed to use the {{ if and .First .System }}...{{ .System }}...{{ end }} statement.

As it is, the system message is often getting added to every message. This might sometimes be a good idea if you don't want to lose the system message, but it shouldn't happen by default, and particular care should be taken as to where the system message is added if intentionally including it each time.


@jmorganca commented on GitHub (Jan 13, 2024):

Thank you so much for the work to go through all of the templates @jukofyork (both in the models on ollama.ai but also in their respective repos on HF and GitHub). Will get this fixed


@jukofyork commented on GitHub (Jan 13, 2024):

> Thank you so much for the work to go through all of the templates @jukofyork (both in the models on ollama.ai but also in their respective repos on HF and GitHub). Will get this fixed

No problem and if there are any other original/official models you know of then I can try to find the correct prompt for them too.

I don't think it's really possible to find the prompt format for a lot of the fine-tuned models though. Most seem to be trained on a mix of several different/merged datasets, and I don't think even the creators always know the correct format.


@nathanpbell commented on GitHub (Jan 15, 2024):

I've noticed a couple other errors in the models available from the library:

  1. mistral models have numCtx defaulting to 2048 instead of 4096 (actually 32568 is probably the correct value). I can't tell fully, but I think Ollama is truncating down to numCtx before loading the prompt into the model?

  2. mistrallite's tokenizer appears broken. Mistrallite is a long-context fine-tune of Mistral from the Amazon team, and its prompt format is different from Mistral's and introduces 3 new tokens. When passing the prompt through api/generate, it doesn't appear that those new strings are being properly parsed into the new token values. Full disclosure: I'm new to this and I'm using Mistrallite through LangChain -> Ollama, so the bug may be somewhere in between - forgive me if my hunch is wrong that this is a bug in the model uploaded to the Ollama library.


@jukofyork commented on GitHub (Jan 15, 2024):

> I've noticed a couple other errors in the models available from the library:
>
> 1. `mistral` models have numCtx defaulting to 2048 instead of 4096 (actually 32568 is probably the correct value). I can't tell fully, but I think Ollama is truncating down to numCtx before loading the prompt into the model?

Yeah, I'm still none the wiser what the Mistral and Mixtral models' context actually is. The official pages say they were both trained on 8k context, but other info says it's 32k. Then yet more info says Mistral uses a sliding window and is really just 8k (or even 4k), and that Mixtral was trained to use 32k straight off, the sliding window for it being a bug on release.


@nathanpbell commented on GitHub (Jan 15, 2024):

I believe the right value is 32K. The sliding window is 4K which affects performance of prompts that are outside that window, but as far as I can tell, we shouldn't be truncating anything less than 32K before passing it to the model. But that's my novice understanding.

Anecdotally, I've tested the model's ability to recall text in long contexts using the default settings in "ollama pull mistral" and it can't remember anything past 2K. When I modify the call to use an 8K context window it is able to recall tokens outside of the 2K window that seems to be the ollama default.

I think the fix is that the Modelfile for mistral and its variants should specify a num_ctx of 32K.


@jukofyork commented on GitHub (Jan 15, 2024):

> I believe the right value is 32K. The sliding window is 4K which affects performance of prompts that are outside that window, but as far as I can tell, we shouldn't be truncating anything less than 32K before passing it to the model. But that's my novice understanding.

Is this for Mistral or Mixtral? I only ask because a lot of people on the SillyTavern reddit report that Mistral runs into problems around 8k context (or possibly even 6.5k IIRC?).


@nathanpbell commented on GitHub (Jan 15, 2024):

The original Mistral (7B and its variants, including instruct-v0.1, v0.2, etc.).

The way the sliding window works, you'll see degradation after the 4K sliding window (so its best performance is within the first 4K), but that performance should trail off gradually as the context grows (in increments of 4K), all the way to 32K, where it will stop "remembering" anything beyond that.

My experience with Mistral in Ollama using the default Modelfile is that rather than the gradual performance degradation you'd expect after 4k, it actually is only sending 2K of tokens and has a steep cliff drop off in performance (it can't remember anything after 2k). Passing in a num_ctx > 2K at runtime fixes that.

I propose that should be the default in the Modelfile, but I don't think the Ollama model library is in a GitHub repo anywhere that we can open pull requests against. Please correct me if I'm wrong.


@jukofyork commented on GitHub (Jan 15, 2024):

Ah, thanks. I'm actually just running everything but the coding models at 4k context for now, as the num_batch bug makes it too fiddly to find the right value.


@nathanpbell commented on GitHub (Jan 15, 2024):

I should add one other thing, it sounds like Mistral's sliding window attention (SWA) is not actually implemented in llama.cpp (which Ollama uses) and so almost assuredly doesn't work the way described in their paper. But it does "work" in that it can generate coherent responses.

Llama.cpp discussion: https://github.com/ggerganov/llama.cpp/issues/3867#issuecomment-1787815958


@cognitivetech commented on GitHub (Feb 7, 2024):

In fact, according to the Mistral paper (https://arxiv.org/pdf/2310.06825.pdf), it's trained on an 8k context:

| Parameter | Value |
| -- | -- |
| dim | 4096 |
| n_layers | 32 |
| head_dim | 128 |
| hidden_dim | 14336 |
| n_heads | 32 |
| n_kv_heads | 8 |
| window_size | 4096 |
| context_len | 8192 |
| vocab_size | 32000 |

The 32k context was a misinterpretation from the beginning; see more info in this discussion:
https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/discussions/43


@jukofyork commented on GitHub (Feb 13, 2024):

I spent all afternoon running different experiments and am actually shocked at how much finding the proper prompt has improved all 3 models:

It's made Mistral about as good as the other 2 were before, and the other 2 are now MUCH better; all the weirdness (i.e. where they claimed to make changes to code when they didn't, etc.) is gone now.

I've marked the spaces with '■' so they stand out, but you will need to change them back to spaces. Also remember that if you aren't using Ollama or llama.cpp you might need to add back the <s> prefix:


Mistral and Miqu:

TEMPLATE """{{ if and .First .System }}[INST]■{{ .System }}

Please await further instructions and simply respond with 'Understood'.■[/INST]
Understood</s>■
{{ end }}[INST]■{{ .Prompt }}■[/INST]
{{ .Response }}"""

This agrees with the example on the Mistral page:

text = "<s>[INST] What is your favourite condiment? [/INST]"
"Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!</s> "
"[INST] Do you have mayonnaise recipes? [/INST]"

https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2


Mixtral:

TEMPLATE """{{ if and .First .System }}■[INST]■{{ .System }}

Please await further instructions and simply respond with 'Understood'.■[/INST]■
Understood</s>
{{ end }}■[INST]■{{ .Prompt }}■[/INST]■
{{ .Response }}"""

This sort of agrees with the example on the Mixtral page:

<s> [INST] Instruction [/INST] Model answer</s> [INST] Follow-up instruction [/INST]

https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1

But it seems that using the newlines before the response, as in the Mistral example, is essential.


@jukofyork commented on GitHub (Feb 15, 2024):

I actually got both miqu and phind-codellama to give up their real training prompts. Explanation here:

https://huggingface.co/miqudev/miqu-1-70b/discussions/25

TEMPLATE """{{ if and .First .System }}{{ .System }}

{{ end }}[INST] {{ .Prompt }}
[/INST]{{ .Response }}"""

https://huggingface.co/Phind/Phind-CodeLlama-34B-v2/discussions/31

TEMPLATE """{{ if and .First .System }}{{ .System }}

{{ end }}### Instruction:
{{ .Prompt }}

### Response:
{{ .Response }}"""

miqu is MUCH better with the correct prompt; like unbelievably better!!! 😱


@cognitivetech commented on GitHub (Feb 22, 2024):

May as well throw my two cents into the mix. I have tested a lot of things, but this works really well for mistral models:

TEMPLATE """
{{ if .First  }}<s>{{ if .System  }}[INST]{{ .System }}[/INST]{{ end }}</s>{{ end }}[INST] {{ .Prompt }} [/INST]
"""
PARAMETER num_ctx 8000
PARAMETER num_gpu -1
PARAMETER num_predict 4000

Unless you have a special personality to set, don't use a system prompt; it works better without one.

Even if you don't have a few-shot prompt or chat history, still include the <s></s>.


@jukofyork commented on GitHub (Apr 16, 2024):

wizardlm2

{{ if .System }}{{ .System }} {{ end }}{{ if .Prompt }}USER: {{ .Prompt }} {{ end }}ASSISTANT: {{ .Response }}

Pretty sure this shouldn't have that extra space between ASSISTANT: and {{ .Response }}.

command-r-plus

{{ if .System }}<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>{{ .System }}<|END_OF_TURN_TOKEN|>{{ end }}{{ if .Prompt }}<|START_OF_TURN_TOKEN|><|USER_TOKEN|>{{ .Prompt }}<|END_OF_TURN_TOKEN|>{{ end }}<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>{{ .Response }}<|END_OF_TURN_TOKEN|>

Shouldn't have the <|END_OF_TURN_TOKEN|> part after the response as it's in the GGUF file as an EOS token already.

command-r has the same problem in its template too (as was pointed out in another thread linked above).


@jmorganca commented on GitHub (Jan 14, 2025):

This is now possible (e.g. https://ollama.com/library/llama3.3/blobs/948af2743fc7). @jukofyork noted, thanks for spotting these. If you find any more don't hesitate to open an issue!
