[GH-ISSUE #3848] phi3 doesn't seem to accept SYSTEM prompt #2384


GiteaMirror · 2026-04-12T12:42:04-05:00

GiteaMirror commented

2026-04-12 12:42:04 -05:00

Originally created by @rb81 on GitHub (Apr 23, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3848

What is the issue?

Regardless of what's in the modelfile, it seems phi3 doesn't take in the SYSTEM prompt at all. I've looked around and can't find anyone else discussing this. Assuming this is a bug of some kind.

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.1.32


GiteaMirror added the bug label 2026-04-12 12:42:04 -05:00

GiteaMirror closed this issue

2026-04-12 12:42:04 -05:00

GiteaMirror commented

2026-04-12 12:42:05 -05:00

@thinkverse commented on GitHub (Apr 23, 2024):

I took a cursory glance at the technical report¹. I'm new to AI in general, but it looks to me like Phi3 wasn't trained with a system prompt in mind: the template in the report shows only user and assistant.

<|user|>\n Question <|end|>\n <|assistant|>

The Modelfile² sample provided by Microsoft also doesn't include a system prompt. Perhaps @jmorganca or someone else from the Ollama team can shed some light on it. They know more than I do. 👍

¹ https://arxiv.org/html/2404.14219v1#S2
² https://github.com/Azure-Samples/Phi-3MiniSamples/blob/main/ollama/Modelfile

GiteaMirror commented

2026-04-12 12:42:05 -05:00

@thinkverse commented on GitHub (Apr 23, 2024):

Circling back to this after some testing. From my testing, I get that it kinda supports it but not really. The results are not consistent at all. I've tried different system prompts; some it kind of adhered to, and others it ignored completely.

The best result I had was with Ollama's Mario example, where Phi3 did not act as Mario the way Llama3 does, but instead acted as an assistant answering questions about the Mushroom Kingdom.

The template on the HuggingFace READMEs shows <|system|>, but not on the GGUF and ONNX variants. And the sample Modelfile on HuggingFace¹ again doesn't have <|system|>, so maybe it was trained with it for a bit and then they decided to scrap it?

If I was going to use Phi3 I wouldn't bother with a system prompt given these results.

¹ https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/blob/main/Modelfile_q4


GiteaMirror commented

2026-04-12 12:42:06 -05:00

@rb81 commented on GitHub (Apr 24, 2024):

@thinkverse - Thanks for your comments. Perhaps they did scrap the system prompt to keep the model's training simpler..? In any case, feeding in a system prompt as the user is good enough for a model this small, I guess.


GiteaMirror commented

2026-04-12 12:42:06 -05:00

@jmorganca commented on GitHub (Apr 24, 2024):

@rb81 Phi 3 has been updated with a system prompt as per their published tokenizer configuration.

% ollama run phi3
>>> /set system You are a helpful assistant that always answers in french.
Set system message.
>>> Hi!
Bonjour ! En tant qu'assistant, je suis toujours prêt à répondre et à vous assister dans votre langue maternelle française. 
Comment puis-je vous aider aujourd'hui ?


GiteaMirror commented

2026-04-12 12:42:07 -05:00

@thinkverse commented on GitHub (Apr 24, 2024):

So basically the system prompts I was testing with were just bad. 😂 Time to learn how to write better prompts. 👍


GiteaMirror commented

2026-04-12 12:42:07 -05:00

@lostmygithubaccount commented on GitHub (Apr 24, 2024):

from some limited testing yesterday, phi3 doesn't seem great at following system prompts/instructions. this was also an issue w/ phi2 that made it hard to use for anything 🤷


GiteaMirror commented

2026-04-12 12:42:08 -05:00

@rb81 commented on GitHub (Apr 24, 2024):

Phi 3 has been updated with a system prompt as per their published tokenizer configuration.

% ollama run phi3
>>> /set system You are a helpful assistant that always answers in french.
Set system message.
>>> Hi!
Bonjour ! En tant qu'assistant, je suis toujours prêt à répondre et à vous assister dans votre langue maternelle française. 
Comment puis-je vous aider aujourd'hui ?

So, it seems setting the system prompt at runtime as you did works, but setting it in the Modelfile doesn't. Have you had the same experience?

Edit: It seems phi3 decides which system prompts to follow 😂 More elaborate prompts don't work, but something simple like "Only respond in French." works just fine. Weird.
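
For anyone who wants to repeat this test outside the interactive prompt, here is a minimal sketch that sends a system message through Ollama's `/api/chat` endpoint. It assumes a server on the default local port 11434 with `phi3` already pulled; the helper names `build_chat_payload` and `chat` are made up for illustration and are not part of any Ollama client.

```python
import json
import urllib.request

# Assumed default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/chat"


def build_chat_payload(model, system_prompt, user_prompt):
    """Build a non-streaming /api/chat request body with a system message."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "stream": False,
    }


def chat(payload, url=OLLAMA_URL):
    """POST the payload to a running Ollama server and return the reply text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["message"]["content"]


# Hypothetical helper usage with the simple prompt reported to work above.
payload = build_chat_payload("phi3", "Only respond in French.", "Hi!")
# reply = chat(payload)  # uncomment with the Ollama server running
```

Swapping in a longer system prompt while keeping the user message fixed makes it easier to compare simple and elaborate instructions side by side.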


GiteaMirror commented

2026-04-12 12:42:08 -05:00

@jxtt01 commented on GitHub (Apr 29, 2024):

Does anyone involved in this issue understand how Phi-3-Mini's system prompt is handled under the hood of Ollama? I'd like to replicate it in my own code, but I've run into the same issue that thinkverse described earlier (quoted below for clarity).

Took a cursory glance at the technical report¹ and I'm new to AI in general but it looks to me like Phi3 wasn't trained with a system prompt in mind, the template in the report shows only user and assistant.
<|user|>\n Question <|end|>\n <|assistant|>
The Modelfile² sample provided by Microsoft also doesn't include a system prompt. Perhaps @jmorganca or someone else from the Ollama team can shed some light on it. They know more than I do. 👍

Footnotes
1. https://arxiv.org/html/2404.14219v1#S2
2. https://github.com/Azure-Samples/Phi-3MiniSamples/blob/main/ollama/Modelfile


GiteaMirror commented

2026-04-12 12:42:09 -05:00

@wuriyanto48 commented on GitHub (May 25, 2024):

I just noticed there is no <|system|> token in phi3's chat_template config. https://huggingface.co/microsoft/Phi-3-mini-4k-instruct/blob/main/tokenizer_config.json

After adding this config to the tokenizer, it works:

tokenizer.use_default_system_prompt = True
tokenizer.chat_template = "{{ bos_token }}{% for message in messages %}\n{% if message['role'] == 'user' %}\n{{ '<|user|>\n' + message['content'] + '<|end|>' }}\n{% elif message['role'] == 'system' %}\n{{ '<|system|>\n' + message['content'] + '<|end|>' }}\n{% elif message['role'] == 'assistant' %}\n{{ '<|assistant|>\n'  + message['content'] + '<|end|>' }}\n{% endif %}\n{% if loop.last and add_generation_prompt %}\n{{ '<|assistant|>' }}\n{% endif %}\n{% endfor %}"

I took the chat_template config from https://huggingface.co/HuggingFaceH4/zephyr-7b-beta/blob/main/tokenizer_config.json
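
For readers who want to build the prompt string by hand, the role layout of that template can be approximated in plain Python. This is only a sketch: `format_phi3_prompt` is a hypothetical helper, it leaves out the BOS token, and the exact newlines may differ from what the Jinja template or Ollama actually emits.

```python
def format_phi3_prompt(messages, add_generation_prompt=True):
    """Approximate the <|role|> ... <|end|> layout used by the chat template."""
    parts = []
    for message in messages:
        role = message["role"]
        if role not in ("system", "user", "assistant"):
            raise ValueError(f"unsupported role: {role}")
        # Each turn is wrapped as: <|role|>, newline, content, <|end|>, newline.
        parts.append(f"<|{role}|>\n{message['content']}<|end|>\n")
    if add_generation_prompt:
        # Leave the assistant turn open so the model continues from here.
        parts.append("<|assistant|>\n")
    return "".join(parts)


example = format_phi3_prompt(
    [
        {"role": "system", "content": "You are GoodBot."},
        {"role": "user", "content": "who are you?"},
    ]
)
print(example)
```

Printing the string before sending it to a model is a quick way to confirm the system turn is really present in the prompt.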


GiteaMirror commented

2026-04-12 12:42:10 -05:00

@spike-xiong commented on GitHub (May 31, 2024):

It seems like Phi3 doesn't accept the system role; however, their cookbook says Phi3 has three roles:
https://github.com/microsoft/Phi-3CookBook/blob/main/md/02.QuickStart/Huggingface_QuickStart.md


GiteaMirror commented

2026-04-12 12:42:10 -05:00

@spike-xiong commented on GitHub (May 31, 2024):

i just noticed, there is no <|system|> token in thephi3's chat_template config. https://huggingface.co/microsoft/Phi-3-mini-4k-instruct/blob/main/tokenizer_config.json

adding this config to the tokenizer, it works
tokenizer.use_default_system_prompt = True
tokenizer.chat_template = "{{ bos_token }}{% for message in messages %}\n{% if message['role'] == 'user' %}\n{{ '<|user|>\n' + message['content'] + '<|end|>' }}\n{% elif message['role'] == 'system' %}\n{{ '<|system|>\n' + message['content'] + '<|end|>' }}\n{% elif message['role'] == 'assistant' %}\n{{ '<|assistant|>\n'  + message['content'] + '<|end|>' }}\n{% endif %}\n{% if loop.last and add_generation_prompt %}\n{{ '<|assistant|>' }}\n{% endif %}\n{% endfor %}"
I took chat_template config from https://huggingface.co/HuggingFaceH4/zephyr-7b-beta/blob/main/tokenizer_config.json

Great, thank you! And you don't even need the line tokenizer.use_default_system_prompt = True.


GiteaMirror commented

2026-04-12 12:42:10 -05:00

@wuriyanto48 commented on GitHub (May 31, 2024):

Thank you @CaptXiong , for pointing it out


GiteaMirror commented

2026-04-12 12:42:11 -05:00

@rb81 commented on GitHub (May 31, 2024):

@wuriyanto48 @CaptXiong - I'm assuming what you guys figured out requires modifying the source code and recompiling, correct? If so, can we get the project team to look at this?


GiteaMirror commented

2026-04-12 12:42:11 -05:00

@wuriyanto48 commented on GitHub (May 31, 2024):

@rb81 You don't need to recompile the source code; just do this:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# `device` was not defined in the snippet as first posted; use a GPU if one is available.
device = "cuda" if torch.cuda.is_available() else "cpu"

model_id = "microsoft/Phi-3-mini-4k-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {
        "role": "system",
        "content": "You are GoodBot, You are a helpful assistant.",
    },
    {
        "role": "user",
        "content": "who are you?"
    }
]

# Override the chat template so the system role is rendered with <|system|>.
tokenizer.chat_template = "{{ bos_token }}{% for message in messages %}\n{% if message['role'] == 'user' %}\n{{ '<|user|>\n' + message['content'] + '<|end|>' }}\n{% elif message['role'] == 'system' %}\n{{ '<|system|>\n' + message['content'] + '<|end|>' }}\n{% elif message['role'] == 'assistant' %}\n{{ '<|assistant|>\n'  + message['content'] + '<|end|>' }}\n{% endif %}\n{% if loop.last and add_generation_prompt %}\n{{ '<|assistant|>' }}\n{% endif %}\n{% endfor %}"

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device=device,
)

# Greedy decoding, returning only the newly generated text.
generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

output = pipe(messages, **generation_args)

Hope this helps.


GiteaMirror commented

2026-04-12 12:42:12 -05:00

@rb81 commented on GitHub (May 31, 2024):

@rb81 You don't need to recompile the source code, just like this

model_id = "microsoft/Phi-3-mini-4k-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {
        "role": "system",
        "content": "You are GoodBot, You are a helpful assistant.",
    },
    {
        "role": "user",
        "content": "who are you?"
    }

 ]

tokenizer.use_default_system_prompt = True
tokenizer.chat_template = "{{ bos_token }}{% for message in messages %}\n{% if message['role'] == 'user' %}\n{{ '<|user|>\n' + message['content'] + '<|end|>' }}\n{% elif message['role'] == 'system' %}\n{{ '<|system|>\n' + message['content'] + '<|end|>' }}\n{% elif message['role'] == 'assistant' %}\n{{ '<|assistant|>\n'  + message['content'] + '<|end|>' }}\n{% endif %}\n{% if loop.last and add_generation_prompt %}\n{{ '<|assistant|>' }}\n{% endif %}\n{% endfor %}"

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device=device,
)

generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

output = pipe(messages, **generation_args)

hope this help

Thanks @wuriyanto48, much appreciated!


GiteaMirror commented

2026-04-12 12:42:12 -05:00

@jmorganca commented on GitHub (Jul 5, 2024):

Hey folks, will close this for now, but let me know if you're still seeing issues.


GiteaMirror commented

2026-04-12 12:42:13 -05:00

@rcastberg commented on GitHub (Aug 21, 2024):

Using Python code on the Phi models, I can get them to follow a system instruction and return only what I instruct them to.
This does not work when the model is loaded into Ollama. See the examples below, where I request that it return only yes, no, or I don't know.

I have attempted to include this in the user message, but it doesn't seem to listen to that either. Is there any way to get phi to listen to system messages in Ollama?

Example in Ollama:

Same behaviour whatever phi model I choose.
$ ollama run phi3:3.8b-instruct
>>> /set system As a knowledge responder, your task is to provide a direct answer to a factual question. Your response options are limited to "Yes", "No", or "I dont know"
Set system message.

>>> /show system
As a knowledge responder, your task is to provide a direct answer to a factual question. Your response options are limited to "Yes", "No", or "I dont know"

>>> Does a cheetah have stripes?
No. A cheetah has spots instead of stripes, which help with camounerage in their natural habitat. The black tear marks
under the eyes may also aid them to see better at high speeds by counteracting sun glare off dusty surfaces. While they
are similar looking animals called leopards and jaguars that have striking orange or spotted coats, cheetahs do not have
stripes on their fur.

Example test script in Python:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "microsoft/Phi-3-mini-4k-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {
        "role": "system",
        "content": """As a knowledge responder, your task is to provide a direct answer to a factual question. Your response options are limited to "Yes", "No", or "I dont know"""
    },
    {
        "role": "user",
        "content": "Does a cheetah have stripes?",
    }
]

tokenizer.chat_template = "{{ bos_token }}{% for message in messages %}\n{% if message['role'] == 'user' %}\n{{ '<|user|>\n' + message['content'] + '<|end|>' }}\n{% elif message['role'] == 'system' %}\n{{ '<|system|>\n' + message['content'] + '<|end|>' }}\n{% elif message['role'] == 'assistant' %}\n{{ '<|assistant|>\n' + message['content'] + '<|end|>' }}\n{% endif %}\n{% if loop.last and add_generation_prompt %}\n{{ '<|assistant|>' }}\n{% endif %}\n{% endfor %}"

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device="cuda",
)

generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

output = pipe(messages, **generation_args)
print(output)

Result: [{'generated_text': ' No'}]


GiteaMirror commented

2026-04-12 12:42:13 -05:00

@rick-github commented on GitHub (Aug 25, 2024):

phi3:3.8b-instruct is the q4_0 quantized version of the model. If you use the fp16 version, it follows instructions better.

$ ollama run phi3:3.8b-mini-128k-instruct-fp16
>>> /set system As a knowledge responder, your task is to provide a direct answer to a factual question. Your response options are limited to "Yes", "No", or "I dont know"
Set system message.
>>> Does a cheetah have stripes?
No
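
When comparing quantizations like this, it can help to score replies mechanically rather than by eye. The sketch below is one possible check, not anything Ollama provides: `normalize` and `follows_instruction` are hypothetical helpers, and the allowed set simply mirrors the three options in the system prompt used above.

```python
import string

# The three answers permitted by the system prompt, in normalized form.
ALLOWED = {"yes", "no", "i dont know"}


def normalize(reply):
    """Lowercase, strip punctuation, and collapse whitespace."""
    text = reply.strip().lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def follows_instruction(reply, allowed=ALLOWED):
    """Return True only if the whole reply is one of the allowed answers."""
    return normalize(reply) in allowed


fp16_ok = follows_instruction("No")
q4_ok = follows_instruction("No. A cheetah has spots instead of stripes.")
print(fp16_ok, q4_ok)
```

Running the same question a few times against each tag and counting how often `follows_instruction` passes gives a rough adherence rate per quantization.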


GiteaMirror referenced this issue

2026-04-22 04:07:22 -05:00

[GH-ISSUE #2384] [Bug]: Not able to make call to Ollama multimodal model in the cookbook #27145

GiteaMirror referenced this issue

2026-04-28 05:43:02 -05:00

[GH-ISSUE #2384] [Bug]: Not able to make call to Ollama multimodal model in the cookbook #47897

GiteaMirror referenced this issue

2026-05-03 13:26:56 -05:00

[GH-ISSUE #2384] [Bug]: Not able to make call to Ollama multimodal model in the cookbook #63423

GiteaMirror referenced this issue

2026-05-09 03:54:19 -05:00

[GH-ISSUE #2384] [Bug]: Not able to make call to Ollama multimodal model in the cookbook #79063


jmorganca/qwen25vl

brucemacd/model-forward-test-ext

parth/python-function-parsing

jmorganca/cuda-compression-none

drifkin/num-parallel

drifkin/chat-truncation-fix

jmorganca/sync

parth/python-tools-calling

drifkin/array-head-count

brucemacd/create-no-loop

parth/server-enable-content-stream-with-tools

qwen25omni

mxyng/v3

brucemacd/ropeconfig

jmorganca/silence-tokenizer

parth/sample-so-test

parth/sampling-structured-outputs

brucemacd/doc-go-engine

parth/constrained-sampling-json

jmorganca/mistral-wip

brucemacd/mistral-small-convert

parth/sample-unmarshal-json-for-params

brucemacd/jomorganca/mistral

pdevine/bfloat16

jmorganca/mistral

brucemacd/mistral

pdevine/logging

parth/sample-correctness-fix

parth/sample-fix-sorting

jmorgan/sample-fix-sorting-extras

jmorganca/temp-0-images

brucemacd/parallel-embed-models

brucemacd/shim-grammar

jmorganca/fix-gguf-error

bmizerany/nameswork

jmorganca/faster-releases

bmizerany/validatenames

brucemacd/err-no-vocab

brucemacd/rope-config

brucemacd/err-hint

brucemacd/qwen2_5

brucemacd/logprobs

brucemacd/new_runner_graph_bench

progress-flicker

brucemacd/forward-test

brucemacd/go_qwen2

pdevine/gemma2

jmorganca/add-missing-symlink-eval

mxyng/next-debug

parth/set-context-size-openai

brucemacd/next-bpe-bench

brucemacd/next-bpe-test

brucemacd/new_runner_e2e

brucemacd/new_runner_qwen2

pdevine/convert-cohere2

brucemacd/convert-cli

parth/log-probs

mxyng/next-mlx

mxyng/cmd-history

parth/templating

parth/tokenize-detokenize

brucemacd/check-key-register

bmizerany/grammar

jmorganca/vendor-081b29bd

mxyng/func-checks

jmorganca/fix-null-format

parth/fix-default-to-warn-json

jmorganca/qwen2vl

jmorganca/no-concat

parth/cmd-cleanup-SO

brucemacd/check-key-register-structured-err

parth/openai-stream-usage

parth/fix-referencing-so

stream-tools-stop

jmorganca/degin-1

brucemacd/install-path-clean

brucemacd/push-name-validation

brucemacd/browser-key-register

jmorganca/openai-fix-first-message

jmorganca/fix-proxy

jessegross/sample

parth/disallow-streaming-tools

dhiltgen/remove_submodule

jmorganca/ga

jmorganca/mllama

pdevine/newlines

pdevine/geems-2b

jmorganca/llama-bump

mxyng/modelname-7

mxyng/gin-slog

mxyng/modelname-6

jyan/convert-prog

jyan/quant5

paligemma-support

pdevine/import-docs

jmorganca/openai-context

jyan/paligemma

jyan/p2

jyan/palitest

bmizerany/embedspeedup

jmorganca/llama-vit

brucemacd/allow-ollama

royh/ep-methods

royh/whisper

mxyng/api-models

mxyng/fix-memory

jyan/q4_4/8

jyan/ollama-v

royh/stream-tools

roy-embed-parallel

bmizerany/hrm

revert-5963-revert-5924-mxyng/llama3.1-rope

royh/embed-viz

jyan/local2

jyan/auth

jyan/local

jyan/parse-temp

jmorganca/template-mistral

jyan/reord-g

royh-openai-suffixdocs

royh-imgembed

royh-embed-parallel

jyan/quant4

royh-precision

jyan/progress

pdevine/fix-template

jyan/quant3

pdevine/ggla

mxyng/update-registry-domain

jmorganca/ggml-static

mxyng/create-context

jyan/v0.146

mxyng/layers-from-files

build_dist

bmizerany/noseek

royh-ls

royh-name

timeout

mxyng/server-timestamp

bmizerany/nosillyggufslurps

royh-params

jmorganca/llama-cpp-7c26775

royh-openai-delete

royh-show-rigid

jmorganca/enable-fa

jmorganca/no-error-template

jyan/format

royh-testdelete

bmizerany/fastverify

language_support

pdevine/ps-glitches

brucemacd/tokenize

bruce/iq-quants

bmizerany/filepathwithcoloninhost

mxyng/split-bin

bmizerany/client-registry

jmorganca/if-none-match

native

jmorganca/native

jmorganca/batch-embeddings

jmorganca/initcmake

jmorganca/mm

pdevine/showggmlinfo

modenameenforcealphanum

bmizerany/modenameenforcealphanum

jmorganca/done-reason

jmorganca/llama-cpp-8960fe8

ollama.com

bmizerany/filepathnobuild

bmizerany/types/model/defaultfix

rmdisplaylong

nogogen

bmizerany/x

modelfile-readme

bmizerany/replacecolon

jmorganca/limit

jmorganca/execstack

jmorganca/replace-assets

mxyng/tune-concurrency

jmorganca/testing

whitespace-detection

jmorganca/options

upgrade-all

scratch

cuda-search

mattw/airenamer

mattw/allmodelsonhuggingface

mattw/quantcontext

mattw/whatneedstorun

brucemacd/llama-mem-calc

mattw/faq-context

mattw/communitylinks

mattw/noprune

mattw/python-functioncalling

rename

mxyng/install

pulse

remove-first

editor

mattw/selfqueryingretrieval

cgo

mattw/howtoquant

api

matt/streamingapi

format-config

mxyng/extra-args

shell

update-nous-hermes

cp-model

upload-progress

fix-unknown-model

fix-model-names

delete-fix

insecure-registry

ls

deletemodels

progressbar

readme-updates

license-layers

skip-list

list-models

modelpath

matt/examplemodelfiles

distribution

go-opts

Reference: github-starred/ollama#2384