[GH-ISSUE #6322] Why must the role be "system", "user", or "assistant"? How can I add a custom role like "tool"? #3969

Closed
opened 2026-04-12 14:50:33 -05:00 by GiteaMirror · 14 comments

Originally created by @zhangsheng377 on GitHub (Aug 12, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6322

https://github.com/ollama/ollama/blob/15c2d8fe149ba2b58aadbab615a6955f8821c7a9/parser/parser.go#L294


@Cephra commented on GitHub (Aug 12, 2024):

You might want to take a look at these:

- https://ollama.com/blog/tool-support
- https://github.com/ollama/ollama-python/blob/main/examples/tools/main.py

To see how tool calling works with ollama.

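For reference, a minimal sketch of the tool-calling flow those links describe, assuming the `ollama` Python package and a tool-capable model such as llama3.1 (named later in this thread); `get_current_weather` is a hypothetical stand-in, and dict-style response access follows the linked example:

```python
# A minimal sketch (not from the thread): tool calling with the ollama
# Python package. get_current_weather() is a hypothetical tool.
import json
import ollama

def get_current_weather(city):
    # Hypothetical stand-in for a real tool.
    return json.dumps({"city": city, "temperature_c": 21})

messages = [{"role": "user", "content": "What is the weather in Paris?"}]

response = ollama.chat(
    model="llama3.1",
    messages=messages,
    tools=[{
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
)
messages.append(response["message"])

# If the model requested a tool call, run it and feed the result back
# with role "tool" -- the role this issue is asking about.
for call in response["message"].get("tool_calls") or []:
    result = get_current_weather(**call["function"]["arguments"])
    messages.append({"role": "tool", "content": result})

final = ollama.chat(model="llama3.1", messages=messages)
print(final["message"]["content"])
```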

@zhangsheng377 commented on GitHub (Aug 12, 2024):

> You might want to take a look at these:
>
> - https://ollama.com/blog/tool-support
> - https://github.com/ollama/ollama-python/blob/main/examples/tools/main.py
>
> To see how tool calling works with ollama.

I know this, but I still don't think it needs to check the role.
For example, if I want to build a multi-agent setup, I'll assign a different role to each LLM, and there are many more scenarios. In conclusion, I feel that it is necessary to allow users to customize roles.


@rick-github commented on GitHub (Aug 13, 2024):

How does the model know about your customized role? If I create a role called "pink_elephant", how does the model process it?


@zhangsheng377 commented on GitHub (Aug 13, 2024):

> How does the model know about your customized role? If I create a role called "pink_elephant", how does the model process it?

Some roles that involve new knowledge may need fine-tuning, but most can actually be given to the model directly as input. Rest assured, its reasoning ability is enough for it to understand.


@rick-github commented on GitHub (Aug 13, 2024):

I think you overestimate the logical ability of the current set of models.

What problem are you trying to solve?


@zhangsheng377 commented on GitHub (Aug 13, 2024):

> I think you overestimate the logical ability of the current set of models.
>
> What problem are you trying to solve?

I use huggingface (transformers) to run dialogues locally. There are some custom roles in them, and it runs very well locally. But once I switch to ollama, I get an error saying the role is invalid.

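For context on why transformers accepts such messages: a minimal sketch, assuming the `transformers` package and Qwen/Qwen2-7B-Instruct (Qwen2 comes up later in this thread). Many chat templates are Jinja strings that interpolate `message['role']` verbatim, so an unknown role passes straight through into the prompt text:

```python
# A minimal sketch, assuming a model whose Jinja chat template
# interpolates the role string verbatim (Qwen2's does; some
# other templates raise an exception on unknown roles instead).
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")
messages = [
    {"role": "system", "content": "you answer with brevity"},
    {"role": "pink_elephant", "content": "hi, what do you like to do?"},
]
# No role validation happens here: the template just substitutes
# message['role'] into the <|im_start|>...<|im_end|> markup.
print(tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```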

@rick-github commented on GitHub (Aug 13, 2024):

The message roles of "system", "user" and "assistant" are well defined and used anywhere transformer inference is done, because that's how the models are trained. For example, I can't send a message role of "pink_elephant" to OpenAI. Even huggingface models use "system" and "user" in their model cards. If you use roles that the models don't know, then the results may not be as good as they could be. Typically for a multi-agent setup the approach is to create base-level model instances and then layer role-specific information on top:

```
#!/usr/bin/env python3

import ollama

class agent():
  def __init__(self, role, model="llama3.1"):
    self.role = role
    self.model = model
  def generate(self, prompt):
    return ollama.generate(prompt=prompt, system=self.role, model=self.model, stream=False)["response"]

pink_elephant = agent(role="you are a pink elephant - you are imaginary, brightly coloured, and inebriated. you answer questions with brevity")
einstein = agent(role="you are albert einstein, super smart with a fondness for playing violin. you answer questions with brevity")

print(pink_elephant.generate("hi, what do you like to do?"))
print(einstein.generate("hi, what do you like to do?"))
```

```
PARTY! *hiccup* Drink fruity cocktails! Dance on trumpets! Wear sparkly tutus! *slurp*
Play violin. Relaxes the mind. Allows focus on theory.
```


@zhangsheng377 commented on GitHub (Aug 13, 2024):

> The message roles of "system", "user" and "assistant" are well defined and used anywhere transformer inference is done, because that's how the models are trained. For example, I can't send a message role of "pink_elephant" to OpenAI. Even huggingface models use "system" and "user" in their model cards. If you use roles that the models don't know, then the results may not be as good as they could be. Typically for a multi-agent setup the approach is to create base-level model instances and then layer role-specific information on top:
>
> ```
> #!/usr/bin/env python3
>
> import ollama
>
> class agent():
>   def __init__(self, role, model="llama3.1"):
>     self.role = role
>     self.model = model
>   def generate(self, prompt):
>     return ollama.generate(prompt=prompt, system=self.role, model=self.model, stream=False)["response"]
>
> pink_elephant = agent(role="you are a pink elephant - you are imaginary, brightly coloured, and inebriated. you answer questions with brevity")
> einstein = agent(role="you are albert einstein, super smart with a fondness for playing violin. you answer questions with brevity")
>
> print(pink_elephant.generate("hi, what do you like to do?"))
> print(einstein.generate("hi, what do you like to do?"))
> ```
>
> ```
> PARTY! *hiccup* Drink fruity cocktails! Dance on trumpets! Wear sparkly tutus! *slurp*
> Play violin. Relaxes the mind. Allows focus on theory.
> ```

👍~

Well, to be honest, I actually want to write my own [ai_agent process](https://github.com/BZ-coding/ai_agent/blob/main/utils/ai_agent.py) and fine-tune the model myself.
In fact, you are right. For my scenario, I guess I can just use the tool-calling support you suggested (although I haven't tried it yet).

However, I still want to know why you are so resistant to opening up custom roles? It stands to reason that it is just a matter of adding a configuration option to the Modelfile.


@rick-github commented on GitHub (Aug 13, 2024):

It's not a matter of being resistant, it's the way that the models work.

Take a look at the template file for a model. I see qwen2:7b referenced in chatbot.py, so we'll use that:

```
$ ollama show --template qwen2:7b
{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
{{ .Response }}<|im_end|>
```

Conveniently it's a simple template. It's the equivalent of the `chat_template` found in the [tokenizer_config.json](https://huggingface.co/Qwen/Qwen2-7B/blob/main/tokenizer_config.json) of the source model. The purpose of the template is to format the query sent via the API into a form that can be processed into a token stream and fed to the model. The added text (`<|im_start|>`, `system`, `user`, etc.) consists of specific strings that are converted to tokens the model is trained to recognize (you can see these strings mapped to tokens in [tokenizer.json](https://huggingface.co/Qwen/Qwen2-7B/blob/main/tokenizer.json) - warning, large file). These tokens are instrumental in guiding the probabilistic token generation during the output phase. If the model is not trained to recognize a string as a special token, then that string is just a set of characters that the model processes to generate output.

```
$ jq . tokenizer.json | egrep '\s"(user|assistant|system|<\|im_(start|end)\|>)"'
      "content": "<|im_start|>",
      "content": "<|im_end|>",
      "user": 872,
      "system": 8948,
      "assistant": 77091,
```

```
$ jq . tokenizer.json | egrep pink_elephant
```

So, for the pink_elephant agent from the sample python script earlier, the call to the model uses the character stream:

```
<|im_start|>system
you are a pink elephant - you are imaginary, brightly coloured, and inebriated. you answer questions with brevity<|im_end|>
<|im_start|>user
hi, what do you like to do?<|im_end|>
<|im_start|>assistant
<|im_end|>
```

This is tokenized and then fed into the model to generate the response: a sequence of tokens that is de-tokenized into plain text, which is then returned in the API response.

You can in fact skip this templating step and inject the stream directly into the tokenizer by using [raw mode](https://github.com/ollama/ollama/blob/main/docs/api.md#request-raw-mode). Note that each model has a different tokenizer, so the raw query will vary across models.

```
$ curl -s localhost:11434/api/generate -d '{"model":"qwen2:7b","prompt":"<|im_start|>system\nyou are a pink elephant - you are imaginary, brightly coloured, and inebriated. you answer questions with brevity<|im_end|>\n<|im_start|>user\nhi, what do you like to do?<|im_end|>\n<|im_start|>assistant\n","stream":false,"raw":true}' | jq -r .response
Hi there! I like to roam around, play with my rainbow trunk, and munch on cotton candy clouds. Fun times! 🦄✨
```

So it's technically feasible to substitute your own roles into a query, but because the model has no intrinsic understanding of the role name you are using, the results may vary. For example, if I change the `user` role to `tiger`, the model incorporates that into its own worldview rather than treating it as an attribute of the questioner:

```
$ curl -s localhost:11434/api/generate -d '{"model":"qwen2:7b","prompt":"<|im_start|>system\nyou are a pink elephant - you are imaginary, brightly coloured, and inebriated. you answer questions with brevity<|im_end|>\n<|im_start|>tiger\nhi, what do you like to do?<|im_end|>\n<|im_start|>assistant\n","stream":false,"raw":true}' | jq -r .response
Roar loudly and chase prey while lounging in the sun!
```
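For readers who prefer Python over curl, a minimal sketch of the same raw-mode call, assuming only the `requests` package and ollama's default localhost:11434 endpoint; the payload is the one from the curl example above:

```python
# A minimal sketch of the raw-mode call above, using requests.
import requests

prompt = (
    "<|im_start|>system\n"
    "you are a pink elephant - you are imaginary, brightly coloured, and "
    "inebriated. you answer questions with brevity<|im_end|>\n"
    "<|im_start|>user\nhi, what do you like to do?<|im_end|>\n"
    "<|im_start|>assistant\n"
)

# raw=True bypasses the model's template, so the role markers above
# are handed to the tokenizer exactly as written.
r = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "qwen2:7b", "prompt": prompt, "stream": False, "raw": True},
)
print(r.json()["response"])
```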

@zhangsheng377 commented on GitHub (Aug 13, 2024):

That's true, but I will fine-tune the model myself, convert it to GGUF, and load it in ollama.

https://github.com/BZ-coding/ai_agent/blob/54e8cbeb9cfff59f96d6772135725c6b62418d33/utils/chatbot.py#L7
Just like the chinese-llama3 model, it is fine-tuned from llama3.

https://github.com/BZ-coding/ai_agent
You can open a translation and read the project introduction. I do plan to train a model that can use tools.


@zhangsheng377 commented on GitHub (Aug 13, 2024):

To put it another way, I think it's up to the Modelfile to tell Ollama what roles the model supports (whether it's been fine-tuned or not), rather than Ollama just assuming them.


@jmorganca commented on GitHub (Sep 4, 2024):

Hi @zhangsheng377, you can define a custom `TEMPLATE` (see https://github.com/ollama/ollama/blob/main/docs/template.md) and try passing custom role names to the `/api/chat` endpoint. That may work. Otherwise, you could also try using `raw` mode with `/api/generate`, which would allow you to use any prompt format you'd like. Hope this helps – happy to shed more light on this.

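To make that suggestion concrete: a minimal sketch of a Modelfile whose template loops over all messages and interpolates each role verbatim, in the ChatML style shown earlier in this thread. The base model is illustrative, and whether `/api/chat` passes arbitrary roles through to such a template depends on the ollama version:

```
FROM qwen2:7b

# Render every message with its role string verbatim, ChatML-style,
# instead of hard-coding system/user/assistant. Illustrative only.
TEMPLATE """{{- range .Messages }}<|im_start|>{{ .Role }}
{{ .Content }}<|im_end|>
{{ end }}<|im_start|>assistant
"""
```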

@zhangsheng377 commented on GitHub (Sep 4, 2024):

> Hi @zhangsheng377, you can define a custom `TEMPLATE` (see https://github.com/ollama/ollama/blob/main/docs/template.md) and try passing custom role names to the `/api/chat` endpoint. That may work. Otherwise, you could also try using `raw` mode with `/api/generate`, which would allow you to use any prompt format you'd like. Hope this helps – happy to shed more light on this.

But my requirement is to use the "tool" role type through an OpenAI-compatible interface.

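For what it's worth, ollama's OpenAI-compatible endpoint does accept `tool` messages in later releases. A minimal sketch, assuming the `openai` Python package, ollama's documented `/v1` base URL, and a tool-capable model; the call id and arguments are illustrative, and acceptance of the `tool` role is version-dependent:

```python
# A minimal sketch, assuming the openai package pointed at ollama's
# OpenAI-compatible endpoint; whether role "tool" is accepted depends
# on the ollama version and the model.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

resp = client.chat.completions.create(
    model="llama3.1",
    messages=[
        {"role": "user", "content": "What is the weather in Paris?"},
        # A previous assistant turn that requested a tool call
        # (id and arguments here are illustrative).
        {"role": "assistant", "tool_calls": [{
            "id": "call_0",
            "type": "function",
            "function": {"name": "get_current_weather",
                         "arguments": '{"city": "Paris"}'},
        }]},
        # The tool result is fed back with the "tool" role.
        {"role": "tool", "tool_call_id": "call_0",
         "content": '{"city": "Paris", "temperature_c": 21}'},
    ],
)
print(resp.choices[0].message.content)
```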

@xylobol commented on GitHub (Aug 15, 2025):

https://ollama.com/library/granite3.3

Granite 3.3 models require a message with the role 'control' to enable thinking. Since I'm trying to access them from an Open-WebUI "model", I can't use `/api/generate`.

For anyone else trying to solve this particular problem, you can use this: https://openwebui.com/f/adamoutler/granite_thinking_filter


Reference: github-starred/ollama#3969