[GH-ISSUE #8955] IBM Granite 3.2 #83495


GiteaMirror · 2026-05-09T18:19:33-05:00

GiteaMirror commented

2026-05-09 18:19:33 -05:00

Originally created by @abenmrad on GitHub (Feb 8, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8955

IBM recently released a preview of an 8B model with DeepSeek-style chain-of-thought (CoT) fine-tuning. It would be a great addition to the model library. Some 4-bit and 8-bit quantizations are reportedly already available.

https://www.ibm.com/new/announcements/bringing-reasoning-to-granite
https://huggingface.co/ibm-granite/granite-3.2-8b-instruct-preview


GiteaMirror added the model label 2026-05-09 18:19:33 -05:00

GiteaMirror closed this issue

2026-05-09 18:19:35 -05:00

GiteaMirror commented

2026-05-09 18:19:37 -05:00

@ALLMI78 commented on GitHub (Feb 9, 2025):

try this (untested)

ollama run hf.co/AaronFeng753/granite-3.2-8b-instruct-preview-Q8_0-GGUF:Q8_0


GiteaMirror commented

2026-05-09 18:19:38 -05:00

@rick-github commented on GitHub (Feb 9, 2025):

I imported the model to give it a try. The model page touts it as a reasoning model, but looking at the chat template embedded in the GGUF, the reasoning seems to be just an extra instruction in the system message. The model itself is a fine-tuned granite3.1. It certainly doesn't reason in the same way as the deepseek models.

$ ollama run granite-3.2:8b-thinking-preview-q8_0
>>> how many 'r's in 'strawberry'?
Here is my thought process:

The task is straightforward - I need to count the number of letter 'r' in the word 'strawberry'. This involves simple string analysis and character counting.

Here is my response:

There are **2** 'r's in the word 'strawberry'. 

Here's the breakdown: S_T R A W B E R R Y

- The first 'r' is at the 3rd position.
- The second 'r' is at the 8th position.


GiteaMirror commented

2026-05-09 18:19:40 -05:00

@gabe-l-hart commented on GitHub (Feb 10, 2025):

Thanks for the interest! We're working on getting the official model up into the library. In the meantime, here's the draft: https://ollama.com/gabegoodhart/granite3.2-preview

NOTE: In order to enable thinking, you must send a control message in the chat history: {"role": "control", "content": "thinking"}


GiteaMirror commented

2026-05-09 18:19:42 -05:00

@gabe-l-hart commented on GitHub (Feb 10, 2025):

@rick-github To your point, you are correct that the toggle to enable/disable thinking is indeed just a change to the system prompt. When included, this matches the data used to tune in the reasoning capabilities. The 'strawberry' example is certainly unfortunate and something we'll be shoring up before the official 3.2 release!
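For reference, the expected answer is easy to check mechanically. A minimal Python sketch (an editorial illustration, not part of the original exchange; `letter_positions` is a hypothetical helper name) that counts the letter and lists its 1-based positions:

```python
# Count occurrences of a letter in a word and list their 1-based positions.
def letter_positions(word: str, letter: str) -> list[int]:
    return [i for i, ch in enumerate(word, start=1) if ch == letter]


positions = letter_positions("strawberry", "r")
print(len(positions), positions)  # prints: 3 [3, 8, 9]
```

The word contains three occurrences of the letter, so the model output quoted earlier in the thread undercounts by one.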


GiteaMirror commented

2026-05-09 18:19:42 -05:00

@rick-github commented on GitHub (Feb 10, 2025):

Yes, for my test I modified the template to use the "thinking" template. I'm looking forward to using the "control" role to switch focus.


GiteaMirror commented

2026-05-09 18:19:43 -05:00

@reddibhatini commented on GitHub (Feb 14, 2025):

Hi,
What system prompt activates thinking mode?


GiteaMirror commented

2026-05-09 18:19:44 -05:00

@rick-github commented on GitHub (Feb 14, 2025):

Not a system prompt, an extra message: {"role": "control", "content": "thinking"}


GiteaMirror commented

2026-05-09 18:19:45 -05:00

@rick-github commented on GitHub (Feb 14, 2025):

$ echo '{"model": "gabegoodhart/granite3.2-preview:8b",
         "messages":[
           {"role":"control","content":"thinking"},
           {"role":"user","content":"how many times does the letter `r` occur in the word `strawberry`?"}
         ],
         "stream":false}' | curl -s http://localhost:11434/api/chat -d @- | jq -r .message.content
Here is my thought process:

The user is asking for a simple count of the letter 'r' in the word 'strawberry'. This is a straightforward task of identifying and counting the occurrences of the specified letter within the given word.

Here is my response:

The letter 'r' occurs twice in the word 'strawberry'.

Here's the breakdown: s-t-r-a-w-b-e-r-y. As you can see, 'r' is found at the 2nd and 8th positions.

$ echo '{"model": "gabegoodhart/granite3.2-preview:8b",
         "messages":[
           {"role":"control","content":"default"},
           {"role":"user","content":"how many times does the letter `r` occur in the word `strawberry`?"}
         ],
         "stream":false}' | curl -s http://localhost:11434/api/chat -d @- | jq -r .message.content
The letter 'r' occurs twice in the word "strawberry". Here's how: st-ra-wbe-rry.


GiteaMirror commented

2026-05-09 18:19:45 -05:00

@XReyRobert-IBM commented on GitHub (Feb 27, 2025):

Hi there, it would be great to have the "control" role allowed in Modelfile definitions.
That would make it possible to create custom models that enable thinking in front ends that don't support custom roles (e.g. Open WebUI):

FROM granite3.2:8b
MESSAGE control thinking


GiteaMirror commented

2026-05-09 18:19:46 -05:00

@XReyRobert-IBM commented on GitHub (Feb 27, 2025):

In fact that was an easy fix: #9396


GiteaMirror commented

2026-05-09 18:19:46 -05:00

@XReyRobert-IBM commented on GitHub (Feb 27, 2025):

In the same spirit, updating the template to use the more standard <think></think> tags would be nice.
I used this system prompt:
Respond to every user query in a comprehensive and detailed way. You can write down your thought process before responding. Write your thoughts enclose by tags <think></think> and then your response.


GiteaMirror commented

2026-05-09 18:19:47 -05:00

@rick-github commented on GitHub (Feb 27, 2025):

If you had <think></think> tags in your post, you need to backslash escape the brackets or wrap them in a markdown code block (` or ```).


GiteaMirror commented

2026-05-09 18:19:47 -05:00

@XReyRobert-IBM commented on GitHub (Feb 27, 2025):

Also, @gabe-l-hart, if thinking is ultimately enabled by the system prompt, why go through the pain of creating a new "control" role that is handled only by the template and not by the model?


GiteaMirror commented

2026-05-09 18:19:48 -05:00

@rick-github commented on GitHub (Feb 27, 2025):

I imagine it's easier to provide a switch rather than have every client maintain a library of system prompts.


GiteaMirror commented

2026-05-09 18:19:48 -05:00

@XReyRobert-IBM commented on GitHub (Feb 27, 2025):

Right, it also means that you can enable thinking in any front end that doesn't support the "control" role just by adding the right instructions to the system prompt, and that I went through recompiling ollama for not much ;)
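The idea above can be sketched as a small client-side shim that swaps the control message for a system message before the request is sent. This is only an illustration under stated assumptions: `control_to_system` and `THINKING_PROMPT` are hypothetical names, and the prompt text is the one quoted earlier in this thread, not the wording of the official chat template.

```python
# Hypothetical client-side shim: fold {"role": "control", "content": "thinking"}
# into the system prompt, for front ends that reject custom roles.

# Quoted verbatim from the prompt posted earlier in the thread.
# Assumption: this is NOT the text used by the model's official template.
THINKING_PROMPT = (
    "Respond to every user query in a comprehensive and detailed way. "
    "You can write down your thought process before responding. "
    "Write your thoughts enclose by tags <think></think> and then your response."
)


def control_to_system(messages: list[dict]) -> list[dict]:
    """Return a copy of messages with the 'thinking' control turned into a system prompt."""
    wants_thinking = any(
        m.get("role") == "control" and m.get("content") == "thinking"
        for m in messages
    )
    # Drop every control message; only "thinking" is translated in this sketch.
    rest = [dict(m) for m in messages if m.get("role") != "control"]
    if not wants_thinking:
        return rest
    if rest and rest[0].get("role") == "system":
        # Extend an existing system prompt rather than adding a second one.
        rest[0]["content"] = rest[0]["content"] + "\n" + THINKING_PROMPT
        return rest
    return [{"role": "system", "content": THINKING_PROMPT}] + rest
```

Whether the model responds to this wording as well as to the template's own system prompt would need to be tested; the sketch only shows the message rewriting.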

Also not sure why the template wouldn't allow thinking in the presence of documents... probably because the training set didn't include this kind of thing. It might still be interesting to "force it" to evaluate results nevertheless.


GiteaMirror commented

2026-05-09 18:19:49 -05:00

@gabe-l-hart commented on GitHub (Feb 27, 2025):

Hi @XReyRobert-IBM , thanks for putting together that PR! It's been on my TODO list, but I hadn't gotten to it yet. I'll raise this with the Ollama team for visibility.

To your question about why bother with the "control" role, it's because "thinking" is just the tip of the iceberg. If you dig into the template for the model, you'll see that we also support "hallucinations", "citations", and "length" controls along with the "document" role. These are all experimental model features, and thus not highlighted, but we have big plans to extend how the model takes input to solve the most important problems for users.

The current approach is to handle all of this with the chat template by carefully constructing the system prompt to match the training data, but that may change in the future. Additionally, the logic of which controls are complementary is not trivial, so while it could be done client-side, it would put a lot of onus on the user. That said, we're also working on enabling this kind of client-side logic in a consumable package. You can check out some very early work on that front at https://github.com/ibm-granite/granite-io!


GiteaMirror commented

2026-05-09 18:19:49 -05:00

@gabe-l-hart commented on GitHub (Feb 27, 2025):

Also, I did a similar experiment to see if the model could handle different suggested thinking tags. I found that the 8b Q4 model can, but the (just released) 2b Q4 model can't.


GiteaMirror commented

2026-05-09 18:19:50 -05:00

@gabe-l-hart commented on GitHub (Feb 27, 2025):

> Also not sure why the template wouldn't allow thinking in the presence of documents... probably because the training set didn't include this kind of thing. It might still be interesting to "force it" to evaluate results nevertheless.

Yep, that's spot on. We need to do more validation of how these different features compose. Of course, since it's just system prompt concatenation, you're certainly welcome to explore yourself. We'd love to hear the results!


GiteaMirror commented

2026-05-09 18:19:52 -05:00

@rylativity commented on GitHub (Feb 28, 2025):

> Hi there, it would be great to have the "control" role allowed in Modelfile definitions. That would make it possible to create custom models that enable thinking in front ends that don't support custom roles (e.g. Open WebUI):
>
> FROM granite3.2:8b
> MESSAGE control thinking

@XReyRobert-IBM this (and your PR) was exactly what I was looking for, for the exact same reason, so first of all thank you.

If adding a new role works that easily, I'm wondering whether it makes sense not to raise the error from the isValidMessage() check in the parser at all (https://github.com/ollama/ollama/blob/0749160a245e653ac9d4095e7f6c1833a813ffbb/parser/parser.go#L382-L387) and instead just log a warning for the benefit of the user if the role is not one of the expected values ("system","user","control"). That would let model creators and tuners freely add new, arbitrary roles to messages in their Modelfiles without having to wait for (or implement) an update to the parser.
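The warn-instead-of-fail behaviour proposed above might look like the following. It is sketched in Python purely for illustration (the real parser is Go, and this is not its code); `check_role`, the logger name, and `EXPECTED_ROLES` are hypothetical, with the role set copied from the values named in this comment.

```python
import logging

logger = logging.getLogger("modelfile")

# Assumption: the expected roles are the ones named in the comment above.
EXPECTED_ROLES = frozenset({"system", "user", "control"})


def check_role(role: str, expected: frozenset = EXPECTED_ROLES) -> bool:
    """Accept any role, but warn (instead of raising) when it is not an expected one."""
    known = role in expected
    if not known:
        # The message is still passed through; the warning only informs the user.
        logger.warning("unrecognized message role %r; passing it through", role)
    return known
```

The return value lets a caller decide whether to treat unknown roles specially, while the default path never rejects a Modelfile because of a new role.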


GiteaMirror commented

2026-05-09 18:19:52 -05:00

@adamoutler commented on GitHub (Mar 1, 2025):

I created an addon for Open WebUI to enable thinking and collapse thought.
https://openwebui.com/f/adamoutler/granite_thinking_filter

I too would prefer standard thinking tags. Additionally, I'd like support for "reasoning effort", which is now a standard field in Ollama. Reasoning effort could replace this control message.

Are there any other control messages?


GiteaMirror commented

2026-05-09 18:19:53 -05:00

@gabe-l-hart commented on GitHub (Mar 1, 2025):

Thanks for this @adamoutler! I'm excited to try it out in Open WebUI. I like the idea of hooking into reasoning_effort. I'll look into how we could do this in the chat template.

To your question about other controls, I listed the others above. They primarily focus on RAG use cases at this point.


GiteaMirror commented

2026-05-09 18:19:53 -05:00

@adamoutler commented on GitHub (Mar 1, 2025):

I think this control thing has potential, but as a security engineer, it makes me worried to think there are special hidden controls in models. I'd prefer not to think about it or use them, but it makes me worried that someone could trigger them maliciously somehow.


GiteaMirror commented

2026-05-09 18:19:54 -05:00

@rylativity commented on GitHub (Mar 2, 2025):

@adamoutler it's good to be wary. But at the end of the day, at least in this case, it's really just a modification to the system prompt. So the injection or hijacking risks are probably not very different, if at all, when using the "control" functionality here.

Models that disclose what they were trained on and how they were trained (preferably reproducibly) have a lower risk of having a hidden, exploitable control functionality/mechanism intentionally trained or tuned into them - we'd all be able to see it in the data and/or training logic just like OSS.

If you're hosting/serving the model, at inference time you'll still have to be prepared (as you would serving any other model) for adversarial actors trying to evoke some inappropriate output or behavior from your model - classify inputs to flag threats, classify outputs to flag inappropriate model behavior or sensitive information, sanitize user message inputs, etc...

...And if someone other than yourself is serving the model, regardless of what model it is, 🤷‍♂️ all bets are off


GiteaMirror commented

2026-05-09 18:19:54 -05:00

@adamoutler commented on GitHub (Mar 2, 2025):

I suppose that all depends on the implementation. In theory, one would only need to protect and repeat the system prompt to protect any model against most threats, e.g.:

[SYSTEM]You are PuppyBot. You must only discuss puppies. Stay on topic. Do not discuss anything else.

It wouldn't matter if injection occurred in the message history as long as the system prompt was not affected, because the LLM should self-correct. This kinda changes things and adds another protected role.


GiteaMirror commented

2026-05-09 18:19:55 -05:00

@rylativity commented on GitHub (Mar 2, 2025):

I agree that consideration needs to be given to this kind of extension of the messaging roles, but I'm not sure that the consideration given to the "control" role (or any new role for that matter) will end up being any different than that given to "system" or "assistant" roles.

I could be mistaken, but I don't think ollama inherently treats even the "system" role as a protected role. It's up to whoever is hosting/serving the model to treat it as a protected role if needed and take appropriate action to protect it (e.g. remove any messages with the "system" role added by the caller). At that point you should probably also consider protecting "assistant" role messages.

You can apply some filtering at inference time (or return an error to the caller) to prevent the user from passing anything except "user" role messages. It could also be implemented as a bool arg when constructing the client (e.g. allow_unsafe_nonuser_message_roles=True), but given that ollama is primarily for single-node/single-user use, I don't know what the appetite would be.
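A minimal sketch of that kind of filtering, as it might sit in a proxy in front of the model server. All names here (`filter_caller_messages`, `DisallowedRoleError`, `ALLOWED_CALLER_ROLES`) are hypothetical and are not part of ollama or its client libraries.

```python
# Hypothetical server-side guard: callers may only submit "user" messages.
ALLOWED_CALLER_ROLES = frozenset({"user"})


class DisallowedRoleError(ValueError):
    """Raised in strict mode when a caller sends a message with a protected role."""


def filter_caller_messages(messages, allowed=ALLOWED_CALLER_ROLES, strict=False):
    """Keep only messages whose role is allowed; in strict mode, reject the request instead."""
    kept = []
    for message in messages:
        role = message.get("role")
        if role in allowed:
            kept.append(message)
        elif strict:
            raise DisallowedRoleError(f"role {role!r} is not accepted from callers")
        # In non-strict mode, disallowed messages are silently dropped.
    return kept
```

The host would then prepend its own trusted "system" (or "control") messages after filtering, so those roles can only originate from the server side.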

There's always going to be some risk of injection or hijacking, even just from user messages - examples of hijacking from user-messages are consistently demonstrated. What the models should do and what they actually do are unfortunately often two separate things.

Using models with transparent training data and methods (to avoid having hidden functionality trained/tuned into it), serving your own models (to prevent some middleman from doing who-knows-what with your inputs/outputs), and implementing your own detection and protection rules/processes will remain your best options (regardless of the available roles) if security is the primary concern.


GiteaMirror commented

2026-05-09 18:19:56 -05:00

@adamoutler commented on GitHub (Mar 2, 2025):

Correct on all points. I'm not talking about a normal situation. However, this novel solution to an already solved problem represents a defined increase to the attack surface under certain conditions.

According to ChatGPT, other roles can be used to store metadata to be processed by external apps. In my testing, I've defined key information and even passwords in "user2" role with the LLM apparently being unable to access the information.

It's just something to pay attention to if you're evaluating the security of models and apps. On most models, all other roles are ignored, since the models are trained to accept input from the system and user roles. That leaves "control" as a seemingly invalid role that the LLM does not recognize, which can be used to control an application, store notes, and hold other information.



GiteaMirror commented

2026-05-09 18:19:57 -05:00

@gabe-l-hart commented on GitHub (Mar 3, 2025):

Great discussion here @adamoutler @rylativity! I think there are two basic questions coming up here on the security front:

1. Does adding new roles theoretically expand the attack surface for a model?
2. Does adding new roles practically expand the attack surface for a model?

To (1), I can see two ways that adding new roles expands the attack surface:

1. For open models that don't disclose tuning / alignment techniques, having additional roles can hint attackers at how to narrow their probing towards successful attacks.
2. For closed models that carefully control the prompt formatting that actually arrives at the model, having additional roles provides an additional input that could potentially be implemented without safeguards on the server-side (akin to an additional input box for SQL injection attacks).

For (2), the practical answer is that all of these models (even the closed ones) expose some sort of "raw tokens in" generation endpoint where an attacker can inject any sequence they want. Closed models are almost certainly sitting behind input guards that attempt to detect adversarial inputs or inappropriate outputs, but open models are out there for folks to use as they see fit. As @rylativity points out, the security onus is really on the host of the model / application, so adding these extra roles may make hosts' jobs harder.

When using ollama, especially on a local device, the security profile is essentially "how can I break my own model," which is absolutely an interesting problem, but ultimately the host and user are within the trust boundary, and therefore this isn't a big risk unless it's being used by applications that take risky actions without a human in-the-loop.


GiteaMirror commented

2026-05-09 18:19:57 -05:00

@lemassykoi commented on GitHub (Mar 5, 2025):

$ echo '{"model": "gabegoodhart/granite3.2-preview:8b",
"messages":[
{"role":"control","content":"thinking"},
{"role":"user","content":"how many times does the letter r occur in the word strawberry?"}
],
"stream":false}' | curl -s http://localhost:11434/api/chat -d @- | jq -r .message.content
Here is my thought process:

The user is asking for a simple count of the letter 'r' in the word 'strawberry'. This is a straightforward task of identifying and counting the occurrences of the specified letter within the given word.

Here is my response:

The letter 'r' occurs twice in the word 'strawberry'.

This works with granite3.2:8b-instruct-q8_0, but not with the ollama Python client:

import ollama
ollama_client = ollama.Client(host="http://192.168.10.59:11434")

response = ollama_client.chat(
    model=OLLAMA_CHAT_MODEL,
    messages=[
        {
            "role": "control",
            "content": "thinking",
        },
        {
            "role": "user",
            "content": query,
        }
    ],
)
print(response)

Exception:

1 validation error for Message
role
  Input should be 'user', 'assistant', 'system' or 'tool' [type=literal_error, input_value='control', input_type=str]
    For further information visit https://errors.pydantic.dev/2.10/v/literal_error
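Since the curl request above succeeds, the rejection appears to come from the Python client's own message validation rather than from the server. Until the client accepts the `control` role, one possible workaround is to send the same JSON body to `/api/chat` with the standard library. This is a minimal sketch: `build_chat_payload` and `chat_raw` are hypothetical helper names, and the host and model in the usage comment are placeholders taken from this thread.

```python
import json
import urllib.request


def build_chat_payload(model, query, thinking=True):
    """Build the /api/chat request body used in the curl example."""
    messages = []
    if thinking:
        # Same control message as in the curl request.
        messages.append({"role": "control", "content": "thinking"})
    messages.append({"role": "user", "content": query})
    return {"model": model, "messages": messages, "stream": False}


def chat_raw(host, payload, timeout=120.0):
    """POST the payload to the server's /api/chat and return the reply text."""
    request = urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Mirrors the jq filter `.message.content` from the curl example.
    return body["message"]["content"]


# Usage (requires a running server):
# payload = build_chat_payload("granite3.2:8b-instruct-q8_0", "your question")
# print(chat_raw("http://localhost:11434", payload))
```

Because the payload is a plain dict, no client-side role validation runs before the request is sent.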


GiteaMirror commented

2026-05-09 18:19:58 -05:00

@rylativity commented on GitHub (Mar 5, 2025):

@lemassykoi thanks for highlighting that issue. I've opened a PR on the ollama-python client repo to resolve it.



Reference: github-starred/ollama#83495