[GH-ISSUE #14560] Respect model-file overriding the RENDERER / PARSER with a custom TEMPLATE #55957

Open
opened 2026-04-29 10:03:17 -05:00 by GiteaMirror · 8 comments

Originally created by @cipriancraciun on GitHub (Mar 2, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14560

What is the issue?

At the moment, in the latest Ollama v0.17.5, if one wants to create a new model that shares the weights of an existing Qwen 3.5 model but uses a completely custom TEMPLATE, this is not possible: the new model inherits the RENDERER / PARSER characteristics of the FROM parent model, and the specified TEMPLATE is not taken into account.

For example:

~~~~
FROM qwen3.5:4b-q8_0

TEMPLATE """

{{- "<|im_start|>system\n" -}}

Some custom system prompt prelude...

{{- .System -}}
{{- "<|im_end|>\n" -}}

{{- "<|im_start|>user\n" -}}
{{ .Prompt }}
{{- "<|im_end|>\n" -}}

{{- "<|im_start|>assistant\n" -}}
{{- "<think>\n\n</think>\n\n" -}}

"""
~~~~

Creating this model doesn't take the TEMPLATE at all into consideration (as seen in the server logs with OLLAMA_DEBUG=2).

However, if instead of FROM qwen3.5:... one uses the weights file FROM /.../blobs/sha256-acaad28d51b81c74cae813475866f274730f97e4e687464cdd0ac369ef20032c it works as expected.

Trying to set RENDERER "" and PARSER "" doesn't seem to work either.
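
For reference, the blob workaround amounts to a Modelfile along these lines; the store path prefix below is only a placeholder for wherever the Ollama blobs directory actually lives on your system:

~~~~
# Only the FROM line changes: it points directly at the weights blob instead of the named model.
FROM /path/to/ollama/models/blobs/sha256-acaad28d51b81c74cae813475866f274730f97e4e687464cdd0ac369ef20032c

TEMPLATE """
{{- "<|im_start|>system\n" -}}
Some custom system prompt prelude...
{{- .System -}}
{{- "<|im_end|>\n" -}}
{{- "<|im_start|>user\n" -}}
{{ .Prompt }}
{{- "<|im_end|>\n" -}}
{{- "<|im_start|>assistant\n" -}}
{{- "<think>\n\n</think>\n\n" -}}
"""
~~~~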

Ollama version

0.17.5

GiteaMirror added the bug label 2026-04-29 10:03:17 -05:00

@majiayu000 commented on GitHub (Mar 6, 2026):

Hi, I traced the root cause of this issue.

When creating a model with FROM <model_name> and a custom TEMPLATE, the parent model's Renderer and Parser are unconditionally inherited into the child model's config. At inference time, built-in renderers take precedence over templates (prompt.go:117), so the custom template is effectively ignored.

The fix: skip inheriting Renderer and Parser from the parent when the child model specifies its own TEMPLATE. This preserves the existing behavior for models without custom templates (renderer/parser still inherited), while respecting the user's intent when they explicitly override the template.

Before I open a PR, does this approach look right? Specifically:
A) Skip renderer/parser inheritance when TEMPLATE is set (my current approach), or
B) Add runtime logic to prefer template over renderer when the model has both?

I lean toward (A) since it's simpler and matches user intent — if you're overriding the template, you don't want the built-in renderer to silently take over.
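
For concreteness, here is a minimal sketch of what approach (A) could look like in the create path. The type, field, and variable names below (modelConfig, Renderer, Parser, hasCustomTemplate) are assumptions for illustration, not the identifiers actually used in Ollama's server package:

~~~~
// Sketch only: type and field names are assumed, not Ollama's real ones.
type modelConfig struct {
	Renderer string
	Parser   string
}

// inheritRendererParser illustrates approach (A): the parent's built-in
// renderer/parser are carried over only when the child Modelfile did not
// declare its own TEMPLATE, so an explicit TEMPLATE is never silently
// shadowed by a built-in renderer at prompt-build time.
func inheritRendererParser(parent, child *modelConfig, hasCustomTemplate bool) {
	if hasCustomTemplate {
		return // respect the user's TEMPLATE; leave renderer/parser unset
	}
	if child.Renderer == "" {
		child.Renderer = parent.Renderer
	}
	if child.Parser == "" {
		child.Parser = parent.Parser
	}
}
~~~~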


@cipriancraciun commented on GitHub (Mar 6, 2026):

I lean toward (A) since it's simpler and matches user intent — if you're overriding the template, you don't want the built-in renderer to silently take over.

@majiayu000 I would also say that (A) is the least surprising (for the user) and perhaps simpler to implement.

However, might I suggest that if the user specifies both TEMPLATE and one of RENDERER or PARSER in the model-file, an error (or at least a warning) is issued saying that the TEMPLATE will be silently ignored? (This is to stop users who copy-paste the output of ollama show --modelfile some-model, which contains renderer / parser, and then add their own TEMPLATE without realizing that it will be ignored.)


@drifkin commented on GitHub (Mar 6, 2026):

I prefer (A) as well, and agree on the warning


@seamon67 commented on GitHub (Mar 9, 2026):

+1 to (A)


@cipriancraciun commented on GitHub (Apr 3, 2026):

Apparently, with the latest release 0.20.0, the workaround of FROM /.../blobs/sha256-... doesn't work anymore, as Ollama somehow detects the model type from the GGUF and puts the PARSER and RENDERER back. (Tested with Gemma4 26B-A4B.)


@cipriancraciun commented on GitHub (Apr 3, 2026):

Apparently, with the latest release 0.20.0, the workaround of FROM /.../blobs/sha256-... doesn't work anymore, as Ollama somehow detects the model type from the GGUF and puts the PARSER and RENDERER back. (Tested with Gemma4 26B-A4B.)

The issue seems to be in server/create.go:
https://github.com/ollama/ollama/blob/96b202d34b82d1755887bf4204e1f2e053720d4f/server/create.go#L519-L524

This code, when the architecture of the model is gemma4, forces the PARSER and RENDERER to "gemma4" regardless of whether a TEMPLATE exists or not.
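
A hedged sketch of what a guard there could look like; applyArchDefaults and the config fields are hypothetical names for illustration, not the actual identifiers in server/create.go:

~~~~
// Sketch only: names are assumed. The idea is that architecture-derived
// defaults should not overwrite an explicit TEMPLATE from the Modelfile.
type createConfig struct{ Renderer, Parser string }

func applyArchDefaults(c *createConfig, arch string, hasTemplate bool) {
	if hasTemplate {
		return // an explicit TEMPLATE in the Modelfile should win
	}
	if arch == "gemma4" { // 0.20.0 currently sets these unconditionally
		c.Renderer, c.Parser = arch, arch
	}
}
~~~~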


@seamon67 commented on GitHub (Apr 3, 2026):

So are we to assume the TEMPLATE field is slowly being deprecated in general?


@cipriancraciun commented on GitHub (Apr 3, 2026):

So are we to assume the TEMPLATE field is slowly being deprecated in general?

I really hope not, because this is one escape hatch for clients that don't keep up with Ollama API changes. In my particular case I rely upon this for:

  • disabling the thinking mode for models that have it enabled by default, and when the client doesn't properly convey the thinking parameter in the request; (thus, providing a custom TEMPLATE that always inserts the thinking tokens is the only solution;) (an alternative would be to have an Ollama global to control the thinking mode of all models; another alternative would be to have an Ollama model-file parameter to disable thinking mode only for this particular model;)

  • always providing a custom system prompt, even when the tool sends its own system prompt; (some clients insist on sending their own prompts, and sometimes one wants to be sure that a certain system prompt is prepended;) (an alternative would be to have an Ollama model-file "always on" system prompt that gets prefixed to the system prompt sent via the API;)

The only other workaround for these two issues is to create a proxy that sits in front of Ollama and just mangles the requests.
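
For completeness, a minimal sketch of such a proxy in Go: it rewrites POST /api/chat bodies to prepend a fixed system message before forwarding to Ollama. The listen address, the prelude text, and the lenient error handling are placeholders; anything that fails to parse is passed through untouched:

~~~~
package main

import (
	"bytes"
	"encoding/json"
	"io"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// prelude is the system prompt we always want prepended (placeholder text).
const prelude = "Some custom system prompt prelude..."

func main() {
	upstream, err := url.Parse("http://127.0.0.1:11434") // default Ollama address
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(upstream)

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		if r.Method == http.MethodPost && r.URL.Path == "/api/chat" {
			body, err := io.ReadAll(r.Body)
			if err == nil {
				var req map[string]any
				if json.Unmarshal(body, &req) == nil {
					msgs, _ := req["messages"].([]any)
					// Prepend our system message ahead of whatever the client sent.
					req["messages"] = append([]any{map[string]any{
						"role": "system", "content": prelude,
					}}, msgs...)
					if rewritten, err := json.Marshal(req); err == nil {
						body = rewritten
					}
				}
			}
			r.Body = io.NopCloser(bytes.NewReader(body))
			r.ContentLength = int64(len(body))
		}
		proxy.ServeHTTP(w, r)
	})

	// Point clients at this address instead of Ollama itself.
	log.Fatal(http.ListenAndServe("127.0.0.1:11435", nil))
}
~~~~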


(Update to include a second workaround)

Another workaround would be the following (a consolidated sketch follows these steps):

  • create a model file for the model in question;
  • use FROM {original-model} to select the base model;
  • include RENDERER "custom-{random}", where {random} can be a UUID, or anything unique; (special care should be taken to not reuse the same {random} for another model-file;)
  • use the desired TEMPLATE;
  • import the model as usual with ollama create --file {path-to-modelfile} {custom-model};
  • (if one now runs ollama run {custom-model}, it will fail complaining that the custom-{random} renderer is not found;)
  • look inside the Ollama store in manifests/{custom-model}/latest in the .config.digest property; (copy the long SHA256 hex);
  • look inside the Ollama store in blobs/sha256-{digest} and edit it as a text file; (that file should be a JSON;)
  • find your "custom-{random}" snippet, and replace it with "";
  • (by changing this file you are breaking the SHA256, thus you might get unexpected results; also, if this file is shared by other models, you are going to break those as well, thus the importance of the uniqueness of {random};)
  • save the file and run ollama run {custom-model} again.
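
A consolidated sketch of these steps in Go; every path below (the store root, the manifest location, the model name, and the custom-{random} marker) is a placeholder to adjust for your own setup, and the caveat about invalidating the blob's SHA256 applies just the same:

~~~~
package main

import (
	"encoding/json"
	"log"
	"os"
	"path/filepath"
	"strings"
)

func main() {
	// Placeholders: adjust the store root, the manifest path for your model,
	// and the custom-{random} marker used in the Modelfile.
	store := filepath.Join(os.Getenv("HOME"), ".ollama", "models")
	manifest := filepath.Join(store, "manifests", "custom-model", "latest")
	marker := `"custom-00000000-0000-0000-0000-000000000000"`

	raw, err := os.ReadFile(manifest)
	if err != nil {
		log.Fatal(err)
	}
	var m struct {
		Config struct {
			Digest string `json:"digest"`
		} `json:"config"`
	}
	if err := json.Unmarshal(raw, &m); err != nil {
		log.Fatal(err)
	}

	// The digest may look like "sha256:<hex>" while the blob file is named
	// "sha256-<hex>"; normalize accordingly.
	blob := filepath.Join(store, "blobs", strings.Replace(m.Config.Digest, ":", "-", 1))
	cfg, err := os.ReadFile(blob)
	if err != nil {
		log.Fatal(err)
	}

	// Blank out the sentinel renderer so Ollama falls back to the TEMPLATE.
	// This changes the blob's contents, so its SHA256 no longer matches.
	patched := strings.Replace(string(cfg), marker, `""`, 1)
	if err := os.WriteFile(blob, []byte(patched), 0o644); err != nil {
		log.Fatal(err)
	}
	log.Println("patched", blob)
}
~~~~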