qwen3-coder:30b (qwen3moe) loses tool capability by creating a new model #8408

Closed
opened 2025-11-12 14:41:08 -06:00 by GiteaMirror · 4 comments

Originally created by @dan-and on GitHub (Oct 16, 2025).

What is the issue?

Whether I create a simple Modelfile like:
Modelfile
FROM qwen3-coder:30b
PARAMETER num_ctx 262144

or do the same in the ollama CLI, the created model is missing the tools capability. I don't have the same issue with other models (gptoss, devstral, etc.).

$ ollama show qwen3-coder:30b
Model
architecture qwen3moe
parameters 30.5B
context length 262144
embedding length 2048
quantization Q4_K_M

Capabilities
completion
tools

Parameters
stop "<|im_start|>"
stop "<|im_end|>"
stop "<|endoftext|>"
temperature 0.7
top_k 20
top_p 0.8
repeat_penalty 1.05

License
Apache License
Version 2.0, January 2004
...

$ ollama run qwen3-coder:30b
>>> /set parameter num_ctx 262144
Set parameter 'num_ctx' to '262144'
>>> /save qwen3-coder:30b_with_260k_context
Created new model 'qwen3-coder:30b_with_260k_context'

$ ollama show qwen3-coder:30b_with_260k_context
Model
architecture qwen3moe
parameters 30.5B
context length 262144
embedding length 2048
quantization Q4_K_M

Capabilities
completion

Parameters
num_ctx 262144
repeat_penalty 1.05
stop "<|im_start|>"
stop "<|im_end|>"
stop "<|endoftext|>"
temperature 0.7
top_k 20
top_p 0.8

License
Apache License
Version 2.0, January 2004
...

Relevant log output

$ cat create_model.log

Oct 16 14:19:18 gpu ollama[17824]: [GIN] 2025/10/16 - 14:19:18 | 200 | 50.627146765s |       127.0.0.1 | POST     "/api/generate"
Oct 16 14:19:18 gpu ollama[17824]: time=2025-10-16T14:19:18.638Z level=DEBUG source=sched.go:502 msg="context for request finished"
Oct 16 14:19:18 gpu ollama[17824]: time=2025-10-16T14:19:18.638Z level=DEBUG source=sched.go:294 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/qwen3-coder:30b runner.inference="[{ID:GPU-bd86bdbc-e249-0a40-fc1f-6284c27af2cf Library:CUDA} {ID:GPU-c863cd65-236e-6ee5-cafc-02039c50b3b5 Library:CUDA}]" runner.size="18.3 GiB" runner.vram="18.3 GiB" runner.parallel=2 runner.pid=34673 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-1194192cf2a187eb02722edcc3f77b11d21f537048ce04b67ccf8ba78863006a runner.num_ctx=4096 duration=1h0m0s
Oct 16 14:19:18 gpu ollama[17824]: time=2025-10-16T14:19:18.638Z level=DEBUG source=sched.go:312 msg="after processing request finished event" runner.name=registry.ollama.ai/library/qwen3-coder:30b runner.inference="[{ID:GPU-bd86bdbc-e249-0a40-fc1f-6284c27af2cf Library:CUDA} {ID:GPU-c863cd65-236e-6ee5-cafc-02039c50b3b5 Library:CUDA}]" runner.size="18.3 GiB" runner.vram="18.3 GiB" runner.parallel=2 runner.pid=34673 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-1194192cf2a187eb02722edcc3f77b11d21f537048ce04b67ccf8ba78863006a runner.num_ctx=4096 refCount=0
Oct 16 14:19:54 gpu ollama[17824]: time=2025-10-16T14:19:54.204Z level=DEBUG source=create.go:98 msg="create model from model name" from=qwen3-coder:30b
Oct 16 14:19:54 gpu ollama[17824]: time=2025-10-16T14:19:54.224Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
Oct 16 14:19:54 gpu ollama[17824]: [GIN] 2025/10/16 - 14:19:54 | 200 |   28.196158ms |       127.0.0.1 | POST     "/api/create"
Oct 16 14:20:07 gpu ollama[17824]: [GIN] 2025/10/16 - 14:20:07 | 200 |      19.576µs |       127.0.0.1 | HEAD     "/"
Oct 16 14:20:08 gpu ollama[17824]: time=2025-10-16T14:20:08.009Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
Oct 16 14:20:08 gpu ollama[17824]: [GIN] 2025/10/16 - 14:20:08 | 200 |    41.16944ms |       127.0.0.1 | POST     "/api/show"
Oct 16 14:25:02 gpu ollama[17824]: [GIN] 2025/10/16 - 14:25:02 | 200 |      23.694µs |       127.0.0.1 | HEAD     "/"
Oct 16 14:25:02 gpu ollama[17824]: time=2025-10-16T14:25:02.830Z level=DEBUG source=create.go:98 msg="create model from model name" from=qwen3-coder:30b
Oct 16 14:25:02 gpu ollama[17824]: time=2025-10-16T14:25:02.848Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
Oct 16 14:25:02 gpu ollama[17824]: [GIN] 2025/10/16 - 14:25:02 | 200 |    27.19669ms |       127.0.0.1 | POST     "/api/create"
Oct 16 14:25:12 gpu ollama[17824]: [GIN] 2025/10/16 - 14:25:12 | 200 |      20.148µs |       127.0.0.1 | HEAD     "/"
Oct 16 14:25:12 gpu ollama[17824]: time=2025-10-16T14:25:12.134Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
Oct 16 14:25:12 gpu ollama[17824]: [GIN] 2025/10/16 - 14:25:12 | 200 |   39.086398ms |       127.0.0.1 | POST     "/api/show"

OS

Linux

GPU

Nvidia

CPU

AMD

Ollama version

0.12.5

GiteaMirror added the bug label 2025-11-12 14:41:08 -06:00

@rick-github commented on GitHub (Oct 16, 2025):

FROM qwen3-coder:30b
PARAMETER num_ctx 262144
PARSER qwen3-coder
RENDERER qwen3-coder
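
A sketch of how this Modelfile would be used, assuming a local ollama daemon and a hypothetical tag name chosen for illustration:

```shell
# Create a model from the Modelfile above (including the PARSER and
# RENDERER lines) and confirm "tools" appears under Capabilities again.
ollama create qwen3-coder:30b-260k -f Modelfile
ollama show qwen3-coder:30b-260k
```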

@dan-and commented on GitHub (Oct 16, 2025):

Thanks @rick-github
Is that a qwen3-coder issue only?

I thought it was enough to modify one parameter and be able to store that change into a new model.


@rick-github commented on GitHub (Oct 16, 2025):

qwen3-coder uses a different tool-call format that is not easily captured in the usual template, so a custom renderer/parser was written to make it a better tool user. As a result, the Modelfile currently needs to explicitly call out the renderer/parser requirements. I believe this will be automatic in the future, but for now these items need to be added.
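
One way to verify the outcome programmatically is to check the capabilities list in the response from Ollama's /api/show endpoint. A minimal sketch, assuming the response shape matches the `ollama show` output in this issue (the sample payloads below are abridged, not real server responses):

```python
import json

def has_tools_capability(show_response: dict) -> bool:
    # Ollama's /api/show response includes a "capabilities" list;
    # a model saved without the qwen3-coder PARSER/RENDERER drops "tools".
    return "tools" in show_response.get("capabilities", [])

# Abridged examples mirroring the two `ollama show` outputs above.
base = json.loads('{"capabilities": ["completion", "tools"]}')
saved = json.loads('{"capabilities": ["completion"]}')

print(has_tools_capability(base))   # True  - original qwen3-coder:30b
print(has_tools_capability(saved))  # False - model saved without PARSER/RENDERER
```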


@dan-and commented on GitHub (Oct 16, 2025):

Thanks

Reference: github-starred/ollama-ollama#8408