[GH-ISSUE #10583] close qwen3 reasoning #69022

Closed
opened 2026-05-04 16:51:55 -05:00 by GiteaMirror · 9 comments
Owner

Originally created by @niubiqianshui on GitHub (May 6, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10583

How to turn off the reasoning mode of Qwen3-235B-A22B when I use Docker to run it

GiteaMirror added the feature request label 2026-05-04 16:51:55 -05:00

@garychanchan commented on GitHub (May 6, 2025):

+1 Is there currently a parameter to turn off deep thinking?


@wawameizhangya commented on GitHub (May 6, 2025):

m


@niubiqianshui commented on GitHub (May 6, 2025):

curl http://127.0.0.1:11456/api/generate -d '{ "model": "qwen3:235b-a22b", "prompt": "Why is the sky blue? /no_think","stream":false }'


@sunisstar commented on GitHub (May 6, 2025):

> curl http://127.0.0.1:11456/api/generate -d '{ "model": "qwen3:235b-a22b", "prompt": "Why is the sky blue? /no_think","stream":false }'

but an empty `<think> </think>` tag pair still remains in the output; a parameter to suppress it would help.
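As a minimal sketch of the post-processing this comment is asking for (this is not part of Ollama; the function name is mine, and it assumes the model emits a literal `<think>...</think>` block, which with `/no_think` is typically empty):

```python
import re

def strip_think(text: str) -> str:
    """Remove any <think>...</think> block from a model response
    (including the empty block left behind by /no_think) and trim
    the leading whitespace it leaves."""
    cleaned = re.sub(r"(?s)<think>.*?</think>", "", text)
    return cleaned.lstrip()

# Example: a /no_think response that still carries the empty tag pair.
response = "<think>\n\n</think>\n\nThe sky is blue because of Rayleigh scattering."
print(strip_think(response))  # → The sky is blue because of Rayleigh scattering.
```

This is the same regex idea used in the Java advisor further down the thread, applied to the `response` field of `/api/generate` output.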


@niubiqianshui commented on GitHub (May 6, 2025):

> > curl http://127.0.0.1:11456/api/generate -d '{ "model": "qwen3:235b-a22b", "prompt": "Why is the sky blue? /no_think","stream":false }'
>
> but an empty `<think> </think>` tag pair still remains in the output; a parameter to suppress it would help.

What parameter?


@y-tor commented on GitHub (May 6, 2025):

@ehartford shared this model file:

https://gist.github.com/ehartford/a465d24116b7df07170efe69116084a3

This is a clever way of forcing the model not to generate the `<think>` tag in its output.


@xlorne commented on GitHub (May 6, 2025):

I am using the OpenAI-compatible protocol and built my implementation on Spring AI (Spring Boot). By dynamically appending /no_think to the prompt, I can enable or disable thinking mode. The returned `<think></think>` tag data can then be filtered out with an advisor.

package com.codingapi.agent.advisor;

import lombok.NonNull;
import lombok.extern.slf4j.Slf4j;
import org.springframework.ai.chat.client.advisor.api.*;
import org.springframework.ai.chat.messages.AssistantMessage;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.model.Generation;
import reactor.core.publisher.Flux;

import java.util.ArrayList;
import java.util.List;

@Slf4j
public class Qwen3ThinkFilterAdvisor implements CallAroundAdvisor, StreamAroundAdvisor {

    private final boolean thinkEnabled;

    private boolean qwen3Model = false;

    public Qwen3ThinkFilterAdvisor(boolean thinkEnabled) {
        this.thinkEnabled = thinkEnabled;
    }

    private void checkModelName(AdvisedRequest advisedRequest) {
        String modelName = advisedRequest.chatModel().getDefaultOptions().getModel();
        this.qwen3Model = modelName != null && modelName.contains("qwen3");
    }

    private List<Generation> filterGenerations(List<Generation> generations) {
        if (thinkEnabled) {
            return generations;
        }
        List<Generation> generationList = new ArrayList<>();
        for (Generation generation : generations) {
            AssistantMessage assistantMessage = generation.getOutput();
            String text = assistantMessage.getText();
            if (text.contains("<think>")) {
                text = text.replaceAll("(?s)<think>.*?</think>", "");
                text = text.replaceAll("(?m)^[ \\t]*\\r?\\n", "");
            }
            AssistantMessage responseMessage = new AssistantMessage(text,
                    assistantMessage.getMetadata(),
                    assistantMessage.getToolCalls(),
                    assistantMessage.getMedia());
            generationList.add(new Generation(responseMessage));
        }
        return generationList;
    }


    private AdvisedRequest filterRequest(AdvisedRequest advisedRequest) {
        if (thinkEnabled) {
            return advisedRequest;
        }
        String question = advisedRequest.userText();
        question = question + "/no_think";
        return AdvisedRequest.from(advisedRequest)
                .userText(question)
                .build();
    }

    @Override
    @NonNull
    public AdvisedResponse aroundCall(@NonNull AdvisedRequest advisedRequest, CallAroundAdvisorChain chain) {
        this.checkModelName(advisedRequest);
        if (qwen3Model) {
            AdvisedRequest request = this.filterRequest(advisedRequest);
            AdvisedResponse response = chain.nextAroundCall(request);
            assert response.response() != null;
            List<Generation> generations = this.filterGenerations(response.response().getResults());
            return AdvisedResponse.builder()
                    .response(ChatResponse.builder()
                            .generations(generations)
                            .build())
                    .adviseContext(advisedRequest.adviseContext())
                    .build();
        } else {
            return chain.nextAroundCall(advisedRequest);
        }
    }

    @Override
    @NonNull
    public Flux<AdvisedResponse> aroundStream(@NonNull AdvisedRequest advisedRequest, StreamAroundAdvisorChain chain) {
        this.checkModelName(advisedRequest);
        if (qwen3Model) {
            AdvisedRequest request = this.filterRequest(advisedRequest);
            return chain.nextAroundStream(request)
                    .map(advisedResponse -> {
                        assert advisedResponse.response() != null;
                        List<Generation> generations = this.filterGenerations(advisedResponse.response().getResults());
                        return AdvisedResponse.builder()
                                .response(ChatResponse.builder()
                                        .generations(generations)
                                        .build())
                                .adviseContext(advisedRequest.adviseContext())
                                .build();
                    });
        } else {
            return chain.nextAroundStream(advisedRequest);
        }
    }

    @NonNull
    @Override
    public String getName() {
        return "qwen3ThinkFilter";
    }

    @Override
    public int getOrder() {
        return 0;
    }
}

https://github.com/xlorne/springboot-agent


@cycloarcane commented on GitHub (May 6, 2025):

/think and /no_think can be used as a toggle in any message, as others have said, or in the system prompt of whatever framework you are using.

You can toggle as many times as you want within the same conversation; the model follows the most recent toggle.
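A sketch of that pattern for the `/api/chat` endpoint (the helper below is hypothetical, not an Ollama or Qwen API; it just appends the soft switch to the latest user message before the request is sent):

```python
import json

def with_think_toggle(messages, think: bool):
    """Return a copy of a chat message list with /think or /no_think
    appended to the most recent user message; Qwen3 honors whichever
    toggle appeared last in the conversation."""
    toggled = [dict(m) for m in messages]
    suffix = " /think" if think else " /no_think"
    for m in reversed(toggled):
        if m["role"] == "user":
            m["content"] += suffix
            break
    return toggled

# Build a /api/chat payload with thinking disabled for this turn.
payload = {
    "model": "qwen3:235b-a22b",
    "messages": with_think_toggle(
        [{"role": "user", "content": "Why is the sky blue?"}], think=False
    ),
    "stream": False,
}
print(json.dumps(payload, indent=2))
```

POSTing this payload to `http://127.0.0.1:11456/api/chat` (the port used earlier in this thread) would have the same effect as the `/api/generate` examples above, and flipping `think=True` on a later turn re-enables reasoning.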


@niubiqianshui commented on GitHub (May 7, 2025):

/

Reference: github-starred/ollama#69022