[GH-ISSUE #10496] Can the empty <think> tags be removed in non-thinking mode for the Qwen3 series models? #6905

Closed
opened 2026-04-12 18:47:29 -05:00 by GiteaMirror · 8 comments

Originally created by @somnifex on GitHub (Apr 30, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10496

According to the Qwen3 model description, the output still includes an empty <think> tag for compatibility reasons, even when thinking mode is not used. However, since Ollama is a downstream application of the base model, removing this tag would be more user-friendly for end users.
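
For illustration, output in non-thinking mode typically looks like this (the tags are emitted, but the body between them is empty):

<think>

</think>

Hello! How can I help you today?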

GiteaMirror added the feature request label 2026-04-12 18:47:29 -05:00

@ChenDianWzh commented on GitHub (Apr 30, 2025):

from langchain_ollama import ChatOllama

llm = ChatOllama(
    model="qwen3:14b",
    temperature=0.6,
    base_url="",
)
print(llm.invoke("Hello, world!").content)

I'd like to know if there are any parameters that can be used to control thinking.


@smileyboy2019 commented on GitHub (Apr 30, 2025):

Would a custom parameter work?


@regmibijay commented on GitHub (May 1, 2025):

In the meantime, you can use /no_think in front of either the system prompt or the user query to bypass thinking.
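
For example, with the ChatOllama setup from the earlier comment (a minimal sketch; qwen3:14b is simply the tag used above):

from langchain_ollama import ChatOllama

# Prepend the Qwen3 soft switch /no_think to the user query to disable
# the thinking block for this turn.
llm = ChatOllama(model="qwen3:14b", temperature=0.6)
print(llm.invoke("/no_think Hello, world!").content)
# Note: an empty <think></think> pair may still appear in the output,
# which is the subject of this issue.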


@lasseedfast commented on GitHub (May 1, 2025):

#10482


@somnifex commented on GitHub (May 1, 2025):

> In the meantime, you can use /no_think in front of either the system prompt or the user query to bypass thinking.

This is an effective way to eliminate the thought process, but its side effect is exactly the core issue here: even without using thinking mode, an empty <think> tag remains in the output. The tag exists in the model for compatibility purposes, but its presence can lead to unpredictable results in downstream applications. For example, a stray tag in a workflow could cause contextual anomalies, and certain text-processing tasks may not yield directly usable answers. As an application layer built on foundation models, and potentially an upstream layer for user-facing applications, Ollama could simplify matters considerably by handling these empty tags properly.
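
Until Ollama handles this natively, a blunt workaround is to strip the empty pair in post-processing (a sketch for non-streaming responses; streaming output needs a filter like the one shown in a later comment):

import re

def strip_empty_think(text: str) -> str:
    # Remove a leading <think></think> pair that contains only whitespace,
    # along with any blank lines that follow it.
    return re.sub(r'^\s*<think>\s*</think>\s*', '', text)

print(strip_empty_think("<think>\n\n</think>\n\nHello!"))  # -> Hello!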


@190679163 commented on GitHub (May 14, 2025):


import openai
import re

client = openai.OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",
)

def chat_with_ollama(query: str):
    response = client.chat.completions.create(
        model="qwen3:32b",
        messages=[{"role": "user", "content": query}],
        stream=True,
        temperature=0,
        seed=42
    )

    buffer = ""
    in_think = False  # Flag to check if inside <think> and </no_think>
    print("AI:", end=" ", flush=True)
    for chunk in response:
        if hasattr(chunk, "choices") and chunk.choices:
            delta = chunk.choices[0].delta
            if hasattr(delta, "content") and delta.content:
                content = delta.content

                # Check if entering <think> block
                if not in_think:
                    idx = content.find("<think>")
                    if idx != -1:
                        # Print content before <think>
                        print(content[:idx], end="", flush=True)
                        in_think = True
                        # Discard the rest of this chunk (it is inside <think>)
                        continue
                    else:
                        print(content, end="", flush=True)
                
                if in_think:
                    # Check if </think> is encountered
                    end_idx = content.find("</think>")
                    if end_idx != -1:
                        # Exit <think> block
                        in_think = False
                        # Keep only the content after </think>
                        after = content[end_idx + len("</think>"):]
                        # Remove leading \n\n
                        after = re.sub(r'^\n\n', '', after)
                        print(after, end="", flush=True)
                        continue
                    # Otherwise discard content inside <think>
                    continue
    print()  # Newline after the response

if __name__ == "__main__":
    chat_with_ollama("how to use ollama")

import subprocess
import json
import re

def chat_with_ollama(query: str):
    # Issue the request with the curl command
    curl_command = [
        "curl",
        "-X", "POST",
        "http://localhost:11434/v1/chat/completions",
        "-H", "Content-Type: application/json",
        "-H", "Authorization: Bearer ollama",
        "-d", json.dumps({
            "model": "qwen3:32b",
            "messages": [{"role": "user", "content": query}],
            "stream": True,
            "temperature": 0,
            "seed": 42
        })
    ]

    # Run the curl command via subprocess
    process = subprocess.Popen(curl_command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

    buffer = ""
    in_think = False  # Flag to check if inside <think> and </no_think>
    print("AI:", end=" ", flush=True)

    for line in process.stdout:
        # Decode the line from the curl output
        decoded_line = line.decode("utf-8").strip()

        # Skip empty lines or irrelevant lines
        if not decoded_line or not decoded_line.startswith("data:"):
            continue

        # Remove the "data:" prefix
        decoded_line = decoded_line[5:].strip()

        # Parse the JSON payload
        try:
            chunk = json.loads(decoded_line)
        except json.JSONDecodeError:
            continue

        # Process the chunk content
        if "choices" in chunk and chunk["choices"]:
            delta = chunk["choices"][0].get("delta", {})
            if "content" in delta:
                content = delta["content"]

                # Check if entering <think> block
                if not in_think:
                    idx = content.find("<think>")
                    if idx != -1:
                        # Print content before <think>
                        print(content[:idx], end="", flush=True)
                        in_think = True
                        # Discard the rest of this chunk (it is inside <think>)
                        continue
                    else:
                        print(content, end="", flush=True)

                if in_think:
                    # Check if </think> is encountered
                    end_idx = content.find("</think>")
                    if end_idx != -1:
                        # Exit <think> block
                        in_think = False
                        # Keep only the content after </think>
                        after = content[end_idx + len("</think>"):]
                        # Remove leading \n\n
                        after = re.sub(r'^\n\n', '', after)
                        print(after, end="", flush=True)
                        continue
                    # Otherwise discard content inside <think>
                    continue

    print()  # Newline after the response

    # Wait for the process to complete and handle errors
    process.wait()
    if process.returncode != 0:
        stderr = process.stderr.read().decode("utf-8").strip()
        print(f"Error: {stderr}")

if __name__ == "__main__":
    chat_with_ollama("how to use ollama \\no_think")

This is my solution; see if it helps you.


@lasseedfast commented on GitHub (May 14, 2025):

I’m using a similar buffer construction, but I hope this can be made a lot simpler in the API in a coming update!


@rick-github commented on GitHub (May 30, 2025):

https://github.com/ollama/ollama/releases/tag/v0.9.0
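
(That release added a native thinking toggle, which removes the need for the client-side filtering above. A minimal sketch against the native /api/chat endpoint; the think field is per the v0.9.0 release notes, so treat it as unavailable on earlier versions:)

import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "qwen3:14b",
        "messages": [{"role": "user", "content": "Hello, world!"}],
        "think": False,  # v0.9.0+: disable thinking; no <think> tags in output
        "stream": False,
    },
)
print(resp.json()["message"]["content"])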
