[GH-ISSUE #11010] think=false still shows <think> tags in the output #53772

Open
opened 2026-04-29 04:44:08 -05:00 by GiteaMirror · 9 comments
Owner

Originally created by @raffaelemancuso on GitHub (Jun 7, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11010

Originally assigned to: @drifkin on GitHub.

What is the issue?

```python
from ollama import chat

ollama_model = "deepseek-r1"
resp = chat(
    model=ollama_model,
    messages=[
        {"role": "system", "content": "You are a research assistant."},
        {
            "role": "user",
            "content": """
                You are given a job description.
                Your task is to determine if the job requires the use of data science or not.
                If it does, respond with "yes". If it doesn't, respond "no".
                Then provide an explanation for your answer.
                Here is the job description:
                Evaluate the quality of data science education in France's high schools.
            """,
        },
    ],
    options={"temperature": 0},  # Make responses more deterministic
    think=False,
)
print(resp.message.content)
```
print(resp.message.content)

Output:

```
The task involves evaluating the quality of data science education in French high schools. This requires analyzing existing educational programs, curricula, and possibly comparing them with standards or benchmarks. To do this effectively, one would need to gather and process data on various aspects such as course content, student outcomes, teacher qualifications, infrastructure, etc.

Data science inherently involves working with data—collecting it, cleaning it, analyzing it, interpreting results, and often visualizing findings. Evaluating education programs falls under the purview of educational research or assessment, which typically uses quantitative methods (like surveys, performance metrics) and qualitative analysis (like interviews, case studies).

In this context, "data science" could refer to applying data-driven methodologies to assess educational quality. However, the job description does not explicitly mention using advanced statistical models, machine learning algorithms, or computational tools that are core to data science as a field. Instead, it seems more focused on traditional evaluation methods.

Therefore, while there might be some overlap with data analysis skills, this task is primarily about assessment and review rather than applying full-fledged data science techniques. The job does not require building predictive models or using big data tools but focuses on evaluating the current state of education programs.

Answer: no

Explanation: Although the evaluation process may involve collecting and analyzing data (which could be considered basic data handling), it does not explicitly call for advanced data science methods such as machine learning, statistical modeling at scale, or complex computational analysis. The task is more about educational assessment than applying core data science principles.
</think>
The job description requires evaluating the quality of data science education in French high schools.

This involves analyzing existing educational programs and curricula related to data science. To do this effectively, one would need to gather and process quantitative and qualitative data on various aspects such as course content, student outcomes, teacher qualifications, infrastructure, etc., which falls under the domain of data science.

Therefore, I determine that **yes**, this job requires the use of data science or at least involves tasks typically associated with it.
```

The thinking, ending with the `</think>` tag, is still there.

It looks like only the opening `<think>` tag was removed.
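Until the template is fixed, a possible client-side workaround (a sketch, not an official ollama API) is to strip anything that precedes a stray closing tag before using the response:

```python
def strip_leaked_thinking(content: str) -> str:
    """Drop leaked reasoning that precedes a stray closing </think> tag.

    Client-side workaround sketch only; the real fix belongs in the
    model's chat template.
    """
    head, sep, tail = content.partition("</think>")
    # If no closing tag is present, the content is already clean.
    return tail.lstrip() if sep else content
```

For example, `print(strip_leaked_thinking(resp.message.content))` would print only the text after the leaked `</think>` in the output above.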

Ollama version

0.9.0

GiteaMirror added the thinking, bug labels 2026-04-29 04:44:08 -05:00
Author
Owner

@rick-github commented on GitHub (Jun 7, 2025):

deepseek-r1 doesn't actually have a no-think mode, so the template tries to fake one for the model by inserting a blank think block at the end of the prompt. It looks like the model recognizes the opening `<think>`, since it doesn't output a new one, but not the closing `</think>`, since it writes one out.

Padding the think block with some text helps it recognize the trailing `</think>`. However, the instruction following is poor: it doesn't respond 'yes' or 'no' with an explanation as per the instructions. If thinking is enabled, then it does follow the instructions (or performed adequately the few times I ran the prompt).

So I think for deepseek-r1, the only really valid options for `think` are unset or `true`.
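To illustrate the mechanism described above, the faked no-think mode amounts to pre-filling an empty think block at the end of the rendered prompt. The role markers and layout below are assumptions for illustration, not the actual deepseek-r1 template:

```python
def render_prompt(user_msg: str, think: bool) -> str:
    """Sketch of how a no-think mode can be faked in a chat template.

    The <|User|>/<|Assistant|> markers are placeholders; the real
    template lives in the model's Modelfile.
    """
    prompt = f"<|User|>{user_msg}<|Assistant|>"
    if not think:
        # Pre-fill an empty thinking block so the model skips reasoning.
        # The model honors the opening <think> but may still emit its own
        # closing </think>, which is the leak reported in this issue.
        prompt += "<think>\n\n</think>\n"
    return prompt
```

With `think=True` nothing is appended and the model opens its own `<think>` block as usual.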

Author
Owner

@zhangiser commented on GitHub (Jun 10, 2025):

same problem

Author
Owner

@rick-github commented on GitHub (Jun 10, 2025):

Same response.

Author
Owner

@raffaelemancuso commented on GitHub (Jun 10, 2025):

So ollama is concatenating `<think></think>` after my prompt?

Author
Owner

@rick-github commented on GitHub (Jun 10, 2025):

So ollama is concatenating `<think></think>` after my prompt?

Yes.

Author
Owner

@raffaelemancuso commented on GitHub (Jun 10, 2025):

  1. And we don't know why sometimes it works and sometimes it doesn't?
  2. I think that DeepSeek-R1 would perform substantially worse if we do that. Maybe it's a thinking-only mode. Is there some kind of performance measure of DS-R1 working under those circumstances?
Author
Owner

@rick-github commented on GitHub (Jun 10, 2025):

  1. And we don't know why sometimes it works and sometimes it doesn't?

LLMs are probabilistic token generators. The probabilities can be changed, for example by adding text as I did earlier, but there's still a chance that it doesn't work. This is related to the hallucination problem in LLMs.

  2. I think that DeepSeek-R1 would perform substantially worse if we do that. Maybe it's a thinking-only mode. Is there some kind of performance measure of DS-R1 working under those circumstances?

Yes, as also mentioned earlier, preventing the model from thinking results in poorer output.

Author
Owner

@lee-b commented on GitHub (Jun 21, 2025):

Rather than:

```
<think></think>
...
```

Wouldn't something like this be better?

```
<think>This seems straightforward, so I can just answer directly.</think>
...
```
Author
Owner

@rick-github commented on GitHub (Jun 21, 2025):

Yes, as I mentioned, adding text results in better detection of the trailing `</think>`. But without thinking, the output from the model is poor.


Reference: github-starred/ollama#53772