[GH-ISSUE #9627] removing <think> tag as an option #68338

Closed
opened 2026-05-04 13:15:41 -05:00 by GiteaMirror · 5 comments

Originally created by @Shahin-rmz on GitHub (Mar 10, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9627

Hello, Ollama made my life much easier. Thanks a lot.
For the new generation of reasoning models that emit a think tag, is there a way to delete/remove the `</think>` tag if we want? If not, it would be a cool idea to have that option in both the API and direct CLI usage. Thanks a lot.

GiteaMirror added the feature request label 2026-05-04 13:15:41 -05:00

@rick-github commented on GitHub (Mar 10, 2025):

The `<think>` tags are a function of the model, and there's currently no mechanism for removing them from the token stream generated by the model. The client usually does this; for example, OpenWebUI detects them and hides them behind a spinner. API clients can detect the opening tag and drop output until the closing tag is detected. If you are using the DeepSeek distillate models, you can use the base model (qwen2.5 or llama3.1) and get output without the reasoning steps.
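For illustration, here is a minimal sketch of the client-side filtering described above, assuming a streaming request to Ollama's `/api/chat` endpoint via the `requests` library. The model name is only an example, and the sketch assumes the model emits `<think>` and `</think>` as standalone tokens (as the DeepSeek-R1 distillates do), which is not guaranteed for every model:

```python
import json
import requests

def chat_without_thinking(prompt, model="deepseek-r1:8b", host="http://localhost:11434"):
    """Stream a chat response and drop everything between <think> and </think>."""
    resp = requests.post(
        f"{host}/api/chat",
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "stream": True,
        },
        stream=True,
    )
    in_think = False
    visible = []
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        piece = chunk.get("message", {}).get("content", "")
        # Toggle suppression on the tag tokens; pass everything else through.
        if piece.strip() == "<think>":
            in_think = True
        elif piece.strip() == "</think>":
            in_think = False
        elif not in_think:
            visible.append(piece)
        if chunk.get("done"):
            break
    return "".join(visible)
```

A client that wants to keep the reasoning could buffer the suppressed tokens instead of discarding them, which is essentially what the snippet later in this thread does after the stream completes.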


@pdevine commented on GitHub (Mar 12, 2025):

@Shahin-rmz I could see us potentially putting something into the CLI to hide it, but I'm not sure it's a great idea for the API. As @rick-github mentioned, usually the client would filter these out.


@allenporter commented on GitHub (Mar 15, 2025):

Usually thinking steps are exposed separately in the API. The client shouldn't need to implement its own way to strip internal model tokens, just like any other internal model tokens. It can show or filter them by processing the thinking steps from the API. This is how it works in other cloud APIs (Anthropic, Google, etc.) as well.


@Kwisss commented on GitHub (Mar 19, 2025):

So if I understand correctly, there is no way for Ollama to distinguish the thinking from the actual response in a model like qwq?
Like returning it as a different type, or separated from the content?

I did this to separate the response but it could probably be done a lot better:

```
def generate():
    nonlocal assistant_response
    for line in response.iter_lines():
        if line:
            chunk = json.loads(line.decode('utf-8'))
            if 'message' in chunk and 'content' in chunk['message']:
                content = chunk['message']['content']
                assistant_response += content
                yield f"data: {json.dumps({'content': content})}\n\n"
            if chunk.get('done', False):
                # Split into thinking and content
                if '</think>' in assistant_response:
                    thinking_response, content_response = assistant_response.split('</think>', 1)
                else:
                    thinking_response = ''
                    content_response = assistant_response
                # Store in conversation history
                conversations[session_id].append({
                    "role": "assistant",
                    "content": content_response.strip(),
                    "thinking": thinking_response.strip()
                })
                break
    yield f"data: {json.dumps({'done': True})}\n\n"

return app.response_class(generate(), mimetype='text/event-stream')
```

@rick-github commented on GitHub (Mar 19, 2025):

https://github.com/ollama/ollama/issues/8528#issuecomment-2736910261

Reference: github-starred/ollama#68338