[GH-ISSUE #15293] gemma4 thinking not forwarded via /v1/chat/completions #71845

Open
opened 2026-05-05 02:41:29 -05:00 by GiteaMirror · 6 comments

Originally created by @roamiiing on GitHub (Apr 3, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15293

Description

When using Gemma 4 models, the think: true parameter is ignored by the
/v1/chat/completions endpoint. The same parameter works correctly via
the native /api/chat endpoint.

Steps to reproduce

Native API (works ✓):
curl http://127.0.0.1:11434/api/chat \
  -d '{"model":"gemma4:e4b","think":true,"stream":true,"messages":[{"role":"user","content":"What is 2+2?"}]}'

Response includes "thinking" field with reasoning content

OpenAI-compatible API (broken ✗):
curl http://127.0.0.1:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"gemma4:e4b","think":true,"stream":true,"messages":[{"role":"user","content":"What is 2+2?"}]}'

Response: just "4", no thinking content
Also tried: "options":{"think":true}, "reasoning_effort":"high" — none work.

Environment

  • Ollama version: 0.20.0
  • Model: gemma4:e4b
  • OS: Linux (WSL2)

Expected behavior

think:true should be forwarded to the model when using /v1/,
same as it was fixed for deepseek-r1 in #15036.


@rick-github commented on GitHub (Apr 3, 2026):

Use reasoning_effort and give it a prompt it has to think about.

$ curl -s http://127.0.0.1:11434/v1/chat/completions -d '{
  "model":"gemma4:e4b",
  "reasoning_effort":"low",
  "messages":[{"role":"user","content":"Why is the sky blue?"}]
}' | jq -r '.choices[0].message|"reasoning: \(.reasoning[0:50])\ncontent  : \(.content[0:50])"'
reasoning: Here's a thinking process that leads to the sugges
content  : This is one of the most beautiful and common quest
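For client code, the shape of the working request can be sketched in Python. The helper below is illustrative (its name and structure are not from Ollama); it shows that the compat layer's knob is a top-level `reasoning_effort` string rather than the native `think` boolean:

```python
import json

def compat_chat_body(model: str, prompt: str, effort: str = "low") -> str:
    """Build a request body for Ollama's OpenAI-compatible
    /v1/chat/completions endpoint. Thinking is requested via the
    top-level "reasoning_effort" field, not the native "think" flag."""
    return json.dumps({
        "model": model,
        "reasoning_effort": effort,
        "messages": [{"role": "user", "content": prompt}],
    })

body = compat_chat_body("gemma4:e4b", "Why is the sky blue?")
print(body)
```

POSTing this body to the compat endpoint matches the curl above; sending `"think": true` instead is what the issue reports as ignored.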

@verdverm commented on GitHub (Apr 7, 2026):

The Gemma 4 model card mentions multiple thinking levels; how do those map onto the available options in the OpenAI-compat API?

This seems like a general question or documentation gap for most thinking models.

For gemma4, all mentioned levels [low, medium, high] appear to trigger thinking; however, "none" appears to as well, which seems like a bug. It's unclear whether the different values have any meaning to Gemma.


@rick-github commented on GitHub (Apr 7, 2026):

The original template for the model (https://huggingface.co/google/gemma-4-E2B-it/blob/main/chat_template.jinja) doesn't indicate support for multiple thinking levels. Perhaps "configurable thinking modes" from the model card means "on" and "off".

$ for t in high medium low none ; do
    r="$(curl -s localhost:11434/v1/chat/completions -d '{"model":"gemma4","messages":[{"role":"user","content":"why is the sky blue"}],"reasoning_effort":"'$t'","stream":false,"seed":0,"temperature":0}' | jq -r '.choices[0].message.reasoning//""')"
    printf "%-7s %4d %s\n" $t ${#r} "${r::20}"
  done
high    2607 Here's a thinking pr
medium  2607 Here's a thinking pr
low     2607 Here's a thinking pr
none       0 
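The measurements above suggest a simple on/off semantics for gemma4. A sketch of the observed mapping (the helper name is mine, for illustration):

```python
def effort_enables_thinking(effort: str) -> bool:
    """Observed gemma4 behavior via /v1/chat/completions with a fixed
    seed and temperature: "low", "medium" and "high" all yield identical
    reasoning output, while "none" yields none -- i.e. the level
    collapses to a boolean on/off switch."""
    return effort in ("low", "medium", "high")

# All three non-"none" levels behaved identically in the loop above.
print([effort_enables_thinking(t) for t in ("high", "medium", "low", "none")])
```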

@verdverm commented on GitHub (Apr 7, 2026):

Perhaps; I wish they'd made it less ambiguous.

Based on https://ai.google.dev/gemini-api/docs/thinking#thinking-levels, Gemini has distinct thinking levels, and Gemma is derived from Gemini.

Is it possible their template is wrong?


@rick-github commented on GitHub (Apr 7, 2026):

> Is it possible their template is wrong?

Pretty sure Google would know how to write a template for their model. According to the documentation (https://huggingface.co/google/gemma-4-E4B-it#2-thinking-mode-configuration), it only supports enabling and disabling.


@verdverm commented on GitHub (Apr 7, 2026):

thanks for the clarity @rick-github, I learned a few things along the way!


Reference: github-starred/ollama#71845