[GH-ISSUE #15288] Gemma 4: /v1/chat/completions returns empty content with all text in reasoning field — no think=false support on OpenAI endpoint #35541

Closed
opened 2026-04-22 20:06:43 -05:00 by GiteaMirror · 3 comments
Owner

Originally created by @gofastercloud on GitHub (Apr 3, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15288

What is the issue?

When using Gemma 4 models (tested: gemma4:26b) via the OpenAI-compatible /v1/chat/completions endpoint, the content field is always empty and all generated text appears only in the reasoning field. There is no way to disable thinking mode via this endpoint.

The native /api/chat endpoint works correctly with "think": false, producing content in the expected content field. But the OpenAI-compatible endpoint does not accept or pass through the think parameter.

This breaks all OpenAI-compatible tools and frameworks (OpenClaw, LangChain, etc.) that expect the response in choices[0].message.content.

Steps to reproduce

OpenAI-compatible endpoint (broken):

curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"gemma4:26b","messages":[{"role":"user","content":"What is the capital of Australia?"}],"max_tokens":50}' | jq '.choices[0].message'

Result:

{
  "role": "assistant",
  "content": "",
  "reasoning": "The user is asking for the capital of Australia.\nThe capital of Australia is Canberra.\nAnswer concisely and directly."
}

content is empty. The actual response is buried in reasoning.

Native API (works correctly):

curl -s http://localhost:11434/api/chat \
  -d '{"model":"gemma4:26b","messages":[{"role":"user","content":"What is the capital of Australia?"}],"think":false,"stream":false}'

Result: Content correctly appears in message.content as "Canberra".

Expected behavior

Either:

  1. The /v1/chat/completions endpoint should support a think parameter (or extra_body.think) to disable thinking mode, OR
  2. When thinking is enabled, the final non-thinking response should be placed in content (with thinking in reasoning), not left empty

Environment

  • Ollama version: 0.20.0
  • Model: gemma4:26b (SHA: 5571076f3d70), also tested with custom Modelfile
  • OS: macOS 15 (Sequoia), Apple M4 Pro
  • Tested via: curl, direct HTTP

Impact

This effectively makes Gemma 4 unusable with any tool that uses the OpenAI-compatible API. OpenClaw, LangChain, and similar frameworks all read choices[0].message.content and get an empty string.
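Until a server-side fix lands, frameworks can guard against this symptom client-side. A minimal sketch (the helper name is hypothetical), assuming the standard OpenAI message shape plus the `reasoning` field Ollama returns:

```python
def message_text(message: dict) -> str:
    """Return the assistant's text, falling back to the reasoning
    field when content comes back empty (the symptom above)."""
    content = (message.get("content") or "").strip()
    if content:
        return content
    # Fall back to whatever landed in the reasoning field.
    return (message.get("reasoning") or "").strip()

# The broken response from the report: content empty, text in reasoning.
broken = {
    "role": "assistant",
    "content": "",
    "reasoning": "The capital of Australia is Canberra.",
}
print(message_text(broken))  # falls back to the reasoning text
```

This mirrors expected-behavior option 2 above, applied on the client instead of the server.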

Related issues

  • #10976 — Thinking + tools = empty output (Qwen3, similar symptom)
  • #15260 — think=false breaks format for gemma4
  • #14645 — format ignored when think disabled for qwen3.5

@rick-github commented on GitHub (Apr 3, 2026):

$ curl -s  http://localhost:11434/v1/chat/completions   -H "Content-Type: application/json"   -d '{
  "model":"gemma4:26b",
  "messages":[{"role":"user","content":"What is the capital of Australia?"}],
  "max_tokens":50,
  "reasoning_effort":"none"
}' | jq '.choices[0].message'
{
  "role": "assistant",
  "content": "The capital of Australia is **Canberra**."
}
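For clients that build requests programmatically, the workaround above is one extra field in the request body. A minimal sketch (the helper name is hypothetical; "none" is an Ollama-specific `reasoning_effort` value, not part of the OpenAI spec):

```python
import json

def chat_body(prompt: str, model: str = "gemma4:26b", max_tokens: int = 50) -> str:
    """Build a /v1/chat/completions request body with thinking disabled."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        # Ollama extension: "none" disables thinking on this endpoint.
        "reasoning_effort": "none",
    })

print(chat_body("What is the capital of Australia?"))
```

SDKs that validate `reasoning_effort` values may reject "none"; in that case it can usually be passed through an escape hatch such as the OpenAI Python SDK's `extra_body`.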

@jmorganca commented on GitHub (Apr 3, 2026):

I think this is because you have "max_tokens":50 set. Thanks @rick-github 🙏 !!


@rick-github commented on GitHub (Apr 3, 2026):

@jmorganca There is an issue where content that is truncated by max_tokens goes in the content field rather than the thinking field.


Reference: github-starred/ollama#35541