[GH-ISSUE #13750] Tool calling ignored when response_format is present in self-hosted Ollama (works on Ollama.com) #34774

Open
opened 2026-04-22 18:36:33 -05:00 by GiteaMirror · 3 comments

Originally created by @sambaptista on GitHub (Jan 16, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/13750

What is the issue?

Tool calling behavior is inconsistent between Ollama.com and self-hosted Ollama instances when using the same model and request that includes both response_format and tool calling.

Actual behavior

On self-hosted Ollama: When both tools and response_format are present in the request, response_format takes complete priority over tools. The model is forced to generate a hallucinated JSON response that conforms to the schema, without ever considering tool calls.

Ollama.com: Works correctly, calling the tool and formatting the response.

Expected behavior

Self-hosted Ollama should behave consistently with Ollama.com (see result below).

Environment

Self-hosted ollama setup 1 (Windows):

  • Ollama version: 0.14.1 and 0.16.1
  • GPU: RTX 4090
  • OS: Windows

Self-hosted ollama setup 2 (macOS):

  • Ollama version: 0.14.1
  • OS: Apple M2 Max - Sequoia 15.6.1 (24G90)

Both produce identical broken behavior

Ollama.com version:

  • Build commit: 9b984e8653945c87b53e5f0b95c0d6e29682de56
  • Build date: January 15, 2026 (x-build-time: 1768538406)
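
A quick way to sanity-check the build date is to decode the x-build-time value. The sketch below assumes the header holds a Unix timestamp in seconds (an assumption; the header is not documented in this thread). Under that assumption the value falls on 16 January 2026 in UTC, which is still 15 January in US time zones.

```python
from datetime import datetime, timezone

# Value of the x-build-time header quoted in the report.
X_BUILD_TIME = 1768538406


def build_time_utc(epoch_seconds: int) -> str:
    """Return a Unix timestamp (seconds) as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()


print(build_time_utc(X_BUILD_TIME))  # 2026-01-16T04:40:06+00:00
```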

Model tested

ministral-3:14b
(same observation with bigger models, but different between local and remote host)

Reproduction

Request with both tools and response_format:

curl -X POST http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ministral-3:14b",
    "messages": [
      {
        "role": "user",
        "content": "What is the weather in Paris?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a city",
          "parameters": {
            "type": "object",
            "properties": {
              "city": {
                "type": "string",
                "description": "The city name"
              }
            },
            "required": ["city"]
          }
        }
      }
    ],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "weather_response",
        "strict": true,
        "schema": {
          "type": "object",
          "properties": {
            "city": {
              "type": "string"
            },
            "temperature": {
              "type": "number"
            }
          },
          "required": ["city", "temperature"],
          "additionalProperties": false
        }
      }
    }
  }'

Result: Tool is completely ignored, model hallucinates structured response

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "{\"city\":\"Paris\",\"temperature\":15}"
      }
    }
  ]
}

The model returns JSON that matches the schema but never calls the tool. The temperature value is made up, since no tool was actually executed.
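
When comparing hosts, it can help to classify each reply mechanically instead of reading it by eye. The following is a minimal sketch that assumes the OpenAI-style response shape shown in this report; `classify_reply` and its three labels are hypothetical names chosen for illustration, not part of any Ollama API.

```python
import json


def classify_reply(response: dict) -> str:
    """Label a chat completion as 'tool_call', 'structured_content', or 'text'."""
    message = response["choices"][0]["message"]

    # A non-empty tool_calls list means the model asked for a tool.
    if message.get("tool_calls"):
        return "tool_call"

    # Otherwise check whether the content is a JSON object, as produced
    # when response_format forces schema-conforming output.
    content = message.get("content") or ""
    try:
        parsed = json.loads(content)
    except json.JSONDecodeError:
        return "text"
    return "structured_content" if isinstance(parsed, dict) else "text"
```

Applied to the self-hosted reply above, this sketch reports a structured answer with no tool call, which is the symptom described in this issue.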

Replacing the URL with ollama.com...

curl -X POST https://ollama.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \

...returns the correct result, calling the tool and formatting the answer:

{
  "id": "chatcmpl-170",
  "object": "chat.completion",
  "created": 1768588887,
  "model": "ministral-3:14b",
  "system_fingerprint": "fp_ollama",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "",
      "tool_calls": [{
        "id": "YuxkvfCv8",
        "index": 0,
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"city\":\"Paris\"}"
        }
      }]
    },
    "finish_reason": "tool_calls"
  }],
  "usage": {
    "prompt_tokens": 67,
    "completion_tokens": 12,
    "total_tokens": 79
  }
}

Self-hosted Ollama working case

On self-hosted Ollama, tools work WITHOUT response_format:

curl -X POST http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ministral-3:14b",
    "messages": [
      {
        "role": "user",
        "content": "What is the weather in Paris?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a city",
          "parameters": {
            "type": "object",
            "properties": {
              "city": {
                "type": "string",
                "description": "The city name"
              }
            },
            "required": ["city"]
          }
        }
      }
    ]
  }'

Result: Tool is called correctly

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "tool_calls": [
          {
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"city\":\"Paris\"}"
            }
          }
        ]
      }
    }
  ]
}

Impact

This makes it impossible to use/develop structured output with tool calling on self-hosted instances.
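
One possible client-side workaround, not confirmed by the maintainers in this thread, is to never send both fields in the same request: send tools alone until tool results are available, then send response_format alone for the final answer. This relies only on the two behaviors reported above (tools work without response_format, and response_format is enforced when present). The helper name `request_for_phase` and its `tool_results_ready` flag are hypothetical.

```python
import copy


def request_for_phase(payload: dict, tool_results_ready: bool) -> dict:
    """Return a copy of payload that carries only one of tools / response_format.

    Before tool results exist, keep tools and drop response_format so the
    model is free to emit a tool call. Once tool results have been appended
    to the messages, drop tools and keep response_format so the final
    answer is constrained to the schema.
    """
    request = copy.deepcopy(payload)  # leave the caller's payload untouched
    if tool_results_ready:
        request.pop("tools", None)
    else:
        request.pop("response_format", None)
    return request
```

The cost of this sketch is one extra round trip per tool call, and whether a given model accepts tool messages in a request that no longer declares tools is an assumption that would need testing.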

Question

Does Ollama.com run a different version or have custom patches that enable this functionality? If so, can this behavior be backported to self-hosted instances? Or did I miss something obvious that would cause this dysfunction on self-hosted instances?


@sumitkm commented on GitHub (Jan 21, 2026):

Thanks for this pointer.

Using the ollama python library, I haven't been able to get tool calling to work with any model apart from gpt-oss. Everything else generates inconsistent garbage.

I had hopes for the requests library, but that didn't work any better. I'll give it another go later. But tool calling seems to be an utter waste of time: half the models listed as supporting tool calling only support it in their 400GB+ variants, not across the plethora of model sizes that are all listed on the same page.

Anyway, it's 3AM /end-rant. Apologies.


@sambaptista commented on GitHub (Feb 15, 2026):

Seems related to https://github.com/ollama/ollama/issues/8095 and https://github.com/vllm-project/vllm/issues/16313


@sumitkm commented on GitHub (Mar 6, 2026):

Got back to this side-project of mine and with Ollama version 0.17.4 I can implement tool calling using both gpt-oss and qwen3.5:27b. Haven't gone back to Devstral or Mistral, but I'm currently making direct HTTP requests with the requests library and they seem to work. Will probably skip the ollama.py wrapper for now. Since models change so fast and newer models are better trained for tool calling, I don't see the point in trying to shoehorn tool calling into older models.

Once again, apologies for earlier frustration. Ollama community is honestly fantastic. The speed at which bugs are acknowledged and fixed is epic. So a hearty thank you to every contributor.
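
For readers who want to try the direct-HTTP approach mentioned in the comment above, here is a minimal sketch using the requests library. The commenter did not say which endpoint was used, so the `/v1/chat/completions` path and the localhost address are assumptions taken from the original report; `post_chat` and `parse_tool_calls` are hypothetical helper names.

```python
import json

# Assumption: self-hosted address taken from the original report.
BASE_URL = "http://localhost:11434"


def post_chat(payload: dict, base_url: str = BASE_URL, timeout: float = 120.0) -> dict:
    """POST a chat payload to the OpenAI-compatible endpoint and return the JSON body."""
    # Third-party dependency, imported lazily so parse_tool_calls works without it.
    import requests

    response = requests.post(
        f"{base_url}/v1/chat/completions", json=payload, timeout=timeout
    )
    response.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return response.json()


def parse_tool_calls(response: dict) -> list:
    """Return (name, arguments) pairs for every tool call in the first choice."""
    message = response["choices"][0]["message"]
    calls = []
    for call in message.get("tool_calls") or []:
        function = call["function"]
        # Arguments arrive as a JSON-encoded string, as in the replies shown in this issue.
        calls.append((function["name"], json.loads(function["arguments"])))
    return calls
```

A typical use would be `parse_tool_calls(post_chat(payload))`, then dispatching each returned name to a local function.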

Reference: github-starred/ollama#34774