[GH-ISSUE #7051] Tool call support in Qwen 2.5 hallucinates with Maybe pattern #4476

Closed
opened 2026-04-12 15:24:12 -05:00 by GiteaMirror · 9 comments

Originally created by @ChristianWeyer on GitHub (Sep 30, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7051

Originally assigned to: @ParthSareen on GitHub.

I am using the Maybe pattern as described at https://python.useinstructor.com/concepts/maybe/.

Tool calling goes wrong in a case like this:

{
  "messages": [
    {
      "role": "system",
      "content": "Today's date is 2024-09-30. Please consider this when processing the availability information.\nIf you cannot extract the start date, use today.\nThis is the list of employees, with the initials, employee ID, full name, and skills:\n...\n\nDO NOT invent data. DO NOT hallucinate!"
    },
    {
      "role": "user",
      "content": "When does our colleague XYZ have two days available for a 2 days appointment?"
    }
  ],
  "model": "qwen2.5:7b-instruct-fp16",
  "tool_choice": {
    "type": "function",
    "function": {
      "name": "MaybeAvailabilityRequest"
    }
  },
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "MaybeAvailabilityRequest",
        "description": "Correctly extracted `MaybeAvailabilityRequest` with all the required parameters with correct types",
        "parameters": {
          "$defs": {
            "AvailabilityRequest": {
              "properties": {
                "personIds": {
                  "description": "List of person IDs to check availability for",
                  "items": {
                    "type": "integer"
                  },
                  "title": "Personids",
                  "type": "array"
                },
                "startDate": {
                  "description": "Start date for the availability check",
                  "title": "Startdate",
                  "type": "string"
                },
                "endDate": {
                  "anyOf": [
                    {
                      "type": "string"
                    },
                    {
                      "type": "null"
                    }
                  ],
                  "description": "End date for the availability check",
                  "title": "Enddate"
                },
                "numberOfConsecutiveDays": {
                  "description": "Number of consecutive days required",
                  "title": "Numberofconsecutivedays",
                  "type": "integer"
                }
              },
              "required": [
                "personIds",
                "startDate",
                "endDate",
                "numberOfConsecutiveDays"
              ],
              "title": "AvailabilityRequest",
              "type": "object"
            }
          },
          "properties": {
            "result": {
              "anyOf": [
                {
                  "$ref": "#/$defs/AvailabilityRequest"
                },
                {
                  "type": "null"
                }
              ],
              "default": null
            },
            "error": {
              "default": false,
              "title": "Error",
              "type": "boolean"
            },
            "message": {
              "anyOf": [
                {
                  "type": "string"
                },
                {
                  "type": "null"
                }
              ],
              "default": null,
              "title": "Message"
            }
          },
          "type": "object",
          "required": []
        }
      }
    }
  ]
}

The model answers with this:

{
  "id": "chatcmpl-485",
  "object": "chat.completion",
  "created": 1727721983,
  "model": "qwen2.5:7b-instruct-fp16",
  "system_fingerprint": "fp_ollama",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": [
          {
            "id": "call_oe8h5as1",
            "type": "function",
            "function": {
              "name": "MaybeAvailabilityRequest",
              "arguments": "{\"error\":false,\"message\":null,\"result\":{\"availableDateRange\":[{\"end_date\":\"2024-10-03\",\"start_date\":\"2024-10-01\"}]}}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ],
  "usage": {
    "prompt_tokens": 477,
    "completion_tokens": 158,
    "total_tokens": 635
  }
}

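This is obviously wrong and does not follow the JSON schema from the tool definition: result contains an invented availableDateRange field instead of personIds, startDate, endDate, and numberOfConsecutiveDays.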
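
The schema mismatch in the response can also be checked mechanically. The following is a minimal sketch using only the standard library; the arguments string and the field names are copied from the request and response above, while the variable names are purely illustrative.

```python
import json

# Arguments string copied verbatim from the tool call in the response above.
arguments = (
    '{"error":false,"message":null,"result":{"availableDateRange":'
    '[{"end_date":"2024-10-03","start_date":"2024-10-01"}]}}'
)

# Required fields of AvailabilityRequest, copied from the tool schema.
required_fields = {"personIds", "startDate", "endDate", "numberOfConsecutiveDays"}

# Compare the keys the model returned under "result" with the declared fields.
returned_fields = set(json.loads(arguments)["result"])
missing = sorted(required_fields - returned_fields)
unexpected = sorted(returned_fields - required_fields)

print("missing:", missing)
print("unexpected:", unexpected)
```

All four required fields are reported as missing, and the only key present is one the schema never declared.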

When I skip function calling and craft the prompt manually, the model always gets the answer right.

cc @JianxinMa

Thanks!

GiteaMirror added the feature request label 2026-04-12 15:24:12 -05:00

@rick-github commented on GitHub (Oct 1, 2024):

Can you post the code you used to get the results? I took the code from the Maybe example (https://python.useinstructor.com/concepts/maybe/), adjusted it for Ollama as per the instructor Ollama guide (https://python.useinstructor.com/hub/ollama/), used the model qwen2.5:7b-instruct-q8_0 since I don't have the fp16 quant on hand, and got the following results, which seem to be the correct response for the Maybe pattern.

{
  "result": {
    "age": 25,
    "name": "Jason",
    "role": "scientist"
  },
  "error": false,
  "message": null
}
{
  "result": null,
  "error": true,
  "message": "Unknown user"
}

@ChristianWeyer commented on GitHub (Oct 1, 2024):

Yes, you seem to be using instructor.Mode.JSON.
This does not use function/tool calling, but rather 'hand-crafts' the prompt.

@rick-github Try omitting the mode parameter, which then defaults to tool calling. It should then fail, as it did for me above.

from typing import Optional

import instructor
import openai
from pydantic import BaseModel, Field

# No mode argument, so instructor uses its default (tool calling).
# Ollama ignores the API key, but the OpenAI client requires one to be set.
client = instructor.from_openai(
    openai.OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
)

class UserDetail(BaseModel):
    age: int
    name: str
    role: Optional[str] = Field(default=None)

class MaybeUser(BaseModel):
    result: Optional[UserDetail] = Field(default=None)
    error: bool = Field(default=False)
    message: Optional[str] = Field(default=None)

    def __bool__(self):
        # A MaybeUser is truthy only when extraction produced a result.
        return self.result is not None

def extract(content: str) -> MaybeUser:
    return client.chat.completions.create(
        model="qwen2.5:7b-instruct-fp16",
        response_model=MaybeUser,
        messages=[
            {"role": "user", "content": f"Extract `{content}`"},
        ],
    )

user1 = extract("Jason is a 25-year-old scientist")
print(user1.model_dump_json(indent=2))

user2 = extract("Unknown user")
print(user2.model_dump_json(indent=2))

Make sure to look at what is going over the wire. I am using a local HTTP debugging proxy (ProxyMan in my case).
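
Once a request body has been captured this way, it can be replayed without instructor in the loop, which helps separate client behavior from server behavior. A minimal standard-library sketch follows. It assumes the base URL from the snippet above plus the /chat/completions path that the OpenAI client appends; build_request and the trimmed payload are hypothetical stand-ins for the captured body.

```python
import json
import urllib.request

# Base URL from the snippet above plus the chat completions path.
URL = "http://localhost:11434/v1/chat/completions"


def build_request(payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the JSON payload."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Trimmed payload for illustration; paste the captured request body here.
payload = {
    "model": "qwen2.5:7b-instruct-fp16",
    "messages": [{"role": "user", "content": "Extract `Unknown user`"}],
    "tools": [],  # placeholder: the tool definitions captured by the proxy
}

request = build_request(payload)
print(request.get_method(), request.full_url)
print(json.dumps(json.loads(request.data), indent=2))

# To actually send it (requires a running Ollama server):
# with urllib.request.urlopen(request) as response:
#     print(json.dumps(json.load(response), indent=2))
```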


@rick-github commented on GitHub (Oct 1, 2024):

I suspect that if you use the tool in the way suggested by the authors, the results will be more acceptable. If you are using it the way they suggest and results are not acceptable, then we can debug that.


@ChristianWeyer commented on GitHub (Oct 1, 2024):

The point is: We have Function Calling / Tool Calling support in Ollama now.

When a tool that relies on function calling is used with an FC-enabled model in Ollama and does not behave correctly, it would be nice to get this fixed.

The code works with gpt-4o, for example. And the "promise" of function calling API compatibility is that you just change the base URL and the model, and it still works (at least on the protocol level).

Do you agree here?


@ivanstepanovftw commented on GitHub (Oct 12, 2024):

What you need is a BNF sampler, with a JSON GBNF grammar generated from the relevant Pydantic BaseModel. I am referring to the llama.cpp GBNF examples.
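
To make the suggestion concrete: a grammar-constrained sampler only accepts tokens that keep the output inside the grammar, so the model cannot emit keys the schema does not declare. llama.cpp ships its own JSON-schema-to-grammar converter; the code below is not that converter. It is a deliberately tiny, hypothetical sketch (flat_object_to_gbnf and the example schema are made up for illustration) that handles only flat objects with primitive fields in a fixed key order, just to show what the generated GBNF looks like.

```python
# GBNF fragments for a few JSON primitives (simplified: no string escapes,
# no floats, leading zeros allowed).
PRIMITIVES = {
    "integer": '"-"? [0-9]+',
    "boolean": '("true" | "false")',
    "string": r'"\"" [^"\\]* "\""',
    "null": '"null"',
}


def flat_object_to_gbnf(properties: dict) -> str:
    """Emit a GBNF grammar for a flat JSON object with the given properties."""
    members = []
    for name, spec in properties.items():
        value = PRIMITIVES[spec["type"]]
        # Each member is the quoted key, a colon, then the value rule.
        members.append(f'"\\"{name}\\"" ws ":" ws {value}')
    body = ' ws "," ws '.join(members)
    return "\n".join([
        f'root ::= "{{" ws {body} ws "}}"',
        "ws ::= [ \\t\\n]*",
    ])


# Hypothetical flat schema, loosely based on UserDetail in this thread.
grammar = flat_object_to_gbnf({
    "age": {"type": "integer"},
    "name": {"type": "string"},
})
print(grammar)
```

A real converter also has to cover nesting, optional fields, anyOf, and $ref, which is where most of the complexity lives.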


@sheffler commented on GitHub (Apr 8, 2025):

Are you still seeing this problem? I believe it is related to the way the $defs clause is handled.
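
For context on what $defs handling involves: the tool schema in the original report defines AvailabilityRequest under $defs and points to it with a $ref, so anything that renders the schema without resolving that reference never sees the actual field list. Whether this is the cause is not confirmed in this thread. As a sketch under that assumption, a client could inline local references before sending the schema; inline_refs is a hypothetical helper, and it ignores recursive schemas and any keys that sit next to a $ref.

```python
def inline_refs(schema: dict) -> dict:
    """Return a copy of schema with local "#/$defs/..." references inlined."""
    defs = schema.get("$defs", {})

    def resolve(node):
        if isinstance(node, dict):
            ref = node.get("$ref")
            if isinstance(ref, str) and ref.startswith("#/$defs/"):
                # Replace the reference with a resolved copy of its target.
                return resolve(defs[ref.split("/")[-1]])
            # Drop the $defs table itself; everything else is copied over.
            return {k: resolve(v) for k, v in node.items() if k != "$defs"}
        if isinstance(node, list):
            return [resolve(item) for item in node]
        return node

    return resolve(schema)


# Trimmed version of the tool schema from the original report.
schema = {
    "$defs": {
        "AvailabilityRequest": {
            "type": "object",
            "properties": {"startDate": {"type": "string"}},
            "required": ["startDate"],
        }
    },
    "type": "object",
    "properties": {
        "result": {
            "anyOf": [{"$ref": "#/$defs/AvailabilityRequest"}, {"type": "null"}],
            "default": None,
        }
    },
}

flat = inline_refs(schema)
print(flat["properties"]["result"]["anyOf"][0]["required"])
```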


@ChristianWeyer commented on GitHub (Apr 8, 2025):

I think nothing has changed since then.


@sheffler commented on GitHub (Apr 8, 2025):

See this: https://github.com/ollama/ollama/pull/10091


@ParthSareen commented on GitHub (Apr 9, 2025):

Hey @ChristianWeyer, there should also be some general improvements to tools coming in the next release, which should help with this. I'd also give our Python repo a go and see if that helps: https://github.com/ollama/ollama-python/blob/main/examples/tools.py

Reference: github-starred/ollama#4476