[GH-ISSUE #7572] OpenAI API tool calling doesn't work #66881

Closed
opened 2026-05-04 08:35:52 -05:00 by GiteaMirror · 8 comments

Originally created by @SinanAkkoyun on GitHub (Nov 8, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7572

### What is the issue?

The OpenAI API endpoint does not support proper tool calling; it just outputs the tool call in plaintext as the message content: `{"name": "get_time", "parameters": {"timezone": "Europe/Berlin"}}`

### OS

Linux

### GPU

Nvidia

### CPU

Intel

### Ollama version

0.4.0

GiteaMirror added the bug and api labels 2026-05-04 08:35:52 -05:00

@rick-github commented on GitHub (Nov 8, 2024):

```console
curl -s http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"llama3.1:8b-instruct-q4_K_M",
  "messages": [
    {
      "role": "user",
      "content": "What is the weather like in Boston and New York?"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and state, e.g. San Francisco, CA"
            },
            "unit": {
              "type": "string",
              "enum": ["celsius", "fahrenheit"]
            }
          },
          "required": ["location"]
        }
      }
    }
  ],
  "tool_choice": "auto",
  "stream":false
}' | jq
```

```json
{
  "id": "chatcmpl-211",
  "object": "chat.completion",
  "created": 1731063537,
  "model": "llama3.1:8b-instruct-q4_K_M",
  "system_fingerprint": "fp_ollama",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": [
          {
            "id": "call_jd2vds65",
            "type": "function",
            "function": {
              "name": "get_current_weather",
              "arguments": "{\"location\":\"Boston, MA\",\"unit\":\"fahrenheit\"}"
            }
          },
          {
            "id": "call_f11n0cfo",
            "type": "function",
            "function": {
              "name": "get_current_weather",
              "arguments": "{\"location\":\"New York, NY\",\"unit\":\"celsius\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ],
  "usage": {
    "prompt_tokens": 186,
    "completion_tokens": 55,
    "total_tokens": 241
  }
}
```
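
For anyone driving this endpoint through the official `openai` Python client instead of curl, the same request can be made as below. A minimal sketch, assuming the `openai` v1 package; the `api_key` value is a placeholder, since Ollama's OpenAI-compatible endpoint ignores it:

```python
from openai import OpenAI

# Placeholder key: Ollama's OpenAI-compatible endpoint ignores it.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="llama3.1:8b-instruct-q4_K_M",
    messages=[{"role": "user", "content": "What is the weather like in Boston and New York?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    }],
)

# A working model returns structured tool calls and finish_reason == "tool_calls";
# the bug reported here would instead surface as plaintext JSON in message.content.
message = response.choices[0].message
for call in message.tool_calls or []:
    print(call.id, call.function.name, call.function.arguments)
```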

@SinanAkkoyun commented on GitHub (Nov 8, 2024):

@rick-github
Sorry, I forgot to say:

- static doesn't work with Llama3.2
- streaming doesn't work with Llama3.1 (see the streaming sketch below)
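
For context on the streaming case: with `"stream": true`, OpenAI-style responses deliver tool calls as incremental `tool_calls` deltas that the client must accumulate. A minimal sketch of consuming them, assuming the `openai` Python package; the model name and the `get_time` tool mirror the report above and are illustrative:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

stream = client.chat.completions.create(
    model="llama3.1:8b-instruct-q4_K_M",
    messages=[{"role": "user", "content": "What time is it in Berlin?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_time",
            "description": "Get the current time in a timezone",
            "parameters": {
                "type": "object",
                "properties": {"timezone": {"type": "string"}},
                "required": ["timezone"],
            },
        },
    }],
    stream=True,
)

# Tool calls stream as partial deltas keyed by index; the name arrives once,
# the JSON arguments string arrives in fragments that must be concatenated.
calls = {}
for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    for tc in delta.tool_calls or []:
        entry = calls.setdefault(tc.index, {"name": "", "arguments": ""})
        if tc.function and tc.function.name:
            entry["name"] = tc.function.name
        if tc.function and tc.function.arguments:
            entry["arguments"] += tc.function.arguments

print(calls)
```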

@rick-github commented on GitHub (Nov 8, 2024):

#5796

By static, do you mean non-streaming?

If you can supply some examples or logs, it will be much easier to resolve your issue.


@SinanAkkoyun commented on GitHub (Nov 8, 2024):

Sure thing! Just passing your exact prompt (non-streaming, and yes, that's what I meant by static) to Llama3.2 results in a response without a tool_call; it just puts the output into a normal chat message. Perhaps because of the pythonic call format? I can't tell why.

(Sorry, I also don't know where to get the correct logs from.)


@SinanAkkoyun commented on GitHub (Nov 8, 2024):

Does the Ollama OpenAI API perchance support special tokens in the response somehow?


@rick-github commented on GitHub (Nov 8, 2024):

Could you supply the script that you are using to call the server? If I feed my example to llama3.2, I get a tool call:

```console
curl -s http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"llama3.2:3b-instruct-q4_K_M",
  "messages": [
    {
      "role": "user",
      "content": "What is the weather like in Boston and New York?"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and state, e.g. San Francisco, CA"
            },
            "unit": {
              "type": "string",
              "enum": ["celsius", "fahrenheit"]
            }
          },
          "required": ["location"]
        }
      }
    }
  ],
  "tool_choice": "auto",
  "stream":false
}' | jq
```

```json
{
  "id": "chatcmpl-168",
  "object": "chat.completion",
  "created": 1731105942,
  "model": "llama3.2:3b-instruct-q4_K_M",
  "system_fingerprint": "fp_ollama",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": [
          {
            "id": "call_vxmlccki",
            "type": "function",
            "function": {
              "name": "get_current_weather",
              "arguments": "{\"location\":\"Boston, MA\",\"unit\":\"celsius\"}"
            }
          },
          {
            "id": "call_abyfceyx",
            "type": "function",
            "function": {
              "name": "get_current_weather",
              "arguments": "{\"location\":\"New York, NY\",\"unit\":\"celsius\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ],
  "usage": {
    "prompt_tokens": 195,
    "completion_tokens": 55,
    "total_tokens": 250
  }
}
```

If you did the usual `curl|sh` installation, logs can be retrieved with `journalctl -u ollama --no-pager`. See [here](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) for other methods.

Special tokens (like `<|user|>`) are defined on a model-by-model basis, and are processed through template expansion. There are no special tokens in the generic API handling.
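
To check which special tokens a particular model's template defines, the template can be inspected through Ollama's `/api/show` endpoint (or `ollama show <model> --template` on recent CLI versions). A minimal sketch, assuming the `requests` package; the model name is illustrative:

```python
import requests

# POST /api/show returns model metadata, including the chat template whose
# expansion inserts the model-specific special tokens.
resp = requests.post(
    "http://localhost:11434/api/show",
    json={"model": "llama3.2:3b-instruct-q4_K_M"},
)
resp.raise_for_status()
print(resp.json().get("template", ""))
```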


@SinanAkkoyun commented on GitHub (Nov 8, 2024):

That's super strange, I just copied your curl. I will delete and re-pull 3.2.


@SinanAkkoyun commented on GitHub (Nov 8, 2024):

Yup, I had some old llama3.2 model, thanks!
