[GH-ISSUE #12187] GPT-OSS not completing tool calls #54617

Open
opened 2026-04-29 06:34:01 -05:00 by GiteaMirror · 36 comments

Originally created by @Roberto-Candelario on GitHub (Sep 4, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12187

Originally assigned to: @ParthSareen on GitHub.

What is the issue?

When using Ollama with Open WebUI, the model doesn't complete the tool calls being used. It starts the tool and after a few seconds "completes", ready for a new chat message, even though the model did not do anything.

Relevant log output


OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.11.10

GiteaMirror added the bug label 2026-04-29 06:34:01 -05:00

@rick-github commented on GitHub (Sep 5, 2025):

Can you give an example of a tool? I did a quick test and open-webui had no problems using the weather tool and gpt-oss to tell me what the forecast was.
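
For reference, a minimal sketch of that kind of test with the ollama Python client (the get_weather function here is a hypothetical stand-in, not the exact tool used in the test above):

    from ollama import chat

    def get_weather(city: str) -> str:
        """Get the current weather for a city."""
        return f"Sunny, 22C in {city}"  # stubbed result, enough for a tool-call test

    messages = [{"role": "user", "content": "What is the weather in Berlin?"}]
    # The ollama client accepts plain Python functions as tools.
    response = chat(model="gpt-oss", messages=messages, tools=[get_weather])

    # gpt-oss should answer with a tool call rather than plain text.
    for call in response.message.tool_calls or []:
        print(call.function.name, call.function.arguments)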


@z0rb commented on GitHub (Sep 6, 2025):

I have the same behavior with Open WebUI and the playwright tool: https://github.com/microsoft/playwright-mcp


@lefoulkrod commented on GitHub (Sep 6, 2025):

Maybe related to https://github.com/ollama/ollama/issues/12203 (thinking output being put into tool call results)


@z0rb commented on GitHub (Sep 6, 2025):

This behavior happens when "Function calling" for the model is set to "Native" in Open WebUI.

When function calling is set to "Default", tool calls happen again, but they are unstable: e.g. the browser gets started, but then there is no further navigation to the page.

For any other MCP server and tool I have tried, e.g. DuckDuckGo, the function calls work well with "Native".


@raffaeler commented on GitHub (Sep 16, 2025):

I am also seeing the same thing.
The call is the following:

    response = chat(
        model=model,
        messages=messages,
        tools=[store_document_section],
        options={"max_tokens": 6000, "temperature": 0.8}
    )

If I use llama3.2, the tool is always called. When using gpt-oss the tool is never called.
In this test the prompt is hard-coded and explicitly asks to call the tool.

Ollama version is 0.11.10


@rick-github commented on GitHub (Sep 16, 2025):

Can you provide a full script that demonstrates the problem?


@raffaeler commented on GitHub (Sep 16, 2025):

@rick-github thanks for the prompt response.
I also tried with 0.11.11 a few seconds ago. I am now trying to create the smallest possible repro of the issue.
Does a notebook work as well for you?


@rick-github commented on GitHub (Sep 16, 2025):

> Does a notebook work as well for you?

If that works best for you, sure.


@raffaeler commented on GitHub (Sep 16, 2025):

@rick-github Here is the repro.
Notes:

  • the last cell contains the functions to run with gpt-oss or llama3.2
  • when using gpt-oss it sometimes works. Strangely, when it works, the tool is called only once instead of multiple times as happens with llama3.2.
  • the cell containing the extract_chunks function currently uses a global function. At lines 15-21, you can switch to using an instance function. When you do this, calling the function is more reliable.

Please do run this multiple times by resetting the whole notebook every time.
Thank you

ollama_test.ipynb (https://github.com/user-attachments/files/22362450/ollama_test.ipynb)


@rick-github commented on GitHub (Sep 16, 2025):

gpt-oss:20b or gpt-oss:120b? Was the gpt-oss model downloaded from the ollama library or imported from somewhere else?


@raffaeler commented on GitHub (Sep 16, 2025):

@rick-github It is the latest tag: gpt-oss:latest.
I downloaded it via ollama pull gpt-oss


@rick-github commented on GitHub (Sep 16, 2025):

What configuration variables are you running ollama with?


@raffaeler commented on GitHub (Sep 16, 2025):

Plain default values, no env vars set


@raffaeler commented on GitHub (Sep 17, 2025):

@rick-github Out of my curiosity, were you able to reproduce the issue?


@rick-github commented on GitHub (Sep 17, 2025):

There are a few issues with the process.

You are using the defaults with a 4060 (https://github.com/ollama/ollama/issues/10956#issuecomment-3289637238), which means the context size used by the model is 4096 tokens. The template for gpt-oss (https://ollama.com/library/gpt-oss:20b/blobs/fa6710a93d78) is large, which uses up token space in the context buffer. gpt-oss is a reasoning model, meaning it generates reasoning tokens, further filling the context buffer. The net result is that the context buffer fills up and is shifted.

ollama  | time=2025-09-17T14:12:32.438Z level=DEBUG source=cache.go:280 msg="context limit hit - shifting" id=0 limit=4096 input=4096 keep=4 discard=2046

This can be impactful because the act of shifting the buffer causes the loss of tokens from the head of the buffer, which is where the system message is stored. As a result the integrity of the system message is impaired and the model is doing completions based on the remaining content, not the instructions.

This can be remedied by setting the size of the context buffer in the call:

--- 12187.py.orig	2025-09-17 15:49:07.063243867 +0200
+++ 12187.py	2025-09-17 16:10:09.837914523 +0200
@@ -172,7 +172,7 @@
         model=model,
         messages=messages,
         tools=tools,  
-        options={"max_tokens": 6000, "temperature": 0.8},
+        options={"num_ctx": 16384, "num_predict":6000, "temperature": 0.8},
     )
 
     if response.message.tool_calls:

As you noted above, gpt-oss only makes one tool call. This is because the model is not trained for parallel tool calling. In order for it to store multiple document records, the function must be written in a way that gpt-oss only needs to call it once.

--- 12187.py.orig	2025-09-17 15:49:07.063243867 +0200
+++ 12187.py	2025-09-17 17:00:05.602335918 +0200
@@ -99,21 +99,28 @@
 
 global_chunks = []
 
-def store_document_section(section_title: str, equivalent_concept: str) -> bool:
+from typing import List, Dict
+
+def store_document_section(data: List[Dict[str, str]]) -> bool:
     """
     Stores a portion of the document along with its distilled concepts
 
     Args:
-        section_title: The original heading of the section
-        equivalent_concept: A brief yet accurate summary of the section
+        data: a list of JSON objects of the form {"section_title": <The original heading of the section>, "equivalent_concept": <A brief yet accurate summary of the section>}
 
     Returns:
         bool: Whether or not the provided data was stored correctly
     """
 
-    global_chunks.append(
-        {"section_title": section_title, "equivalent_concept": equivalent_concept}
-    )
+    if not isinstance(data, list):
+      raise Exception("Not a list")
+    for d in data:
+      if "section_title" not in d:
+        raise Exception(f"Missing section title in `{d}`")
+      if "equivalent_concept" not in d:
+        raise Exception(f"Missing equivalent concept in `{d}`")
+
+    global_chunks.append(data)
     return True
 
 def get_system_prompt() -> str:

@@ -180,7 +187,10 @@
         for tool in response.message.tool_calls:
             # Ensure the function is available, and then call it
             if function_to_call := available_functions.get(tool.function.name):
-                output = function_to_call(**tool.function.arguments)
+                try:
+                  output = function_to_call(**tool.function.arguments)
+                except Exception as e:
+                  output = e
                 messages.append(response.message)
                 messages.append(
                     {

I'm not sure what the last chat call ("Final response") is trying to achieve. The call is made without tools, and since the instructions explicitly call for using a tool, the model hallucinates a tool call. For gpt-oss this results in an error logged in the server log and no output:

ollama  | time=2025-09-17T14:13:48.164Z level=WARN source=harmonyparser.go:401 msg="harmony parser: no reverse mapping found for function name" harmonyFunctionName=store_document_section

and llama3.2 returns the call in the content as a <|python_tag|> chunk.
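
For reference, a minimal sketch of the loop pattern the fixes above add up to, keeping tools in every chat call (including the final one) so the model isn't pushed into hallucinating a call it can't make. It assumes the notebook's store_document_section function and messages list are in scope; the tool-result message shape follows the ollama examples and may vary by client version:

    from ollama import chat

    available_functions = {"store_document_section": store_document_section}

    while True:
        response = chat(
            model="gpt-oss",
            messages=messages,
            tools=[store_document_section],  # keep tools on every turn, even the last
            options={"num_ctx": 16384, "num_predict": 6000},
        )
        if not response.message.tool_calls:
            break  # the model produced a final answer instead of a tool call
        messages.append(response.message)
        for tool in response.message.tool_calls:
            if fn := available_functions.get(tool.function.name):
                try:
                    output = fn(**tool.function.arguments)
                except Exception as e:
                    output = e
                # feed the result back so the model can continue or wrap up
                messages.append({"role": "tool", "content": str(output), "tool_name": tool.function.name})

    print(response.message.content)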


@z0rb commented on GitHub (Sep 17, 2025):

Since you mentioned "ollama | time=2025-09-17T14:13:48.164Z level=WARN source=harmonyparser.go:401 msg="harmony parser: no reverse mapping found for function name" harmonyFunctionName=store_document_section":

I started getting correct tool calls with Open WebUI and Ollama when I changed the model config in Open WebUI from native tool calls to "default". But then the line you mentioned started appearing. The model then really calls a function and I can see its result, but afterwards it hallucinates another tool call from the template and breaks off without a proper completion notification in Open WebUI.

I just didn't report back after my previous comment since I wasn't sure if it was relevant. @raffaeler You might try the same change in your case, switching away from native tool calls. I just wouldn't know how to set the parameter outside of Open WebUI.


@rick-github commented on GitHub (Sep 17, 2025):

Can you provide an example of a tool that you use in OpenWebUI that changes success rate based on native/default tool calling?


@raffaeler commented on GitHub (Sep 17, 2025):

@rick-github Thank you very much for the detailed explanation.
Anyway, this raises a number of question marks in my head:

  • The "overflow" is something that popped into my mind at a certain point. Shouldn't Ollama detect this and provide an error?
  • Is the 16384 for num_ctx an estimation/guess or anything else?
  • The official OpenAI and Hugging Face model cards for GPT-OSS are very scarce. How did you get the parameters (context length and parallel tool support)? Can I read them through Ollama?
  • I read that llama3.2 does not support parallel tool calling either. Anyway, if I use llama3.2 the tool is called multiple times (once for each piece of the document). Why?
  • The last call comes from my experiments using OpenAI endpoints. When I don't specify anything, the model typically returns all the content already provided to the tool, which is just a waste of tokens. While I could stop earlier, the idea is to let the model tell me if there were issues with the document.

With regard to making a single call, this is something I experimented with using OpenAI endpoints, and I saw far better results cycling the tool calls rather than trying to "digest" the document with a single tool call. I will have to find a different strategy (or change model).

Thanks for the patience


@rick-github commented on GitHub (Sep 17, 2025):

> @rick-github Thank you very much for the detailed explanation. Anyway, this raises a number of question marks in my head:
>
> • The "overflow" is something that popped into my mind at a certain point. Shouldn't Ollama detect this and provide an error?

It's a feature, not a bug. There are use cases where the client doesn't provide system instructions and just wants tokens. The effect of shifting can be mitigated by specifying num_predict, which stops generation at the given token count. If the number of input tokens plus num_predict is less than num_ctx, no shift occurs. This requires knowing how many tokens your input tokenizes to, so it is somewhat approximate. I have a PR (https://github.com/ollama/ollama/pull/9547) which allows maximum use of the context buffer, but sadly the PR has languished in the review queue for 6 months.

> • Is the 16384 for num_ctx an estimation/guess or anything else?

Estimation. The input is around 5K characters and is converted to around 2K tokens. Your original script set max_tokens to 6K, so about 8K tokens in total. Since gpt-oss is a reasoning model, I doubled it to leave room for reasoning tokens.

> • The official OpenAI and Hugging Face model cards for GPT-OSS are very scarce. How did you get the parameters (context length and parallel tool support)? Can I read them through Ollama?

The maximum context length supported by a model is available in ollama show <model>; look for "context length". The context length that ollama uses is configurable, see https://github.com/ollama/ollama/blob/main/docs/faq.md#how-can-i-specify-the-context-window-size.

Model capabilities are a result of training and templating and are not always easily determined. Capabilities like thinking and tool use are again available via ollama show, or through the API in /api/show (sketched after this comment). Whether the model supports parallel tool calls is not exposed; I determined that with testing.

> • I read that llama3.2 does not support parallel tool calling either. Anyway, if I use llama3.2 the tool is called multiple times (once for each piece of the document). Why?

llama3.2 is obviously capable of parallel tool calling, so what you read is incorrect. The model processes the sections and generates a tool call for each section.
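
A quick way to read the capabilities mentioned above programmatically is the /api/show endpoint (a sketch against a default local server; exact field names can vary between versions):

    import requests

    # Equivalent to `ollama show gpt-oss`, but machine-readable.
    info = requests.post(
        "http://localhost:11434/api/show",
        json={"model": "gpt-oss"},
    ).json()

    print(info.get("capabilities"))  # e.g. ["completion", "tools", "thinking"]
    # Context length lives under model_info, keyed by architecture name.
    print(info.get("model_info", {}).get("gptoss.context_length"))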


@raffaeler commented on GitHub (Sep 17, 2025):

Thanks again @rick-github, precious info!

The info for gpt-oss and llama3.2 is:

$ ollama show gpt-oss
  Model
    architecture        gptoss    
    parameters          20.9B     
    context length      131072    
    embedding length    2880      
    quantization        MXFP4     

  Capabilities
    completion    
    tools         
    thinking      

  Parameters
    temperature    1    

  License
    Apache License               
    Version 2.0, January 2004    
    ...                     


$ ollama show llama3.2
  Model
    architecture        llama     
    parameters          3.2B      
    context length      131072    
    embedding length    3072      
    quantization        Q4_K_M    

  Capabilities
    completion    
    tools         

  Parameters
    stop    "<|start_header_id|>"    
    stop    "<|end_header_id|>"      
    stop    "<|eot_id|>"             

  License
    LLAMA 3.2 COMMUNITY LICENSE AGREEMENT                 
    Llama 3.2 Version Release Date: September 25, 2024    
    ...              
  • I don't see a difference regarding parallel tools between the two models

  • I see 128K context, the template is roughly 7.5K, and you previously said:

> You are using the defaults with a 4060 (https://github.com/ollama/ollama/issues/10956#issuecomment-3289637238), which means the context size used by the model is 4096 tokens.

This should leave more than 4K for my content, shouldn't it?
I also don't get how this depends on the GPU I am using.

Thanks!


@rick-github commented on GitHub (Sep 17, 2025):

> • I don't see a difference regarding parallel tools between the two models

Correct.

> • I see 128K context, the template is roughly 7.5K, and you previously said:
>
> > You are using the defaults with a 4060 (#10956 (comment)), which means the context size used by the model is 4096 tokens.
>
> This should leave more than 4K for my content, shouldn't it? I also don't get how this depends on the GPU I am using.

128k is the maximum context of the model. The context that ollama uses is configurable, see https://github.com/ollama/ollama/blob/main/docs/faq.md#how-can-i-specify-the-context-window-size. The default context (i.e., not configured in the environment, Modelfile, or API call) is 4096 tokens. gpt-oss is a special case in that the template is large, so the default depends on how much VRAM your GPU has. If the GPU has > 20G VRAM, the default context for gpt-oss is 8k. If not, the default is the same as for other models, 4k.
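
In practice that means sizing the context per request (or via the environment) rather than relying on the default; a sketch with the Python client:

    from ollama import chat

    response = chat(
        model="gpt-oss",
        messages=[{"role": "user", "content": "..."}],
        # Raise num_ctx past the 4k default so the large gpt-oss template plus
        # reasoning tokens don't trigger a context shift; cap output with num_predict.
        options={"num_ctx": 16384, "num_predict": 6000},
    )

    # Alternatively, set it server-wide: OLLAMA_CONTEXT_LENGTH=16384 ollama serve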


@raffaeler commented on GitHub (Sep 17, 2025):

@rick-github Got it, thanks again.


@formigarafa commented on GitHub (Sep 26, 2025):

Here is an example that fails similarly when Python tools are enabled.

curl using the OpenAI format (the same happens using the Ollama API format):

$ curl 'http://localhost:11434/v1/chat/completions' \
    -H 'Content-Type: application/json' \
    --data-raw '{
    "model": "gpt-oss:20b",
    "messages":[
      {
        "role":"user",
        "content": "Considering that the 1st number in the Fibonacci sequence is 1, calculate the exact value for the 53rd number in the Fibonacci sequence."
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "python"
        }
      }
    ],
    "stream":false
  }'

Response: 500 Internal Server Error

{
  "error": {
    "message": "error parsing tool call: raw=def fib(n):\n    a,b=1,1\n    for _ in range(3,n+1):\n        a,b=b,a+b\n    return b\nprint(fib(53))\n, err=invalid character d looking for beginning of value",
    "type": "api_error",
    "param": null,
    "code": null
  }
}
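
The same failure can be reproduced from Python through Ollama's OpenAI-compatible endpoint (a sketch; on affected versions the call raises with the parse error above):

    from openai import OpenAI

    # The api_key is required by the client but ignored by Ollama.
    client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

    try:
        resp = client.chat.completions.create(
            model="gpt-oss:20b",
            messages=[{"role": "user", "content": "Calculate the exact value of the 53rd Fibonacci number."}],
            tools=[{"type": "function", "function": {"name": "python"}}],
        )
        print(resp.choices[0].message)
    except Exception as e:
        print(e)  # "error parsing tool call: raw=def fib(n): ..."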

@YetheSamartaka commented on GitHub (Oct 7, 2025):

Hi. This has an impact in places other than Open WebUI, such as Roo Code (Cline) or GitHub Copilot when the model is provided through Ollama, and it makes the model unusable.

One then gets "warnings" that should really be errors, such as:
harmony parser: no reverse mapping found for function name" harmonyFunctionName=search_file


@mcr-ksh commented on GitHub (Oct 10, 2025):

My issue is somehow related. I'm using n8n in a 3-tool workflow.

[Image: screenshot of the n8n 3-tool workflow — https://github.com/user-attachments/assets/c50e4e7f-96d4-47f0-9f4d-5b48a1e78e63]

I've always had issues with ollama and tool calling. I'm able to run 1 tool, or at most 2 tools if I'm lucky, but it is not very deterministic. I've been using the Lamarck-14b model, phi3, deepseek-r1, and lately gpt-oss-20/120b.

I don't know what the issue with ollama is here, but once I switched to vLLM all the issues were gone. For the sake of testing I switched the model back to ollama, and the problem was immediately back while no prompt changes were made.


@ParthSareen commented on GitHub (Oct 18, 2025):

@mcr-ksh is this with cloud or local?


@mcr-ksh commented on GitHub (Oct 18, 2025):

> @mcr-ksh is this with cloud or local?

local.


@ParthSareen commented on GitHub (Nov 8, 2025):

@mcr-ksh what context size are you running with, in comparison to vLLM? Model behavior should be the same if the context length is the same.


@mcr-ksh commented on GitHub (Nov 11, 2025):

@ParthSareen I know, it should! That's why I was shocked to see the behavior is different. I've been fighting in the past with other ollama models as well to make the tool calls happen, and could only reliably achieve one, while two calls was already a challenge. With vLLM it's all fine. The context length is the max for gpt-oss, 131072.


@formigarafa commented on GitHub (Nov 11, 2025):

@mcr-ksh if your problem only happens with gpt-oss locally, I bet the problem is the same one mentioned above (harmony parser: no reverse mapping found).
You could probably verify it by checking your logs.

From what I understand, Ollama does not implement a way to output tool calls in a format other than JSON, which is the use case for the gpt-oss harmony format.

To reproduce the problem you just need to make a tool call enabling the native tool capabilities the model has, and ask for something other than JSON in the response. Python is an easy one (I mentioned this a little above: https://github.com/ollama/ollama/issues/12187#issuecomment-3336607670).

I believe this won't be a simple bug fix, because it will require a fundamental change in the assumptions about the API response format: it will need to be able to answer with a generic format (probably a string) containing the raw tool-call code. At the moment, as far as I could test or find in the Ollama docs, the tool calls the model responds with are all expected to be parsed as JSON objects, which usually contain the tool name and respective parameters.

Something like "custom tools" from the OpenAI API (https://platform.openai.com/docs/guides/function-calling#custom-tools) is needed to solve this.
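
For comparison, a custom tool in the linked OpenAI docs is declared roughly like this (a sketch of the shape as documented there, not something Ollama accepts today); the model's call then arrives as a raw string instead of parsed JSON arguments:

    tools = [
        {
            "type": "custom",
            "name": "python",
            "description": "Executes Python code and returns stdout.",
        }
    ]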


@ParthSareen commented on GitHub (Dec 11, 2025):

@formigarafa actually, the model outputs its own format (which you correctly identified as the harmony format); we then convert that to the spec our API expects. Also, looking at your example, I'm not sure what you're doing in this portion of your tools. What is the behavior you expect from the model?

    "tools": [
      {
        "type": "function",
        "function": {
          "name": "python"
        }
      }
    ],

@formigarafa commented on GitHub (Dec 11, 2025):

If you use the command I provided above and look at the logs, you will see the prompt generated according to the gpt-oss template, like this:

<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-12-11

Reasoning: low

# Tools

## python

Use this tool to execute Python code in your chain of thought. The code will not be shown to the user. This tool should be used for internal reasoning, but not for code that is intended to be visible to the user (e.g. when creating plots, tables, or files).

When you send a message containing Python code to python, it will be executed in a stateful Jupyter notebook environment. python will respond with the output of the execution or time out after 120.0 seconds. The drive at '/mnt/data' can be used to save and persist user files. Internet access for this session is UNKNOWN. Depends on the cluster.

# Valid channels: analysis, commentary, final. Channel must be included for every message.<|end|><|start|>user<|message|>Considering that the 1st number in the Fibonacci sequence is 1, calculate the exact value for the 53rd number in the Fibonacci sequence.<|end|><|start|>assistant

The parameters you are asking about are used to enable the model's native Python code interpreter tool.

The logs also show the model generating the correct response, but at the end of the generation a parsing error is thrown.
The expected Python code is there in the error message, and it is correct! Just in the wrong place.

"message": "error parsing tool call: raw=def fib(n):\n    a,b=1,1\n    for _ in range(3,n+1):\n        a,b=b,a+b\n    return b\nprint(fib(53))\n, err=invalid character d looking for beginning of value"

Can you see this in the error log?

def fib(n):
    a,b=1,1
    for _ in range(3,n+1):
        a,b=b,a+b
    return b
print(fib(53))

What I expected would be to have access to that Python code without having to parse the error message to extract the successful answer.


@mcr-ksh commented on GitHub (Dec 11, 2025):

@ParthSareen Context is 128k max for gpt-oss, and that's what I'm running on.
@formigarafa it may well be the case that this is harmony related; however, I use n8n, and there I won't have the ability to change the tool calling or the output format. They integrate LangChain, and that is the backend calling ollama and parsing the output. Since LangChain is a massive framework, I would vote for making Ollama compatible with it instead of implementing workarounds. I don't recall observing any errors like harmony parser: no reverse mapping found, but I could be wrong. Right now I have moved on from ollama to vLLM and have had no problems since. I've posted a log in #12064, but there is no mention of any harmony reverse mapping.


@ParthSareen commented on GitHub (Dec 12, 2025):

Thanks @formigarafa and @mcr-ksh, I'll dig into this. We haven't tried a ton with the python built-in, so it would be good to make sure we work out any bugs there too. Thanks!


@rranabha commented on GitHub (Feb 26, 2026):

@ParthSareen in gpt-oss, with the Responses API, the rendered prompt doesn't include the tools either.

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather like in Tokyo?"},
]

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city name, e.g. New York",
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["fahrenheit", "celsius"],
                        "description": "Temperature unit",
                    },
                },
                "required": ["location"],
            },
        },
    }
]


resp_ollama = client.responses.create(
    # model="ollama/gpt-oss:20b",
    model="gpt-oss:20b",
    input=messages,
    tools=tools,
    tool_choice="auto",
)

The prompt on the ollama server:

You are a helpful assistant.<|end|><|start|>user<|message|>What's the weather like in Tokyo?<|end|><|start|>assistant
<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2026-02-26

Reasoning: medium

# Valid channels: analysis, commentary, final. Channel must be included for every message.
Calls to these tools must go to the commentary channel: 'functions'.<|end|><|start|>developer<|message|># Tools

## functions

namespace functions {

type  = () => any;

} // namespace functions

# Instructions

You are a helpful assistant.<|end|><|start|>user<|message|>What's the weather like in Tokyo?<|end|><|start|>assistant

@rick-github commented on GitHub (Feb 26, 2026):

With the Responses API, the tool fields go at the top level rather than nested under "function":

--- 12187.py.orig	2026-02-26 17:57:49.614790309 +0100
+++ 12187.py	2026-02-26 18:11:43.143812641 +0100
@@ -12,7 +12,6 @@
 tools = [
     {
         "type": "function",
-        "function": {
             "name": "get_weather",
             "description": "Get the current weather for a given location",
             "parameters": {
@@ -30,7 +29,6 @@
                 },
                 "required": ["location"],
             },
-        },
     }
 ]
With that change, the rendered prompt includes the get_weather tool:

<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2026-02-26

Reasoning: medium

# Valid channels: analysis, commentary, final. Channel must be included for every message.
Calls to these tools must go to the commentary channel: 'functions'.<|end|><|start|>developer<|message|># Tools

## functions

namespace functions {

// Get the current weather for a given location
type get_weather = (_: {
  // The city name, e.g. New York
  location: string,
  // Temperature unit
  unit: string,
}) => any;

} // namespace functions

# Instructions

You are a helpful assistant.<|end|><|start|>user<|message|>What's the weather like in Tokyo?<|end|><|start|>assistant