[GH-ISSUE #12064] Tool call parsing errors #70071

Open
opened 2026-05-04 20:15:57 -05:00 by GiteaMirror · 27 comments
Owner

Originally created by @lefoulkrod on GitHub (Aug 25, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12064

Originally assigned to: @drifkin, @jmorganca, @ParthSareen on GitHub.

What is the issue?

ollama 0.11.6
gpt-oss:120b

I get frequent errors from Ollama: `ollama._types.ResponseError: error parsing tool call`. These seem to occur mostly when a tool call is being generated for a `write_file` tool that I have.

Here is a truncated console log from my application:

    ERROR: agents.ollama.sdk.tool_loop: Unhandled exception in tool loop
      Message: error parsing tool call
      Raw: {"path":"src/pipe.py","content":"...[truncated]...","encoding":"utf-8"}
      Error: invalid character ']' after object key:value pair
      Status Code: 500
    Traceback:
      File "/home/larry/repos/computron_9000/agents/ollama/sdk/tool_loop.py", line 80, in run_tool_call_loop
        response = await client.chat(...)
      File "/home/larry/repos/computron_9000/.venv/lib/python3.12/site-packages/ollama/_client.py", line 854, in chat
        return await self._request(...)
      File "/home/larry/repos/computron_9000/.venv/lib/python3.12/site-packages/ollama/_client.py", line 692, in _request
        return cls(**(await self._request_raw(...)).json())
      File "/home/larry/repos/computron_9000/.venv/lib/python3.12/site-packages/ollama/_client.py", line 636, in _request_raw
        raise ResponseError(e.response.text, e.response.status_code)
    ollama._types.ResponseError: error parsing tool call: invalid character ']' after object key:value pair (status code: 500)

This is the non-truncated portion of the error that shows the invalid JSON syntax being returned for the tool call (note the stray `]` between the closing quote of `content` and `"encoding"`):

    return self.pipes\n\n def draw(self, surface: pygame.Surface) -> None:\n """Draw all active pipes onto the supplied surface.\n\n Parameters\n ----------\n surface : pygame.Surface\n The surface to render the pipes onto.\n """\n\n for pipe in self.pipes:\n pipe.draw(surface)\n"], "encoding":"utf-8"}', err=invalid character ']' after object key:value pair (status code: 500)
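Until the parser is fixed server-side, callers can at least contain this failure. A minimal sketch of a retry guard around the Python client, assuming only the public `ollama` API (the retry policy itself is an illustration, not part of this report):

```python
from ollama import AsyncClient, ResponseError


async def chat_with_retry(client: AsyncClient, attempts: int = 3, **chat_kwargs):
    """Retry client.chat() when Ollama 500s on a malformed model tool call."""
    for _ in range(attempts):
        try:
            return await client.chat(**chat_kwargs)
        except ResponseError as e:
            # The malformed tool call surfaces as a 500 with this message prefix.
            if e.status_code == 500 and "error parsing tool call" in str(e):
                continue  # re-ask; resampling often yields parseable JSON
            raise
    raise RuntimeError("model kept emitting unparseable tool calls")
```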

Relevant log output


OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.11.6

GiteaMirror added the bug label 2026-05-04 20:15:57 -05:00
Author
Owner

@onestardao commented on GitHub (Aug 25, 2025):

this error usually means the tool-call JSON coming back isn’t valid, so the parser chokes. it’s not your code, it’s the model drifting and adding extra text around the JSON. i’ve got a checklist for this kind of parsing/formatting failure (ProblemMap No.2). do you want me to share the link?

Author
Owner

@rick-github commented on GitHub (Aug 25, 2025):

> This is the non-truncated portion of the error that shows the invalid json syntax being returned for the tool call.

Can you add the bit with the invalid JSON syntax?

Author
Owner

@lefoulkrod commented on GitHub (Aug 26, 2025):

Cleaned up the error log so it's easier to see the actual error; the failing text is not truncated:

    ERROR:agents.ollama.sdk.tool_loop:Unhandled exception in tool loop: error parsing tool call: raw='{"path":"src/pipe.py","content":"\"\"\"Pipe entity and manager for the Flappy Bird style game.\n\nThis module defines two classes:\n\n* :class:`Pipe` – Represents a pair of top and bottom pipe sections with a\n configurable vertical gap.\n* :class:`PipeManager` – Handles spawning, updating and rendering of multiple\n :class:`Pipe` instances.\n\nAll behaviour is driven by constants defined in :mod:`src.config`.\n\"\"\"\n\nfrom __future__ import annotations\n\nimport random\nfrom typing import List\n\nimport pygame\n\nfrom src.config import (\n PIPE_COLOR,\n PIPE_GAP_MAX,\n PIPE_GAP_MIN,\n PIPE_SPEED,\n PIPE_SPAWN_INTERVAL,\n PIPE_WIDTH,\n SCREEN_HEIGHT,\n SCREEN_WIDTH,\n)\n\n\nclass Pipe:\n \"\"\"A pair of pipe rectangles with a vertical gap.\n\n Parameters\n ----------\n x : int\n The horizontal position (left edge) where the pipe pair is created.\n \"\"\"\n\n def __init__(self, x: int) -> None:\n # Store the x coordinate as a float for sub‑pixel movement.\n self._x: float = float(x)\n\n # Choose a random gap size between the configured minimum and maximum.\n gap_size: int = random.randint(PIPE_GAP_MIN, PIPE_GAP_MAX)\n # Ensure the gap fits inside the screen vertically.\n max_gap_top: int = SCREEN_HEIGHT - gap_size\n gap_top: int = random.randint(0, max_gap_top)\n\n # Create the top and bottom rectangles.\n self.top_rect: pygame.Rect = pygame.Rect(\n int(self._x), 0, PIPE_WIDTH, gap_top\n )\n self.bottom_rect: pygame.Rect = pygame.Rect(\n int(self._x),\n gap_top + gap_size,\n PIPE_WIDTH,\n SCREEN_HEIGHT - (gap_top + gap_size),\n )\n\n # ---------------------------------------------------------------------\n # Public API\n # ---------------------------------------------------------------------\n @property\n def rects(self) -> List[pygame.Rect]:\n \"\"\"Return a list containing the top and bottom rectangles.\n\n The list order is ``[top_rect, bottom_rect]`` which matches the\n expectation of collision‑checking code.\n \"\"\"\n\n return [self.top_rect, self.bottom_rect]\n\n def update(self, dt: float) -> None:\n \"\"\"Move the pipe leftward according to the configured speed.\n\n Parameters\n ----------\n dt : float\n Time step in seconds since the last update.\n \"\"\"\n\n # Update the stored x coordinate with sub‑pixel precision.\n self._x -= PIPE_SPEED * dt\n # Sync the pygame.Rect objects (they store integer coordinates).\n int_x = int(self._x)\n self.top_rect.x = int_x\n self.bottom_rect.x = int_x\n\n def draw(self, surface: pygame.Surface) -> None:\n \"\"\"Render both pipe sections onto the given surface.\n\n Parameters\n ----------\n surface : pygame.Surface\n The surface to draw the pipe on.\n \"\"\"\n\n pygame.draw.rect(surface, PIPE_COLOR, self.top_rect)\n pygame.draw.rect(surface, PIPE_COLOR, self.bottom_rect)\n\n\nclass PipeManager:\n \"\"\"Manage a collection of :class:`Pipe` objects.\n\n The manager is responsible for spawning new pipes at regular intervals,\n updating existing pipes, removing those that have moved off‑screen and\n delegating drawing to each pipe.\n \"\"\"\n\n def __init__(self) -> None:\n self.pipes: List[Pipe] = []\n self._time_since_last_spawn: float = 0.0\n\n def update(self, dt: float) -> List[Pipe]:\n \"\"\"Update all managed pipes and handle spawning/removal.\n\n Parameters\n ----------\n dt : float\n Time step in seconds since the last update.\n\n Returns\n -------\n List[Pipe]\n The current list of active pipes after cleanup.\n \"\"\"\n\n # Spawn new pipe if the interval has elapsed.\n self._time_since_last_spawn += dt\n while self._time_since_last_spawn >= PIPE_SPAWN_INTERVAL:\n self.pipes.append(Pipe(SCREEN_WIDTH))\n self._time_since_last_spawn -= PIPE_SPAWN_INTERVAL\n\n # Update existing pipes.\n for pipe in self.pipes:\n pipe.update(dt)\n\n # Remove pipes that have completely moved off the left side of the screen.\n self.pipes = [p for p in self.pipes if p.top_rect.right >= 0]\n return self.pipes\n\n def draw(self, surface: pygame.Surface) -> None:\n \"\"\"Draw all active pipes onto the supplied surface.\n\n Parameters\n ----------\n surface : pygame.Surface\n The surface to render the pipes onto.\n \"\"\"\n\n for pipe in self.pipes:\n pipe.draw(surface)\n"], "encoding":"utf-8"}', err=invalid character ']' after object key:value pair (status code: 500)
    Traceback (most recent call last):
      File "/home/larry/repos/computron_9000/agents/ollama/sdk/tool_loop.py", line 80, in run_tool_call_loop
        response = await client.chat(
                   ^^^^^^^^^^^^^^^^^^
      File "/home/larry/repos/computron_9000/.venv/lib/python3.12/site-packages/ollama/_client.py", line 854, in chat
        return await self._request(
               ^^^^^^^^^^^^^^^^^^^^
      File "/home/larry/repos/computron_9000/.venv/lib/python3.12/site-packages/ollama/_client.py", line 692, in _request
        return cls(**(await self._request_raw(*args, **kwargs)).json())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/larry/repos/computron_9000/.venv/lib/python3.12/site-packages/ollama/_client.py", line 636, in _request_raw
        raise ResponseError(e.response.text, e.response.status_code) from None
    ollama._types.ResponseError: error parsing tool call: raw='{"path":"src/pipe.py","content":"\"\"\"Pipe entity and manager for the Flappy Bird style game. ... [same payload as above] ..."], "encoding":"utf-8"}', err=invalid character ']' after object key:value pair (status code: 500)

Author
Owner

@onestardao commented on GitHub (Aug 26, 2025):

@lefoulkrod You posted the log, so I think you want the solution. It's generated by AI with my own WFGY method, check it

===

quick read of your new logs: this is a classic “tool-call JSON is not valid” case. the model is returning either a stringified `arguments` blob or extra prose around the JSON, so the parser chokes. this lines up with our ProblemMap No.10 (schema/format drift). sometimes No.2 (interpretation collapse) is involved if the system prompt is too wordy.

**fast fix checklist**

1. **force JSON only**
   - OpenAI-compat: set `response_format: {"type":"json_object"}` or `format: "json"` in the request.
   - keep the tool schema minimal and put the example JSON on a single line.
2. **ensure native object in `arguments`**
   - do not pass a stringified blob. remove backticks, code fences, comments. the tool wrapper should hand a plain JSON object to the runtime.
3. **pre-parse guard**
   - validate with a JSON parser before tool dispatch (a minimal sketch follows this list). if invalid, re-ask the model with a short “return only valid JSON, no prose” nudge and resend.
4. **reduce prompt noise**
   - short system prompt, no extra explanations. start with a tiny example, then scale.
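A minimal sketch of the pre-parse guard in step 3, assuming the tool arguments arrive as a string (the nudge wording and helper name are illustrative, not from this thread):

```python
import json

NUDGE = "Your last tool call was not valid JSON. Return only valid JSON, no prose."


def parse_tool_args(raw_args: str) -> dict | None:
    """Validate tool-call arguments before dispatch; None means re-ask the model."""
    try:
        args = json.loads(raw_args)
    except json.JSONDecodeError:
        return None
    # Tool arguments must be a JSON object, not a bare string or array.
    return args if isinstance(args, dict) else None
```

If this returns `None`, append the nudge as a user message and resend the request instead of dispatching the tool.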

reference + full checklist is here:
**[WFGY ProblemMap / README — No.10 schema & format drift](https://github.com/onestardao/WFGY/blob/main/ProblemMap/README.md)**

if you want, i can paste the exact No.10 step-by-step block tailored to your payload snippet. just drop one failing tool-call JSON (redact secrets) and i’ll map it line by line.

Author
Owner

@mdlmarkham commented on GitHub (Aug 31, 2025):

I'm using n8n connected to the Ollama Turbo API... keep getting "Invalid tool usage: mismatch between tool calls and tool results" even with very simple (one search tool) setup... any way around this?

Author
Owner

@onestardao commented on GitHub (Sep 1, 2025):

@mdlmarkham

quick note, hope it helps. your “mismatch between tool calls and tool results” is almost always tool-I/O schema drift. the model mixes prose with the call or returns a different count/shape than the results array.

map it to Problem Map No.6 (logic collapse & recovery) and No.2 (interpretation / format collapse). fix is simple: force strict JSON only, one tool per turn, pre-validate the JSON before dispatch, and hard-reset the turn if the shape does not match.

it is a semantic firewall you can copy into your wrapper without changing your stack. check the link above
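A minimal sketch of that shape check before sending the next request (the message layout follows the Ollama chat API; the helper itself is an assumption, not from the thread):

```python
def calls_match_results(assistant_msg: dict, followup_msgs: list[dict]) -> bool:
    """True when every tool call in the assistant turn has exactly one tool result."""
    calls = assistant_msg.get("tool_calls") or []
    results = [m for m in followup_msgs if m.get("role") == "tool"]
    return len(calls) == len(results)
```

If it returns `False`, drop the turn and re-ask rather than sending the mismatched history back to the model.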

Author
Owner

@mcr-ksh commented on GitHub (Sep 1, 2025):

@onestardao some example using your TXT OS would be great in this given context.

Author
Owner

@onestardao commented on GitHub (Sep 2, 2025):

you don’t have to over-debug this one.
fastest way: just grab my plain-text OS (TXTOS), load it in your model, then literally ask:

use WFGY to solve the tool call parsing bug in this screenshot

and paste the screenshot of your error.

the OS is written for the AI itself so it knows which problem map number you’re hitting

once you try it, the model will point you to the exact guardrail page inside the Problem Map with the step-by-step fix.

Author
Owner

@lefoulkrod commented on GitHub (Sep 6, 2025):

Possibly related to https://github.com/ollama/ollama/issues/12203

Author
Owner

@socram8888 commented on GitHub (Sep 30, 2025):

I've just stumbled upon this error. The exact request, done manually against the chat API:

{
   "model":"huihui_ai/gpt-oss-abliterated:20b",
   "messages":[
      {
         "role":"user",
         "content":"Calculate the length in words of this chunk of text. Use the count_words tool.\n\nHow tool parsing works in Ollama\nBackground\nWe\u2019ve built a new parser that focuses on understanding the structure of a tool call rather than simply looking for JSON.\n\nPreviously, when tools were passed into the model, the system had to wait until the entire output was generated and then parse it as JSON to determine whether it contained a tool call or normal content. Users had to wait for the complete generation before seeing any streamed token. This approach was reliable against malformed output, but blocked streaming because a tool call might occur at any point in the text.\n\nOllama supports a wide range of models, some trained with tool-specific tokens and some without. The parsing logic would needs to stream user content while being able to detect, suppress, and parse the tool call tokens.\n\nIncremental Parser\nThe new parser directly references each model\u2019s template to understand the prefix of the tool call. This is necessary for Ollama to understand and separate the tool calls and content.\n\nWhen a model is not directly trained on tool usage directly (trained with a prefix/tool token), it may still be able to output valid tool calls based on the sheer amount of knowledge it has. In this case, the parser is able to handle the partial prefixes output by the model and correctly separate tool calls and content.\n\nSome models also elect to output a tool call without a prefix, even though they were trained on using a prefix for calling tools. Empirically, this behavior happens at the start of a model output only. To address this, the parser can fallback to parsing JSON as a tool call when it recognizes the start of a JSON. If the JSON does not match the tool call format for the model, the JSON will be returned."
      }
   ],
   "options":{
      "num_ctx":8192
   },
   "think":true,
   "stream":true,
   "tools":[
      {
         "type":"function",
         "function":{
            "name":"count_words",
            "description":"Calculate the word count for a given text",
            "parameters":{
               "type":"object",
               "properties":{
                  "text":{
                     "type":"string",
                     "description":"The text to calculate the length of"
                  }
               },
               "required":[
                  "text"
               ]
            }
         }
      }
   ]
}
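To reproduce, the body above can be replayed against a local server. A sketch assuming a default install on port 11434 and the JSON saved as `request.json` (both are my assumptions):

```python
import json

import httpx  # any HTTP client works; httpx is used here for line streaming

with open("request.json") as f:
    body = json.load(f)

# Stream /api/chat; the final line carries either done=True or the parse error.
with httpx.stream("POST", "http://localhost:11434/api/chat", json=body, timeout=None) as r:
    for line in r.iter_lines():
        print(json.loads(line))
```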

The streaming response:

{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:48.8421465Z', 'message': {'role': 'assistant', 'content': '', 'thinking': 'We'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:48.8918969Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' need'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:48.943321Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' to'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:48.9943184Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' call'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:49.0466919Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' the'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:49.0975886Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' count'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:49.1477533Z', 'message': {'role': 'assistant', 'content': '', 'thinking': '_words'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:49.1993845Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' tool'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:49.250604Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' with'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:49.3023189Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' the'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:49.3541248Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' given'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:49.4060203Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' text'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:49.4569036Z', 'message': {'role': 'assistant', 'content': '', 'thinking': '.'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:49.5083889Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' The'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:49.5593626Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' text'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:49.6104598Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' is'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:49.6617102Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' the'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:49.7136654Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' chunk'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:49.7657388Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' starting'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:49.8168979Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' with'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:49.8693789Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' "'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:49.9213727Z', 'message': {'role': 'assistant', 'content': '', 'thinking': 'Calculate'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:49.972173Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' the'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:50.0243279Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' length'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:50.0762522Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' in'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:50.1288321Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' words'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:50.1823557Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' of'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:50.2347046Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' this'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:50.2891573Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' chunk'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:50.3417142Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' of'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:50.3937481Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' text'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:50.446075Z', 'message': {'role': 'assistant', 'content': '', 'thinking': '.'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:50.4985808Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' Use'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:50.5516047Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' the'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:50.6038104Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' count'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:50.6554842Z', 'message': {'role': 'assistant', 'content': '', 'thinking': '_words'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:50.7086585Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' tool'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:50.7607187Z', 'message': {'role': 'assistant', 'content': '', 'thinking': '."'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:50.812525Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' Then'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:50.8644452Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' the'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:50.9170759Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' rest'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:50.9702597Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' of'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:51.0235476Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' the'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:51.0763046Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' chunk'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:51.1307694Z', 'message': {'role': 'assistant', 'content': '', 'thinking': '.'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:51.1836988Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' We'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:51.2365574Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' must'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:51.2888922Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' ensure'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:51.3413114Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' the'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:51.3944374Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' text'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:51.447811Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' is'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:51.5013433Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' passed'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:51.5560489Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' correctly'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:51.6101454Z', 'message': {'role': 'assistant', 'content': '', 'thinking': '.'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:51.6635471Z', 'message': {'role': 'assistant', 'content': '', 'thinking': " We'll"}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:51.7176046Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' produce'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:51.7717075Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' JSON'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:51.8244351Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' with'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:51.8785793Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' key'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:51.9315896Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' "'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:51.9912788Z', 'message': {'role': 'assistant', 'content': '', 'thinking': 'text'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:52.0451975Z', 'message': {'role': 'assistant', 'content': '', 'thinking': '".'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:52.0992761Z', 'message': {'role': 'assistant', 'content': '', 'thinking': " Let's"}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:52.152886Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' count'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:52.2061728Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' words'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:52.259895Z', 'message': {'role': 'assistant', 'content': '', 'thinking': '.'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:52.313304Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' We'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:52.3667153Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' can'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:52.4205601Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' do'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:52.4744018Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' it'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:52.5306094Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' manually'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:52.5842288Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' quickly'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:52.6387154Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' or'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:52.6917305Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' rely'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:52.7451629Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' on'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:52.7980629Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' tool'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:52.8552752Z', 'message': {'role': 'assistant', 'content': '', 'thinking': '.'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:52.9079159Z', 'message': {'role': 'assistant', 'content': '', 'thinking': " We'll"}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:52.9607948Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' call'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:53.0129743Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' the'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:53.0657066Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' tool'}, 'done': False}
{'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:53.1191227Z', 'message': {'role': 'assistant', 'content': '', 'thinking': '.'}, 'done': False}
{'error': 'error parsing tool call: raw=\'{"text":"Calculate the length in words of this chunk of text. Use the count_words tool.\\n\\nHow tool parsing works in Ollama\\nBackground\\nWe’ve built a new parser that focuses on understanding the structure of a tool call rather than simply looking for JSON.\\n\\nPreviously, when tools were passed into the model, the system had to wait until the entire output was generated and then parse it as JSON to determine whether it contained a tool call or normal content. Users had to wait for the complete generation before seeing any streamed token. This approach was reliable against malformed output, but blocked streaming because a tool call might occur at any point in the text.\\n\\nOllama supports a wide range of models, some trained with tool-specific tokens and some without. The parsing logic would needs to stream user content while being able to detect, suppress, and parse the tool call tokens.\\n\\nIncremental Parser\\nThe new parser directly references each model’s template to understand the prefix of the tool call. This is necessary for Ollama to understand and separate the tool calls and content.\\n\\nWhen a model is not directly trained on tool usage directly (trained with a prefix/tool token), it may still be able to output valid tool calls based on the sheer amount of knowledge it has. In this case, the parser is able to handle the partial prefixes output by the model and correctly separate tool calls and content.\\n\\nSome models also elect to output a tool call without a prefix, even though they were trained on using a prefix for calling tools. Empirically, this behavior happens at the start of a model output only. To address this, the parser can fallback to parsing JSON as a tool call when it recognizes the start of a JSON. If the JSON does not match the tool call format for the model, the JSON will be returned."}<|call|>commentary<|channel|>assistant<|channel|>commentary to=functions.count_words<|channel|>analysis <|constrain|>json<|message|>{"text":"Calculate the length in words of this chunk of text. Use the count_words tool.\\n\\nHow tool parsing works in Ollama\\nBackground\\nWe’ve built a new parser that focuses on understanding the structure of a tool call rather than simply looking for JSON.\\n\\nPreviously, when tools were passed into the model, the system had to wait until the entire output was generated and then parse it as JSON to determine whether it contained a tool call or normal content. Users had to wait for the complete generation before seeing any streamed token. This approach was reliable against malformed output, but blocked streaming because a tool call might occur at any point in the text.\\n\\nOllama supports a wide range of models, some trained with tool-specific tokens and some without. The parsing logic would needs to stream user content while being able to detect, suppress, and parse the tool call tokens.\\n\\nIncremental Parser\\nThe new parser directly references each model’s template to understand the prefix of the tool call. This is necessary for Ollama to understand and separate the tool calls and content.\\n\\nWhen a model is not directly trained on tool usage directly (trained with a prefix/tool token), it may still be able to output valid tool calls based on the sheer amount of knowledge it has. 
In this case, the parser is able to handle the partial prefixes output by the model and correctly separate tool calls and content.\\n\\nSome models also elect to output a tool call without a prefix, even though they were trained on using a prefix for calling tools. Empirically, this behavior happens at the start of a model output only. To address this, the parser can fallback to parsing JSON as a tool call when it recognizes the start of a JSON. If the JSON does not match the tool call format for the model, the JSON will be returned."}<|call|>commentary<|channel|>assistant<|channel|>final<|message|>**Word count:** 241 words.\', err=invalid character \'<\' after top-level value'}
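This failure mode differs from the OP's: here the arguments object itself is valid JSON, but the model leaks gpt-oss channel tokens (`<|call|>commentary<|channel|>…`) after it, and the strict parse trips on the trailing `<`. A sketch of a client-side salvage for such a payload, assuming you have extracted the `raw=` string from the error text (that extraction step is itself an assumption):

```python
import json


def salvage_leading_json(raw: str) -> dict:
    """Parse the first JSON value and ignore anything after it."""
    obj, end = json.JSONDecoder().raw_decode(raw)
    # raw[end:] holds the leaked "<|call|>..." stream that broke strict parsing.
    return obj
```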
'."'}, 'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:50.812525Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' Then'}, 'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:50.8644452Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' the'}, 'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:50.9170759Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' rest'}, 'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:50.9702597Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' of'}, 'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:51.0235476Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' the'}, 'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:51.0763046Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' chunk'}, 'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:51.1307694Z', 'message': {'role': 'assistant', 'content': '', 'thinking': '.'}, 'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:51.1836988Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' We'}, 'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:51.2365574Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' must'}, 'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:51.2888922Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' ensure'}, 'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:51.3413114Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' the'}, 'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:51.3944374Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' text'}, 'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:51.447811Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' is'}, 'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:51.5013433Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' passed'}, 'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:51.5560489Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' correctly'}, 'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:51.6101454Z', 'message': {'role': 'assistant', 'content': '', 'thinking': '.'}, 'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:51.6635471Z', 'message': {'role': 'assistant', 'content': '', 'thinking': " We'll"}, 'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:51.7176046Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' produce'}, 'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:51.7717075Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' JSON'}, 'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:51.8244351Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' 
with'}, 'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:51.8785793Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' key'}, 'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:51.9315896Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' "'}, 'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:51.9912788Z', 'message': {'role': 'assistant', 'content': '', 'thinking': 'text'}, 'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:52.0451975Z', 'message': {'role': 'assistant', 'content': '', 'thinking': '".'}, 'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:52.0992761Z', 'message': {'role': 'assistant', 'content': '', 'thinking': " Let's"}, 'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:52.152886Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' count'}, 'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:52.2061728Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' words'}, 'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:52.259895Z', 'message': {'role': 'assistant', 'content': '', 'thinking': '.'}, 'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:52.313304Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' We'}, 'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:52.3667153Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' can'}, 'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:52.4205601Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' do'}, 'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:52.4744018Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' it'}, 'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:52.5306094Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' manually'}, 'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:52.5842288Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' quickly'}, 'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:52.6387154Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' or'}, 'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:52.6917305Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' rely'}, 'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:52.7451629Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' on'}, 'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:52.7980629Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' tool'}, 'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:52.8552752Z', 'message': {'role': 'assistant', 'content': '', 'thinking': '.'}, 'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:52.9079159Z', 'message': {'role': 'assistant', 'content': '', 'thinking': " We'll"}, 
'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:52.9607948Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' call'}, 'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:53.0129743Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' the'}, 'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:53.0657066Z', 'message': {'role': 'assistant', 'content': '', 'thinking': ' tool'}, 'done': False} {'model': 'huihui_ai/gpt-oss-abliterated:20b', 'created_at': '2025-09-30T22:41:53.1191227Z', 'message': {'role': 'assistant', 'content': '', 'thinking': '.'}, 'done': False} {'error': 'error parsing tool call: raw=\'{"text":"Calculate the length in words of this chunk of text. Use the count_words tool.\\n\\nHow tool parsing works in Ollama\\nBackground\\nWe’ve built a new parser that focuses on understanding the structure of a tool call rather than simply looking for JSON.\\n\\nPreviously, when tools were passed into the model, the system had to wait until the entire output was generated and then parse it as JSON to determine whether it contained a tool call or normal content. Users had to wait for the complete generation before seeing any streamed token. This approach was reliable against malformed output, but blocked streaming because a tool call might occur at any point in the text.\\n\\nOllama supports a wide range of models, some trained with tool-specific tokens and some without. The parsing logic would needs to stream user content while being able to detect, suppress, and parse the tool call tokens.\\n\\nIncremental Parser\\nThe new parser directly references each model’s template to understand the prefix of the tool call. This is necessary for Ollama to understand and separate the tool calls and content.\\n\\nWhen a model is not directly trained on tool usage directly (trained with a prefix/tool token), it may still be able to output valid tool calls based on the sheer amount of knowledge it has. In this case, the parser is able to handle the partial prefixes output by the model and correctly separate tool calls and content.\\n\\nSome models also elect to output a tool call without a prefix, even though they were trained on using a prefix for calling tools. Empirically, this behavior happens at the start of a model output only. To address this, the parser can fallback to parsing JSON as a tool call when it recognizes the start of a JSON. If the JSON does not match the tool call format for the model, the JSON will be returned."}<|call|>commentary<|channel|>assistant<|channel|>commentary to=functions.count_words<|channel|>analysis <|constrain|>json<|message|>{"text":"Calculate the length in words of this chunk of text. Use the count_words tool.\\n\\nHow tool parsing works in Ollama\\nBackground\\nWe’ve built a new parser that focuses on understanding the structure of a tool call rather than simply looking for JSON.\\n\\nPreviously, when tools were passed into the model, the system had to wait until the entire output was generated and then parse it as JSON to determine whether it contained a tool call or normal content. Users had to wait for the complete generation before seeing any streamed token. This approach was reliable against malformed output, but blocked streaming because a tool call might occur at any point in the text.\\n\\nOllama supports a wide range of models, some trained with tool-specific tokens and some without. 
The parsing logic would needs to stream user content while being able to detect, suppress, and parse the tool call tokens.\\n\\nIncremental Parser\\nThe new parser directly references each model’s template to understand the prefix of the tool call. This is necessary for Ollama to understand and separate the tool calls and content.\\n\\nWhen a model is not directly trained on tool usage directly (trained with a prefix/tool token), it may still be able to output valid tool calls based on the sheer amount of knowledge it has. In this case, the parser is able to handle the partial prefixes output by the model and correctly separate tool calls and content.\\n\\nSome models also elect to output a tool call without a prefix, even though they were trained on using a prefix for calling tools. Empirically, this behavior happens at the start of a model output only. To address this, the parser can fallback to parsing JSON as a tool call when it recognizes the start of a JSON. If the JSON does not match the tool call format for the model, the JSON will be returned."}<|call|>commentary<|channel|>assistant<|channel|>final<|message|>**Word count:** 241 words.\', err=invalid character \'<\' after top-level value'} ```
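For anyone trying to reproduce this, the request above can be replayed against a local instance. A minimal sketch using only the standard library; the `request.json` filename is hypothetical (save the request body above into it first):

```python
# Replay a saved chat request against a local Ollama server and print
# each streamed NDJSON chunk as it arrives.
import json
import urllib.request

with open("request.json", "rb") as f:
    req = urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=f.read(),
        headers={"Content-Type": "application/json"},
    )

with urllib.request.urlopen(req) as resp:
    for line in resp:  # streaming responses arrive as one JSON object per line
        if line.strip():
            print(json.loads(line))
```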
<!-- gh-comment-id:3354037407 --> @drifkin commented on GitHub (Sep 30, 2025):

it looks like the model didn't stop generating after `<|call|>`, which should be a stop token. Have you seen this on the official gpt-oss:20b model? It might be that the abliterated one is outputting the wrong token, or is configured incorrectly.
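One way to compare the two builds is to dump their templates and Modelfile parameters side by side. A sketch with the `ollama` Python client, assuming both models are pulled locally; whether stop tokens such as `<|call|>` show up in `parameters` or only in the template depends on how the model was packaged:

```python
# Compare the official and abliterated builds' chat templates and
# Modelfile parameters, which is where misconfigured stop tokens
# would be visible.
import ollama

for name in ("gpt-oss:20b", "huihui_ai/gpt-oss-abliterated:20b"):
    info = ollama.show(name)
    print(f"--- {name} ---")
    print("parameters:", info.parameters)
    print("template:", (info.template or "")[:200], "...")  # truncated for readability
```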
<!-- gh-comment-id:3354044859 --> @socram8888 commented on GitHub (Sep 30, 2025):

I was thinking exactly the same, and yes, the `gpt-oss:20b` model does work fine. I've run the same query three times and all of them generated correct tool calls, so it must be an issue with that specific model.
<!-- gh-comment-id:3380965385 --> @trufae commented on GitHub (Oct 8, 2025):

LM Studio does this right; it would be great if I could use Ollama instead.
<!-- gh-comment-id:3382472964 --> @drifkin commented on GitHub (Oct 8, 2025):

> LM Studio does this right; it would be great if I could use Ollama instead.

Are you seeing this with the real model or also with the abliterated one?
<!-- gh-comment-id:3383105904 --> @trufae commented on GitHub (Oct 8, 2025):

With qwen2.5, qwen3, and gpt-oss:20b, at least for me, using VS Code Copilot and opencode as clients. But I didn't dig deeper into the real problem.
<!-- gh-comment-id:3383467492 --> @ParthSareen commented on GitHub (Oct 8, 2025):

@trufae how much context length are you setting for the model when using Ollama? These models should at least be giving some output (even though they're small). If you're coding with them, I'd recommend at least 64000, as long as it fits on your machine.
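For reference, the context window can be raised per request through the chat API's `options` field. A minimal sketch with the `ollama` Python client; the model name and the 65536 value are illustrative:

```python
# Raise the per-request context window via options={"num_ctx": ...}.
import ollama

response = ollama.chat(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "hello"}],
    options={"num_ctx": 65536},  # illustrative; size it to fit your hardware
)
print(response.message.content)
```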
<!-- gh-comment-id:3386732867 --> @trufae commented on GitHub (Oct 9, 2025):

I'm testing this now with the latest Ollama 0.12.3 and opencode 0.14.6, and I can use `qwen3:latest` and `gpt-oss:20b` locally with no issues, but I can't use other models. Also, after some testing I found out that the real problem was that the context was not configured to 128K, so it was silently failing.
<!-- gh-comment-id:3386848000 --> @ParthSareen commented on GitHub (Oct 9, 2025):

@trufae that's great! Hopefully we'll make some changes to make the context filling more visible. As for other models: I'm not sure which ones you're trying, but most are not made for agentic tool use; they wouldn't be able to complete a loop at their size.
<!-- gh-comment-id:3393038641 --> @mcr-ksh commented on GitHub (Oct 11, 2025):

I can confirm it didn't work for me even with the context set to 128K. See my comment in https://github.com/ollama/ollama/issues/12187.
In short: vLLM works and Ollama doesn't, on the same model and the same machine. I just swapped the n8n chat model node between vLLM and Ollama.
<!-- gh-comment-id:3393040724 --> @ParthSareen commented on GitHub (Oct 11, 2025):

> I can confirm it didn't work for me even with the context set to 128K. See my comment in https://github.com/ollama/ollama/issues/12187.
> In short: vLLM works and Ollama doesn't, on the same model and the same machine. I just swapped the n8n chat model node between vLLM and Ollama.

What configuration are you running with? Can you show the output of `ollama ps`?
<!-- gh-comment-id:3393048806 --> @mcr-ksh commented on GitHub (Oct 11, 2025):

> > I can confirm it didn't work for me even with the context set to 128K. See my comment in #12187.
> > In short: vLLM works and Ollama doesn't, on the same model and the same machine. I just swapped the n8n chat model node between vLLM and Ollama.
>
> What configuration are you running with? Can you show the output of `ollama ps`?

```
(vllm) root@jetson-thor:/usr/src/vllm# ollama ps 
NAME           ID              SIZE     PROCESSOR    CONTEXT    UNTIL             
gpt-oss:20b    aa4295ac10c3    17 GB    100% GPU     131072     24 hours from now 

Oct 11 10:04:15 jetson-thor ollama[3558504]: [GIN-debug] POST   /v1/chat/completions      --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (6 handlers)
Oct 11 10:04:15 jetson-thor ollama[3558504]: [GIN-debug] POST   /v1/completions           --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (6 handlers)
Oct 11 10:04:15 jetson-thor ollama[3558504]: [GIN-debug] POST   /v1/embeddings            --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (6 handlers)
Oct 11 10:04:15 jetson-thor ollama[3558504]: [GIN-debug] GET    /v1/models                --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (6 handlers)
Oct 11 10:04:15 jetson-thor ollama[3558504]: [GIN-debug] GET    /v1/models/:model         --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (6 handlers)
Oct 11 10:04:15 jetson-thor ollama[3558504]: time=2025-10-11T10:04:15.162+02:00 level=INFO source=routes.go:1534 msg="Listening on [::]:11434 (version 0.0.0)"
Oct 11 10:04:15 jetson-thor ollama[3558504]: time=2025-10-11T10:04:15.163+02:00 level=INFO source=runner.go:80 msg="discovering available GPUs..."
Oct 11 10:04:15 jetson-thor ollama[3558504]: time=2025-10-11T10:04:15.697+02:00 level=INFO source=types.go:112 msg="inference compute" id=GPU-a7c66ad2-6dbb-0ab8-c1a2-37ba6dba3600 library=CUDA compute=11.0 name=CUDA0 description="NVIDIA Thor" libdirs=ollama,cuda_sbsa driver=13.0 pci_id=01:00.0 type=iGPU total="122.8 GiB" available="120.0 GiB"
Oct 11 10:04:19 jetson-thor ollama[3558504]: [GIN] 2025/10/11 - 10:04:19 | 200 |      803.95µs |       127.0.0.1 | HEAD     "/"
Oct 11 10:04:19 jetson-thor ollama[3558504]: [GIN] 2025/10/11 - 10:04:19 | 200 |          92µs |       127.0.0.1 | GET      "/api/ps"
Oct 11 10:04:32 jetson-thor ollama[3558504]: time=2025-10-11T10:04:32.436+02:00 level=INFO source=server.go:216 msg="enabling flash attention"
Oct 11 10:04:32 jetson-thor ollama[3558504]: time=2025-10-11T10:04:32.436+02:00 level=INFO source=server.go:400 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-b112e727c6f18875636c56a779790a590d705aec9e1c0eb5a97d51fc2a778583 --port 40287"
Oct 11 10:04:32 jetson-thor ollama[3558504]: time=2025-10-11T10:04:32.437+02:00 level=INFO source=server.go:675 msg="loading model" "model layers"=25 requested=-1
Oct 11 10:04:32 jetson-thor ollama[3558504]: time=2025-10-11T10:04:32.437+02:00 level=INFO source=server.go:681 msg="system memory" total="122.8 GiB" free="119.7 GiB" free_swap="0 B"
Oct 11 10:04:32 jetson-thor ollama[3558504]: time=2025-10-11T10:04:32.437+02:00 level=INFO source=server.go:689 msg="gpu memory" id=GPU-a7c66ad2-6dbb-0ab8-c1a2-37ba6dba3600 library=CUDA available="119.1 GiB" free="119.5 GiB" minimum="457.0 MiB" overhead="0 B"
Oct 11 10:04:32 jetson-thor ollama[3558504]: time=2025-10-11T10:04:32.445+02:00 level=INFO source=runner.go:1316 msg="starting ollama engine"
Oct 11 10:04:32 jetson-thor ollama[3558504]: time=2025-10-11T10:04:32.448+02:00 level=INFO source=runner.go:1352 msg="Server listening on 127.0.0.1:40287"
Oct 11 10:04:32 jetson-thor ollama[3558504]: time=2025-10-11T10:04:32.449+02:00 level=INFO source=runner.go:1189 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:131072 KvCacheType: NumThreads:14 GPULayers:25[ID:GPU-a7c66ad2-6dbb-0ab8-c1a2-37ba6dba3600 Layers:25(0..24)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
Oct 11 10:04:32 jetson-thor ollama[3558504]: time=2025-10-11T10:04:32.493+02:00 level=INFO source=ggml.go:133 msg="" architecture=gptoss file_type=MXFP4 name="" description="" num_tensors=315 num_key_values=30
Oct 11 10:04:32 jetson-thor ollama[3558504]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
Oct 11 10:04:32 jetson-thor ollama[3558504]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Oct 11 10:04:32 jetson-thor ollama[3558504]: ggml_cuda_init: found 1 CUDA devices:
Oct 11 10:04:32 jetson-thor ollama[3558504]:   Device 0: NVIDIA Thor, compute capability 11.0, VMM: yes, ID: GPU-a7c66ad2-6dbb-0ab8-c1a2-37ba6dba3600
Oct 11 10:04:32 jetson-thor ollama[3558504]: load_backend: loaded CUDA backend from /usr/local/lib/ollama/cuda_sbsa/libggml-cuda.so
Oct 11 10:04:32 jetson-thor ollama[3558504]: load_backend: loaded CPU backend from /usr/local/lib/ollama/cuda_sbsa/libggml-cpu.so
Oct 11 10:04:32 jetson-thor ollama[3558504]: time=2025-10-11T10:04:32.589+02:00 level=INFO source=ggml.go:104 msg=system CPU.0.NEON=1 CPU.0.ARM_FMA=1 CPU.0.LLAMAFILE=1 CUDA.0.ARCHS=1100 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 CPU.1.NEON=1 CPU.1.ARM_FMA=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Oct 11 10:04:32 jetson-thor ollama[3558504]: time=2025-10-11T10:04:32.818+02:00 level=INFO source=runner.go:1189 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:131072 KvCacheType: NumThreads:14 GPULayers:25[ID:GPU-a7c66ad2-6dbb-0ab8-c1a2-37ba6dba3600 Layers:25(0..24)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
Oct 11 10:04:33 jetson-thor ollama[3558504]: time=2025-10-11T10:04:33.252+02:00 level=INFO source=runner.go:1189 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:131072 KvCacheType: NumThreads:14 GPULayers:25[ID:GPU-a7c66ad2-6dbb-0ab8-c1a2-37ba6dba3600 Layers:25(0..24)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
Oct 11 10:04:33 jetson-thor ollama[3558504]: time=2025-10-11T10:04:33.252+02:00 level=INFO source=ggml.go:477 msg="offloading 24 repeating layers to GPU"
Oct 11 10:04:33 jetson-thor ollama[3558504]: time=2025-10-11T10:04:33.252+02:00 level=INFO source=ggml.go:483 msg="offloading output layer to GPU"
Oct 11 10:04:33 jetson-thor ollama[3558504]: time=2025-10-11T10:04:33.252+02:00 level=INFO source=ggml.go:488 msg="offloaded 25/25 layers to GPU"
Oct 11 10:04:33 jetson-thor ollama[3558504]: time=2025-10-11T10:04:33.252+02:00 level=INFO source=device.go:206 msg="model weights" device=CUDA0 size="11.8 GiB"
Oct 11 10:04:33 jetson-thor ollama[3558504]: time=2025-10-11T10:04:33.253+02:00 level=INFO source=device.go:211 msg="model weights" device=CPU size="1.1 GiB"
Oct 11 10:04:33 jetson-thor ollama[3558504]: time=2025-10-11T10:04:33.253+02:00 level=INFO source=device.go:217 msg="kv cache" device=CUDA0 size="3.1 GiB"
Oct 11 10:04:33 jetson-thor ollama[3558504]: time=2025-10-11T10:04:33.253+02:00 level=INFO source=device.go:228 msg="compute graph" device=CUDA0 size="249.8 MiB"
Oct 11 10:04:33 jetson-thor ollama[3558504]: time=2025-10-11T10:04:33.253+02:00 level=INFO source=device.go:233 msg="compute graph" device=CPU size="5.6 MiB"
Oct 11 10:04:33 jetson-thor ollama[3558504]: time=2025-10-11T10:04:33.253+02:00 level=INFO source=device.go:238 msg="total memory" size="16.2 GiB"
Oct 11 10:04:33 jetson-thor ollama[3558504]: time=2025-10-11T10:04:33.253+02:00 level=INFO source=sched.go:481 msg="loaded runners" count=1
Oct 11 10:04:33 jetson-thor ollama[3558504]: time=2025-10-11T10:04:33.253+02:00 level=INFO source=server.go:1271 msg="waiting for llama runner to start responding"
Oct 11 10:04:33 jetson-thor ollama[3558504]: time=2025-10-11T10:04:33.253+02:00 level=INFO source=server.go:1305 msg="waiting for server to become available" status="llm server loading model"
Oct 11 10:04:37 jetson-thor ollama[3558504]: time=2025-10-11T10:04:37.266+02:00 level=INFO source=server.go:1309 msg="llama runner started in 4.83 seconds"
Oct 11 10:04:46 jetson-thor ollama[3558504]: [GIN] 2025/10/11 - 10:04:46 | 200 |      44.907µs |       127.0.0.1 | HEAD     "/"
Oct 11 10:04:46 jetson-thor ollama[3558504]: [GIN] 2025/10/11 - 10:04:46 | 200 |       37.75µs |       127.0.0.1 | GET      "/api/ps"
Oct 11 10:05:08 jetson-thor ollama[3558504]: [GIN] 2025/10/11 - 10:05:08 | 200 | 38.538412472s |    192.168.1.17 | POST     "/api/chat"
Oct 11 10:05:27 jetson-thor ollama[3558504]: [GIN] 2025/10/11 - 10:05:27 | 200 |      35.241µs |       127.0.0.1 | HEAD     "/"
Oct 11 10:05:27 jetson-thor ollama[3558504]: [GIN] 2025/10/11 - 10:05:27 | 200 |      28.065µs |       127.0.0.1 | GET      "/api/ps"
Oct 11 10:06:16 jetson-thor ollama[3558504]: [GIN] 2025/10/11 - 10:06:16 | 200 | 38.183342541s |    192.168.1.17 | POST     "/api/chat"
Oct 11 10:06:29 jetson-thor ollama[3558504]: [GIN] 2025/10/11 - 10:06:29 | 200 |   50.020388ms |    192.168.1.17 | GET      "/api/tags"
```
<!-- gh-comment-id:3393055461 --> @ParthSareen commented on GitHub (Oct 11, 2025):

@mcr-ksh can you also attach the request you're making?
<!-- gh-comment-id:3393061481 --> @mcr-ksh commented on GitHub (Oct 11, 2025):

> @mcr-ksh can you also attach the request you're making?

How can I do that? Is there a way to log the requests?
<!-- gh-comment-id:3393063407 --> @ParthSareen commented on GitHub (Oct 11, 2025):

> How can I do that? Is there a way to log the requests?

You can run `OLLAMA_DEBUG=2 ollama serve` and drop the logs here. There might be quite a bit; if you can, grab the `chatPrompt.go` piece.
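The same capture can be scripted if that's easier than a shell one-liner. A sketch, using the env var and log path from the commands in this thread:

```python
# Launch a debug-logging server (equivalent to OLLAMA_DEBUG=2 ollama serve)
# and capture combined stdout/stderr to a file for later attachment.
import os
import subprocess

env = dict(os.environ, OLLAMA_DEBUG="2")
with open("/tmp/ollama_debug.log", "wb") as log:
    subprocess.run(["ollama", "serve"], env=env, stdout=log, stderr=subprocess.STDOUT)
```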
<!-- gh-comment-id:3393074158 --> @mcr-ksh commented on GitHub (Oct 11, 2025):

`chatPrompt.go` doesn't show anything. I used: `OLLAMA_DEBUG=2 ollama serve 2>&1 | grep -v runner.go > /tmp/ollama_debug.log`

[ollama_debug.log](https://github.com/user-attachments/files/22862945/ollama_debug.log)

No tool was called at all during this run.
<!-- gh-comment-id:3505546362 --> @ParthSareen commented on GitHub (Nov 8, 2025):

@mcr-ksh I need a log dump from a run where you're getting the error or missing the tool call when it should have called one.
<!-- gh-comment-id:3546556255 --> @raulcabello commented on GitHub (Nov 18, 2025):

I'm running into the same issue.

My request looks like this:

```
{
  "model": "gpt-oss:20b",
  "messages": [
    {
      "role": "user",
      "content": "add foo: bar label to fleet-controller deployment in the cattle-fleet namespace in the local cluster"
    }
  ],
  "stream": false,
  "tools":[
  {
    "type": "function",
    "function": {
      "name": "patchKubernetesResource",
      "description": "Patches a Kubernetes resource using a JSON patch. Don't ask for confirmation.'\n\t\tParameters:\n\t\tkind (string): The type of Kubernetes resource to patch (e.g., Pod, Deployment, Service).\n\t\tnamespace (string): The namespace where the resource is located. It must be empty for cluster-wide resources.\n\t\tname (string): The name of the specific resource to patch.\n\t\tcluster (string): The name of the Kubernetes cluster.\n\t\tpatch (json): Patch to apply. The content type used is application/json-patch+json.\n\t\tReturns the modified resource.",
      "parameters": {
        "type": "object",
        "required": [
          "name",
          "namespace",
          "kind",
          "cluster",
          "patch"
        ],
        "properties": {
          "cluster": {
            "type": "string",
            "description": "the cluster of the resource"
          },
          "kind": {
            "type": "string",
            "description": "the kind of the resource"
          },
          "name": {
            "type": "string",
            "description": "the name of k8s resource"
          },
          "namespace": {
            "type": "string",
            "description": "the namespace of the resource"
          },
          "patch": {
            "type": "array",
            "description": "the patch of the request",
            "items": {
              "type": "object",
              "required": [
                "op",
                "path"
              ],
              "properties": {
                "op": {
                  "type": "string"
                },
                "path": {
                  "type": "string"
                },
                "value": true
              },
              "additionalProperties": false
            }
          }
        },
        "additionalProperties": false
      }
    }
  }
]
}

```

Most of the time it works fine, but occasionally I get an error like:

```
{"error":"error parsing tool call: raw='{\"cluster\":\"local\",\"kind\":\"Deployment\",\"name\":\"fleet-controller\",\"namespace\":\"cattle-fleet\",\"patch\":[{\"op\":\"add\",\"path\":\"/metadata/labels/foo\",\"value\":\"bar\"}]', err=unexpected end of JSON input"}
```

It seems the model occasionally produces incomplete JSON. Specifically, it drops the closing `}`.

When the request succeeds, the last decoded string looks like:

```
level=TRACE source=bytepairencoding.go:280 msg=decoded string=]} from=[28000]
```

But when the failure occurs, I see:

```
level=TRACE source=bytepairencoding.go:280 msg=decoded string=] from=[60]
```

Note the missing `}` in the decoded output.

Ollama itself doesn't report anything helpful. The only entry I see is:

```
[GIN] 2 | 500 |  2.020396543s  | POST     "/api/chat"
```
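Until the underlying generation issue is fixed, one possible client-side mitigation is to treat this particular 500 as transient and retry the request. A sketch with the `ollama` Python client, not an official workaround; the attempt count and the matched error substring are assumptions:

```python
# Retry chat calls that fail with the transient "error parsing tool
# call" 500; any other error is re-raised immediately.
import ollama

def chat_with_retry(client: ollama.Client, max_attempts: int = 3, **chat_kwargs):
    last_error = None
    for _ in range(max_attempts):
        try:
            return client.chat(**chat_kwargs)
        except ollama.ResponseError as e:
            if e.status_code != 500 or "error parsing tool call" not in e.error:
                raise  # not the transient parsing failure
            last_error = e
    raise last_error
```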
Reference: github-starred/ollama#70071