[GH-ISSUE #14986] /api/generate returns HTTP 500 {"error":"EOF"} with qwen3.5:9b when prompt requests <tool_call> XML output #9633

Open
opened 2026-04-12 22:31:50 -05:00 by GiteaMirror · 1 comment

Originally created by @meowDieJob on GitHub (Mar 21, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14986

What is the issue?

Environment

  • Ollama version: v0.18.2
  • Model: qwen3.5:9b
  • Endpoint: /api/generate
  • Native Ollama tools: not used

Summary

/api/generate works for plain text prompts and simple XML-like prompts, but can return HTTP 500 with {"error":"EOF"} when the prompt asks the model to emit <tool_call>...</tool_call> style XML output.

This reproduces even without using native Ollama tools. The prompt only contains tool-like XML tags as plain text instructions.

Possibly related to other Qwen XML/tool-call parsing issues, but this reproduces on qwen3.5:9b with /api/generate and without native tools.

Minimal reproduction

I used the following script (the Chinese prompt text 查上海天气 means "check the weather in Shanghai"; 请只输出一个 means "output exactly one"; 请只输出一个工具调用块 means "output exactly one tool-call block"):

import json
import urllib.request
import urllib.error

HOST = "http://127.0.0.1:11434"
MODEL = "qwen3.5:9b"

tests = [
    ("plain_text", "hello"),
    ("simple_xml", "<user_request>查上海天气</user_request>"),
    ("xml_with_tool_hint", "<tools>[{\"name\":\"get_weather\"}]</tools>\n<user_request>查上海天气</user_request>\n请只输出一个 <tool_call>...</tool_call>"),
    ("minitest1","<tools>[{\"name\":\"get_weather\"}]</tools>\n<user_request>查上海天气</user_request>\n请只输出一个工具调用块"),
]

for name, prompt in tests:
    payload = {
        "model": MODEL,
        "stream": False,
        "prompt": prompt,
    }
    req = urllib.request.Request(
        f"{HOST}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    print(f"\n=== TEST: {name} ===")
    try:
        with urllib.request.urlopen(req, timeout=120) as resp:
            body = resp.read().decode("utf-8")
            print("HTTP", resp.status)
            print(body[:1000])
    except urllib.error.HTTPError as e:
        body = e.read().decode("utf-8", errors="replace")
        print("HTTPError", e.code)
        print(body)
    except Exception as e:
        print("Exception", repr(e))

Results

1) plain_text

Prompt:

hello

Result:

HTTP 200
{"model":"qwen3.5:9b","created_at":"2026-03-20T23:46:44.373588553Z","response":"Hello! 👋 How can I help you today?", ...}

2) simple_xml

Prompt:

<user_request>查上海天气</user_request>

Result:

HTTP 200
{"model":"qwen3.5:9b","created_at":"2026-03-20T23:47:08.663588206Z","response":"很抱歉,作为人工智能助手,我暂时无法直接获取实时的天气数据。...", ...}

3) xml_with_tool_hint

Prompt:

<tools>[{"name":"get_weather"}]</tools>
<user_request>查上海天气</user_request>
请只输出一个 <tool_call>...</tool_call>

Result:

HTTPError 500
{"error":"EOF"}

4) minitest1

Prompt:

<tools>[{"name":"get_weather"}]</tools>
<user_request>查上海天气</user_request>
请只输出一个工具调用块

Result:

HTTP 200
{"model":"qwen3.5:9b","created_at":"2026-03-20T23:47:56.226133671Z","response":"```json
{
  \"tool_name\": \"get_weather\",
  \"parameters\": {
    \"location\": \"上海\",
    \"city_name\": \"Shanghai\",
    \"country\": \"CN\"
  }
}
```", ...}

Expected behavior

The server should not return HTTP 500 / {"error":"EOF"} because the prompt contains <tool_call>-style XML text.

Even if Ollama or the model-side parser dislikes this format, the request should fail gracefully, for example by:

  • returning raw model text, or
  • returning a controlled parse error with a clear message

but not an internal server error.

Actual behavior

When the prompt explicitly asks for <tool_call>...</tool_call> or <tool_call></tool_call>, /api/generate can fail with:

{"error":"EOF"}

Notes

This seems to be triggered specifically by the <tool_call> XML pattern in the prompt.

Important detail: this reproduction does not use native Ollama tools. The issue appears to happen even when these tags are just plain prompt text.

Relevant log output

(none provided)
OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.18.2

GiteaMirror added the bug label 2026-04-12 22:31:50 -05:00

@r266-tech commented on GitHub (Mar 22, 2026):

I've identified the root cause and have a fix ready.

Root cause: The Qwen3.5 model always has a builtin parser registered (qwen3.5). When a user prompt instructs the model to emit <tool_call>...</tool_call> XML as plain text (without registering native Ollama tools), the Qwen3CoderParser still scans for those delimiters, finds them, and then calls parseToolCall(). If the content between the tags isn't valid Qwen3-coder XML (e.g. it's a JSON blob), xml.Unmarshal fails with an EOF/parse error. That error propagates up to GenerateHandler which sends {"error":"EOF"} and returns HTTP 500.
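
The EOF behavior described above can be demonstrated in isolation: Go's encoding/xml returns io.EOF when asked to unmarshal input that contains no XML element at all, which is exactly what happens when the model puts a JSON blob between the tool-call tags. This is a minimal sketch, not Ollama's actual parser code; the struct fields and the tryParse helper are hypothetical stand-ins.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// toolCall is a hypothetical stand-in for the struct a Qwen3-coder-style
// parser unmarshals into; the real field layout in Ollama may differ.
type toolCall struct {
	XMLName xml.Name `xml:"function"`
	Name    string   `xml:"name,attr"`
}

// tryParse attempts to XML-unmarshal the text found between
// <tool_call>...</tool_call> delimiters, as the parser would.
func tryParse(inner string) error {
	var tc toolCall
	return xml.Unmarshal([]byte(inner), &tc)
}

func main() {
	// The model emitted a JSON blob (as in the minitest1 result), not XML.
	// xml.Unmarshal scans for a start element, finds none, and reaches
	// end-of-input, so it returns io.EOF — the "EOF" seen in the 500 body.
	err := tryParse(`{"tool_name":"get_weather"}`)
	fmt.Println(err)
}
```

If that io.EOF is propagated unchanged up to the HTTP handler, serializing it with Error() yields precisely the observed {"error":"EOF"} payload.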

Fix: In Qwen3CoderParser.Add(), when parseToolCall fails, instead of returning the error, log a warning and emit the raw tool-call text (including the wrapping tags) as ordinary content. This ensures the caller always gets a usable HTTP 200 response. When real tools are registered and the model produces valid XML, the existing parse path is unchanged.
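
The fallback described above could look roughly like the following. This is a hedged sketch under assumed names (parseToolCall and handleToolCall are illustrative; they are not the actual Ollama function signatures):

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// parseToolCall is a hypothetical stand-in for the real parser step:
// it XML-unmarshals the text between <tool_call> tags.
func parseToolCall(inner string) (string, error) {
	var tc struct {
		XMLName xml.Name `xml:"function"`
		Name    string   `xml:"name,attr"`
	}
	if err := xml.Unmarshal([]byte(inner), &tc); err != nil {
		return "", err
	}
	return tc.Name, nil
}

// handleToolCall sketches the proposed behavior: on parse failure, log a
// warning and return the raw text (with the wrapping tags restored) as
// ordinary content instead of propagating the error to the HTTP handler.
func handleToolCall(inner string) (content string, toolName string) {
	name, err := parseToolCall(inner)
	if err != nil {
		log.Printf("warning: tool call did not parse (%v); emitting raw text", err)
		return "<tool_call>" + inner + "</tool_call>", ""
	}
	return "", name
}

func main() {
	// An invalid (JSON) payload falls back to raw content; no error escapes,
	// so the caller can still return HTTP 200 with the model's text.
	content, _ := handleToolCall(`{"tool_name":"get_weather"}`)
	fmt.Println(content)
}
```

With this shape, the valid-XML path is untouched: when real tools are registered and the model emits well-formed Qwen3-coder XML, parseToolCall succeeds and the tool call is surfaced as before.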

PR incoming.

Reference: github-starred/ollama#9633