[GH-ISSUE #6394] mistral-nemo:12b-instruct-2407-fp16 will return empty string using json mode while mistral-nemo:12b will return code #50528

Closed
opened 2026-04-28 16:10:43 -05:00 by GiteaMirror · 10 comments

Originally created by @franz101 on GitHub (Aug 16, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6394

Currently, when using the OpenAI API support, `mistral-nemo:12b-instruct-2407-fp16` returns an empty string.
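
For context, a minimal sketch of the kind of call being described, assuming Ollama's OpenAI-compatible endpoint on `localhost:11434/v1` and the `openai` Python client; the exact prompt and options live in the Colab notebook linked below, so everything here is illustrative:

```python
# Illustrative sketch, not the original notebook code: JSON mode through
# Ollama's OpenAI-compatible endpoint. Prompt and model settings assumed.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

for model in ("mistral-nemo:12b", "mistral-nemo:12b-instruct-2407-fp16"):
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are an assistant"},
            {"role": "user", "content": "Return hello world as JSON"},
        ],
        response_format={"type": "json_object"},  # JSON mode
    )
    # Per the report, the fp16 tag comes back empty here
    print(model, repr(response.choices[0].message.content))
```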

GiteaMirror added the model label 2026-04-28 16:10:43 -05:00

@rick-github commented on GitHub (Aug 16, 2024):

[Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) may help in debugging. What version of ollama? Do you have an example of a prompt or code that replicates the issue?


@franz101 commented on GitHub (Aug 16, 2024):

Sure thing:
https://colab.research.google.com/drive/1wFO7-A3iRMwjcLDi_W3cSniBu1lTid6T?usp=sharing

You will see that only the normal model works here.


@rick-github commented on GitHub (Aug 17, 2024):

Tried it locally, both models returned the same response:

```
{
  "id": "chatcmpl-935",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "{ \"text\": \"world\" }",
        "role": "assistant",
        "function_call": null,
        "tool_calls": null
      }
    }
  ],
  "created": 1723854013,
  "model": "mistral-nemo:12b",
  "object": "chat.completion",
  "service_tier": null,
  "system_fingerprint": "fp_ollama",
  "usage": {
    "completion_tokens": 9,
    "prompt_tokens": 72,
    "total_tokens": 81
  }
}
{
  "id": "chatcmpl-893",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "{ \"text\": \"world\" }",
        "role": "assistant",
        "function_call": null,
        "tool_calls": null
      }
    }
  ],
  "created": 1723854016,
  "model": "mistral-nemo:12b-instruct-2407-fp16",
  "object": "chat.completion",
  "service_tier": null,
  "system_fingerprint": "fp_ollama",
  "usage": {
    "completion_tokens": 9,
    "prompt_tokens": 72,
    "total_tokens": 81
  }
}
```

```
$ curl localhost:11434/api/version
{"version":"0.3.5"}
```

@rick-github commented on GitHub (Aug 17, 2024):

It looks like you want to do tool calls, but the response isn't filled out correctly: the `content` is just the function arguments.
Using the ollama endpoint rather than the openai endpoint returns the expected results:

```python
response = ollama.chat(
    model='mistral-nemo:12b',
    messages=[
      {'role': 'system', 'content': 'You are an assistant'},
      {'role': 'user', 'content': 'Return hello world with the tools provided'}
    ],
    tools=tools,
)

print(json.dumps(response))
```

```json
{
  "model": "mistral-nemo:12b",
  "created_at": "2024-08-17T00:49:27.192331932Z",
  "message": {
    "role": "assistant",
    "content": "",
    "tool_calls": [
      {
        "function": {
          "name": "type",
          "arguments": {
            "text": "world"
          }
        }
      }
    ]
  },
  "done_reason": "stop",
  "done": true,
  "total_duration": 2753569516,
  "load_duration": 15550008,
  "prompt_eval_count": 72,
  "prompt_eval_duration": 143596000,
  "eval_count": 19,
  "eval_duration": 2457047000
}
```
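
As a follow-on, a hypothetical sketch of dispatching the returned tool call; the `type_text` handler is made up, and the dict-style access matches the `json.dumps(response)` output above (ollama-python returned plain dicts at this version):

```python
# Hypothetical dispatch of the tool call from the response above;
# type_text is a stand-in handler for the "type" tool.
def type_text(text: str) -> str:
    return text

handlers = {"type": type_text}

for call in response["message"].get("tool_calls", []):
    fn = call["function"]
    print(handlers[fn["name"]](**fn["arguments"]))
```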

@franz101 commented on GitHub (Aug 17, 2024):

thanks for reproducing it! Basically, the quantized version would output the content and the smarter model the actual function call :D


@rick-github commented on GitHub (Aug 17, 2024):

Based on the output I got, neither model worked with the openai endpoint, and with the ollama endpoint, the quantized model made the actual function call. I didn't test the unquantized model with the ollama endpoint because I assumed it would perform the same.


@rick-github commented on GitHub (Aug 17, 2024):

Additionally: I looked at the actual bytes being fed into the model and they were the same for the openai and ollama endpoints, which indicates that the incorrect results from the openai endpoint are likely due to a problem in ollama.


@franz101 commented on GitHub (Aug 17, 2024):

```
ChatCompletionMessage(content='', refusal=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_exfl6bms', function=Function(arguments='{"text":"world"}', name='type'), type='function')])
```


@franz101 commented on GitHub (Aug 17, 2024):

yes, hasty mistake:
`print(response.choices[0].message.content.strip())`
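
In other words, with the openai client the tool call lives on `message.tool_calls` rather than `message.content`; a safer read looks roughly like this (attribute names follow the `ChatCompletionMessage` output above):

```python
import json

# Sketch: read the tool call instead of message.content.
message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
    print(call.function.name, args)
else:
    print(message.content)
```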


@franz101 commented on GitHub (Aug 17, 2024):

System prompt:
`You are an assistant that always performs a function call`

Reference: github-starred/ollama#50528