[GH-ISSUE #7100] mixtral:8x22b model does not work with system prompt only #51019

Open
opened 2026-04-28 17:52:10 -05:00 by GiteaMirror · 3 comments

Originally created by @gakugaku on GitHub (Oct 4, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7100

What is the issue?

The `mixtral:8x22b-instruct` model does not work correctly when only a system prompt is provided: the template renders an empty prompt, and the model produces unrelated output.

This behavior may be related to the internal prompt handling, or to the recent changes to system prompt handling referenced in #4228.

mixtral:8x22b-instruct template: https://ollama.com/library/mixtral:8x22b-instruct/blobs/138b3322e0da
Mixtral 8x22B template in docs: https://github.com/ollama/ollama/blob/main/docs/template.md#mistral
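
The divergence is easy to reproduce with Go's `text/template`, the language Ollama templates are written in. The sketch below uses two deliberately simplified templates that approximate the Mistral and Gemma styles (they are not the exact templates linked above): the Mistral-style template only emits content when it encounters a user turn, so a system-only message list renders to an empty string, while the Gemma-style template renders system content as a user turn.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

type message struct {
	Role    string
	Content string
}

// Simplified approximations of the two template styles; these are NOT the
// exact templates shipped with mixtral:8x22b-instruct or gemma2:27b.
const (
	// Mistral style: content is only emitted inside a user turn, so a
	// message list containing nothing but a system message renders to "".
	mistralStyle = `{{- range .Messages }}{{- if eq .Role "user" }}[INST] {{ .Content }}[/INST]{{ end }}{{- end }}`

	// Gemma style: system and user messages are both rendered as a user
	// turn, so a system-only request still yields a non-empty prompt.
	gemmaStyle = `{{- range .Messages }}{{- if ne .Role "assistant" }}<start_of_turn>user
{{ .Content }}<end_of_turn>
{{ end }}{{- end }}<start_of_turn>model
`
)

func render(name, tmpl string, msgs []message) string {
	var buf bytes.Buffer
	t := template.Must(template.New(name).Parse(tmpl))
	if err := t.Execute(&buf, struct{ Messages []message }{msgs}); err != nil {
		panic(err)
	}
	return buf.String()
}

func main() {
	systemOnly := []message{{Role: "system", Content: "What is the capital of Japan?"}}
	fmt.Printf("mistral-style: %q\n", render("mistral", mistralStyle, systemOnly)) // ""
	fmt.Printf("gemma-style:   %q\n", render("gemma", gemmaStyle, systemOnly))     // "<start_of_turn>user\n..."
}
```

Running this prints an empty string for the Mistral-style template and a full `<start_of_turn>` block for the Gemma-style one, matching the empty `prompt=\"\"` in the log below.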

Steps to Reproduce

1. Send the following system-prompt-only request to the `mixtral:8x22b-instruct` model:

   ```bash
   curl http://localhost:11434/api/chat -d '{
       "model": "mixtral:8x22b-instruct-v0.1-q4_K_M",
       "stream": false,
       "messages": [
           {
               "role": "system",
               "content": "Hello, I am Ollama. I am here to help you with your questions. What would you like to know?\n\nWhat is the capital of Japan?"
           }
       ]
   }'
   ```

2. The Ollama log shows that the `prompt` field is empty (`prompt=\"\"`). DEBUG entries like this appear when the server is started with `OLLAMA_DEBUG=1`:

   ```json
   {"log":"time=2024-10-04T08:03:17.462Z level=DEBUG source=routes.go:1417 msg=\"chat request\" images=0 prompt=\"\"\r\n","stream":"stdout","time":"2024-10-04T08:03:17.462471779Z"}
   ```

3. The output is unrelated to the input:

   ```json
   {"model":"mixtral:8x22b-instruct-v0.1-q4_K_M","created_at":"2024-10-04T08:03:40.224484198Z","message":{"role":"assistant","content":",\n.\nA new study by the University of Maryland and Johns Hopkins Medicine has found ..."}
   ```
    

Expected Behavior

The `mixtral:8x22b-instruct` model should work with a system prompt only, as other models such as `gemma2:27b` do.
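
Until the template or the API guards against this, a client can work around it by promoting a lone system message to a user turn before calling `/api/chat`, mirroring the system + user case that works below. A minimal sketch, assuming a local server on the default port; `promoteLoneSystem` is a hypothetical helper, not part of any Ollama client library:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type ChatRequest struct {
	Model    string    `json:"model"`
	Stream   bool      `json:"stream"`
	Messages []Message `json:"messages"`
}

// promoteLoneSystem rewrites a system-only message list as a single user
// turn so that Mistral-style templates still render a non-empty prompt.
func promoteLoneSystem(msgs []Message) []Message {
	if len(msgs) == 1 && msgs[0].Role == "system" {
		return []Message{{Role: "user", Content: msgs[0].Content}}
	}
	return msgs
}

func main() {
	req := ChatRequest{
		Model:  "mixtral:8x22b-instruct-v0.1-q4_K_M",
		Stream: false,
		Messages: promoteLoneSystem([]Message{
			{Role: "system", Content: "What is the capital of Japan?"},
		}),
	}
	body, err := json.Marshal(req)
	if err != nil {
		panic(err)
	}
	resp, err := http.Post("http://localhost:11434/api/chat", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	var out struct {
		Message Message `json:"message"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	fmt.Println(out.Message.Content)
}
```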

Actual Results

`mixtral:8x22b-instruct`, system prompt only: NG (fails)

Results

Input:

```bash
curl http://localhost:11434/api/chat -d '{
    "model": "mixtral:8x22b-instruct-v0.1-q4_K_M",
    "stream": false,
    "messages": [
        {
            "role": "system",
            "content": "Hello, I am Ollama. I am here to help you with your questions. What would you like to know?\n\nWhat is the capital of Japan?"
        }
    ]
}'
```

Ollama log:

```json
{"log":"time=2024-10-04T08:03:17.462Z level=DEBUG source=routes.go:1417 msg=\"chat request\" images=0 prompt=\"\"\r\n","stream":"stdout","time":"2024-10-04T08:03:17.462471779Z"}
```

Output:

```json
{"model":"mixtral:8x22b-instruct-v0.1-q4_K_M","created_at":"2024-10-04T08:03:40.224484198Z","message":{"role":"assistant","content":",\n.\nA new study by the University of Maryland and Johns Hopkins Medicine has found ..."}
```

This result is not related to the input.

`mixtral:8x22b-instruct`, system prompt + user prompt: OK

Results

Input

```bash
curl http://localhost:11434/api/chat -d '{
    "model": "mixtral:8x22b-instruct-v0.1-q4_K_M",
    "stream": false,
    "messages": [
        {
            "role": "system",
            "content": "Hello, I am Ollama. I am here to help you with your questions. What would you like to know?"
        },
        {
            "role": "user",
            "content": "What is the capital of Japan?"
        }
    ]
}'
```

Ollama log:

```json
{"log":"time=2024-10-04T08:08:04.464Z level=DEBUG source=routes.go:1417 msg=\"chat request\" images=0 prompt=\"[INST] Hello, I am Ollama. I am here to help you with your questions. What would you like to know?\\n\\nWhat is the capital of Japan?[/INST]\"\r\n","stream":"stdout","time":"2024-10-04T08:08:04.464656724Z"}
```

Output:

```json
{"model":"mixtral:8x22b-instruct-v0.1-q4_K_M","created_at":"2024-10-04T08:08:06.219305367Z","message":{"role":"assistant","content":" The capital of Japan is Tokyo. It's also the country's largest city and one of the world's most populous metropolitan areas."},"done_reason":"stop","done":true,"total_duration":1853131385,"load_duration":11283091,"prompt_eval_count":38,"prompt_eval_duration":348092000,"eval_count":32,"eval_duration":1361906000}
```

Other model example: `gemma2:27b`, system prompt only: OK

Results

Input

```bash
curl http://localhost:11434/api/chat -d '{
    "model": "gemma2:27b-instruct-q4_K_M",
    "stream": false,
    "messages": [
        {
            "role": "system",
            "content": "Hello, I am Ollama. I am here to help you with your questions. What would you like to know?\n\nWhat is the capital of Japan?"
        }
    ]
}'
```

Ollama log:

```json
{"log":"time=2024-10-04T08:12:01.403Z level=DEBUG source=routes.go:1417 msg=\"chat request\" images=0 prompt=\"\u003cstart_of_turn\u003euser\\nHello, I am Ollama. I am here to help you with your questions. What would you like to know?\\n\\nWhat is the capital of Japan? \u003cend_of_turn\u003e\\n\u003cstart_of_turn\u003emodel\\n\"\r\n","stream":"stdout","time":"2024-10-04T08:12:01.40376108Z"}
```

Output:

```json
{"model":"gemma2:27b-instruct-q4_K_M","created_at":"2024-10-04T08:10:22.460284079Z","message":{"role":"assistant","content":"The capital of Japan is **Tokyo**. 🏯  \n"},"done_reason":"stop","done":true,"total_duration":6504495197,"load_duration":6057933332,"prompt_eval_count":42,"prompt_eval_duration":66685000,"eval_count":13,"eval_duration":376322000}
```

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.3.12

Potentially Related Issues

- https://github.com/ollama/ollama/issues/5547
- https://github.com/ollama/ollama/issues/6176
GiteaMirror added the bug label 2026-04-28 17:52:11 -05:00

@pdevine commented on GitHub (Oct 23, 2024):

This is fixed in `0.4.0`. It will output:

```
% curl http://localhost:11434/api/chat -d '{
    "model": "mixtral:8x22b-instruct-v0.1-q4_K_M",
    "stream": false,
    "messages": [
        {
            "role": "system",
            "content": "Hello, I am Ollama. I am here to help you with your questions. What would you like to know?\n\nWhat is the capital of Japan?"
        }
    ]
}'
{"error":"Failed to create new sequence: no input provided\n"}%
```

cc @jessegross
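
For clients, the practical consequence of the 0.4.0 change is that `/api/chat` answers a system-only request with a JSON body carrying an `error` field instead of a `message`. A hedged sketch of handling both shapes; the field names match the responses shown in this thread, while `decodeChat` itself is a hypothetical helper:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// decodeChat distinguishes an error body (the 0.4.0 behavior for a
// system-only request) from a normal reply carrying a message.
func decodeChat(body []byte) (string, error) {
	var out struct {
		Error   string `json:"error"`
		Message struct {
			Content string `json:"content"`
		} `json:"message"`
	}
	if err := json.Unmarshal(body, &out); err != nil {
		return "", err
	}
	if out.Error != "" {
		return "", fmt.Errorf("ollama: %s", strings.TrimSpace(out.Error))
	}
	return out.Message.Content, nil
}

func main() {
	_, err := decodeChat([]byte(`{"error":"Failed to create new sequence: no input provided\n"}`))
	fmt.Println(err) // ollama: Failed to create new sequence: no input provided
}
```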


@jessegross commented on GitHub (Oct 23, 2024):

The new runner does a better job of protecting itself (it returns an error vs. random garbage). However, I think there is an underlying issue here in the layers above it that remains.

It looks like the template processing code is outputting an empty prompt string for Mixtral vs. one with the system prompt for Gemma. I don't see an obvious reason why they should be different. I'm not sure that a message with only a system prompt really makes sense but if not, it's probably better to just detect this and return a consistent error before even going to the runner.
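
A consistent pre-runner check of the kind described here could be as small as the sketch below; `validatePrompt` and `ErrEmptyPrompt` are hypothetical names for illustration, not Ollama's actual code:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ErrEmptyPrompt is a hypothetical sentinel error for illustration only.
var ErrEmptyPrompt = errors.New("rendered prompt is empty; provide at least one user message")

// validatePrompt rejects a request whose rendered template output is empty,
// so the caller gets one consistent error instead of the runner failing
// (0.4.0) or generating unrelated text from an empty prompt (0.3.x).
func validatePrompt(rendered string) error {
	if strings.TrimSpace(rendered) == "" {
		return ErrEmptyPrompt
	}
	return nil
}

func main() {
	fmt.Println(validatePrompt(""))                      // rendered prompt is empty; ...
	fmt.Println(validatePrompt("[INST] Tokyo? [/INST]")) // <nil>
}
```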


@mozophe commented on GitHub (Feb 9, 2025):

@jessegross I think you are right. I just hit the same issue with Mistral Instruct finetunes.
I found the right template for Mistral Instruct, put it in the Modelfile, and created a new model; that fixed the issue.
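
The kind of template fix described here adds a branch so that system content is still rendered when no user turn follows it. A simplified illustration in Go's `text/template` (the `TEMPLATE` directive of a Modelfile uses the same syntax); this is not the exact template mozophe installed:

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

type message struct {
	Role    string
	Content string
}

// fixedStyle is an illustrative Mistral-style template with a fallback:
// system messages render as their own [INST] turn instead of being dropped
// when no user message follows. NOT the exact template from the model page.
const fixedStyle = `{{- range .Messages }}
{{- if or (eq .Role "user") (eq .Role "system") }}[INST] {{ .Content }}[/INST]{{ end }}
{{- end }}`

func main() {
	var buf bytes.Buffer
	t := template.Must(template.New("fixed").Parse(fixedStyle))
	err := t.Execute(&buf, struct{ Messages []message }{
		[]message{{Role: "system", Content: "What is the capital of Japan?"}},
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", buf.String()) // "[INST] What is the capital of Japan?[/INST]"
}
```

Putting a corrected `TEMPLATE` like this in a Modelfile and running `ollama create` applies the same idea at the model level, which matches the workaround described above.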

Reference: github-starred/ollama#51019