[GH-ISSUE #8510] Microsoft Phi-4 runs fine in Ollama but not in API #51996

Closed
opened 2026-04-28 21:33:33 -05:00 by GiteaMirror · 3 comments
Owner

Originally created by @MaxAkbar on GitHub (Jan 20, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8510

What is the issue?

  1. Download the model by running the following command: ollama run phi4:14b-q8_0
  2. Once the model is loaded, ask it a simple question: "What can you do keep it short." Notice that the response is valid.
  3. Now call the API using the following curl request: curl http://localhost:11434/api/generate -d "{ \"model\": \"phi4:14b-q8_0\", \"prompt\": \"What can you do keep it short.\" , \"stream\": false}" You will notice that although you get a response, it does not answer the question.

Here is what I got back from the API:
curl http://localhost:11434/api/generate -d "{\"model\": \"phi4:14b-q8_0\", \"prompt\": \"What can you do keep it short.\", \"stream\": false}"

{
  "model": "phi4:14b-q8_0",
  "created_at": "2025-01-20T22:31:36.6626843Z",
  "response": "To keep writing or communication concise:\n\n1. **Focus on Key Points**: Identify the main ideas and ensure they're clear.\n2. **Use Simple Language**: Avoid jargon and complex sentences.\n3. **Be Direct**: Get to the point quickly without unnecessary details.\n4. **Eliminate Redundancy**: Remove repetitive information.\n5. **Prioritize Information**: Arrange content by importance.\n6. **Use Bullet Points**: For lists or important points, bullet points enhance clarity.\n\nBy applying these strategies, you can convey your message effectively in fewer words.",
  "done": true,
  "done_reason": "stop",
  "context": [
    100264, 882, 100266, 198, 3923, 649, 499, 656, 2567, 433, 2875, 13, 100265,
    198, 100264, 78191, 100266, 198, 1271, 2567, 4477, 477, 10758, 64694, 1473,
    16, 13, 3146, 14139, 389, 5422, 21387, 96618, 65647, 279, 1925, 6848, 323,
    6106, 814, 2351, 2867, 627, 17, 13, 3146, 10464, 9170, 11688, 96618, 35106,
    503, 71921, 323, 6485, 23719, 627, 18, 13, 3146, 3513, 7286, 96618, 2175,
    311, 279, 1486, 6288, 2085, 26225, 3649, 627, 19, 13, 3146, 42113, 3357,
    3816, 1263, 6709, 96618, 11016, 59177, 2038, 627, 20, 13, 3146, 50571,
    27406, 8245, 96618, 41680, 2262, 555, 12939, 627, 21, 13, 3146, 10464,
    32912, 21387, 96618, 1789, 11725, 477, 3062, 3585, 11, 17889, 3585, 18885,
    32373, 382, 1383, 19486, 1521, 15174, 11, 499, 649, 20599, 701, 1984, 13750,
    304, 17162, 4339, 13
  ],
  "total_duration": 2704592600,
  "load_duration": 12532000,
  "prompt_eval_count": 18,
  "prompt_eval_duration": 4000000,
  "eval_count": 113,
  "eval_duration": 2686000000
}
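
For reference, the same request can be made without any shell quoting; this is a minimal Python sketch (standard library only, assuming a stock Ollama install on the default port 11434) that builds the JSON body with json.dumps instead of the hand-escaped quotes the Windows curl command needs:

```python
import json
import urllib.request

# Default local Ollama endpoint (assumes a stock install on port 11434).
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(model: str, prompt: str) -> bytes:
    # json.dumps produces correctly quoted JSON, avoiding the manual
    # backslash-escaping required in the Windows curl command above.
    body = {"model": model, "prompt": prompt, "stream": False}
    return json.dumps(body).encode("utf-8")

def generate(model: str, prompt: str) -> str:
    # Blocking, non-streaming call; returns only the "response" field
    # of the JSON reply shown above.
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Called as generate("phi4:14b-q8_0", "What can you do keep it short."), this sends exactly the same payload as the curl request, so it reproduces the same behavior.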

OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.5.7

GiteaMirror added the bug label 2026-04-28 21:33:33 -05:00

@rick-github commented on GitHub (Jan 21, 2025):

What was the response you got from the CLI? I found them pretty similar.

$ ollama run phi4:14b-q8_0
>>> What can you do keep it short.
To keep your writing or presentation short and concise, consider these tips:

1. **Identify the Core Message**: Focus on the main point you want to convey.
2. **Use Simple Language**: Avoid jargon and complex sentences.
3. **Eliminate Redundancies**: Remove repetitive information.
4. **Prioritize Information**: Include only what is necessary for understanding.
5. **Be Direct**: Get straight to the point without unnecessary introductions or conclusions.
6. **Utilize Bullet Points**: Break down information into concise, digestible points.
7. **Edit Ruthlessly**: Cut out any fluff or non-essential content.

By applying these strategies, you can ensure your message is clear and concise.

>>> Send a message (/? for help)
$ curl -s http://localhost:11434/api/generate -d "{ \"model\": \"phi4:14b-q8_0\",  \"prompt\": \"What can you do keep it short.\" , \"stream\": false}" | jq -r .response
To keep responses short, I focus on these strategies:

1. **Clarity**: Deliver clear and concise information.
2. **Relevance**: Address only the core aspects of your query.
3. **Brevity**: Use direct language to avoid unnecessary details.

If you have a specific topic or question in mind, feel free to ask!

A slightly different prompt provides a better response.

$ curl -s http://localhost:11434/api/generate -d "{ \"model\": \"phi4:14b-q8_0\",  \"prompt\": \"What can you do? keep the answer short.\" , \"stream\": false}" | jq -r .response
I can assist by providing information, answering questions, offering explanations, helping with problem-solving, and more, depending on your needs. How can I help you today?

@MaxAkbar commented on GitHub (Jan 21, 2025):

@rick-github Strange, I tried it this morning and yes, the responses are similar. Last night I was getting different responses, and from the console they were specific to what the model could do. This is from memory:

- generating human-like text
- answering questions
- language assistance
- I remember something about reasoning
- explaining concepts

I thought it was related to [Phi-4 Finetuning + Bug Fixes by Unsloth](https://unsloth.ai/blog/phi4), but when I downloaded their model from [huggingface.co/unsloth](https://huggingface.co/unsloth/phi-4-GGUF) I got the same results.

I did notice that when I run the model via LM Studio, I get the expected results.

I will try it again later tonight with a few different prompts and see what I get back.

Thank you.


@MaxAkbar commented on GitHub (Jan 22, 2025):

OK, I tried a few prompts, and it seems that if you break the sentence in two, you get the expected response. Closing this as it's not a bug; sorry for the trouble.

Reference: github-starred/ollama#51996