[GH-ISSUE #8395] Empty response via API #67447

Closed
opened 2026-05-04 10:23:11 -05:00 by GiteaMirror · 12 comments

Originally created by @gl2007 on GitHub (Jan 12, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8395

What is the issue?

I host ollama on 0.0.0.0 on a server in my LAN, and "curl <ip>:11434" returns "Ollama is running". Also, when I run ollama run <model> in cmd on that machine, I am able to see proper responses.

However, when I send an API request via Postman, I get this empty response irrespective of the model, which seems to indicate the model is not loaded properly. The same thing happens on the server machine itself via Postman using localhost.

{
    "model": "Mistral-Nemo-12B-Instruct-2407-Q8_0:latest",
    "created_at": "2025-01-12T07:39:16.7356243Z",
    "response": "",
    "done": true,
    "done_reason": "load"
}

But it seems the model is loaded correctly, as I can see it in "ollama ps".

What am I doing wrong?

OS

Windows

GPU

None

CPU

Intel

Ollama version

0.5.4

GiteaMirror added the bug label 2026-05-04 10:23:11 -05:00

@rick-github commented on GitHub (Jan 12, 2025):

You need to send a prompt. This is the response when the model is loaded but hasn't been asked to do a completion: "done_reason": "load"

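(For illustration only: a minimal sketch of the request that produces that "load" response, assuming a local server at localhost:11434 and the Python requests library; the model name is just the one from this thread. Sending only the model name, with no prompt, loads the model and returns an empty completion.)

import requests

# Hypothetical sketch: naming a model without a prompt only loads it,
# so the server replies with "response": "" and "done_reason": "load".
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "Mistral-Nemo-12B-Instruct-2407-Q8_0:latest"},
)
print(resp.json())  # expect an empty "response" and "done_reason": "load"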

@gl2007 commented on GitHub (Jan 12, 2025):

> You need to send a prompt. This is the response when the model is loaded but hasn't been asked to do a completion: "done_reason": "load"

I did send a prompt:

{
    "model": "Mistral-Nemo-12B-Instruct-2407-Q8_0:latest", // insert any model from Ollama that is on your local machine
    "messages": [
        {
            "role": "system", // "system" is a prompt to define how the model should act.
            "content": "you are a salty pirate" // system prompt should be written here
        },
        {
            "role": "user", // "user" is a prompt provided by the user.
            "content": "why is the sky blue" // user prompt should be written here
        }
    ],
    "stream": false // returns a full message rather than a streamed response
}


@rick-github commented on GitHub (Jan 12, 2025):

Which endpoint did you use?


@gl2007 commented on GitHub (Jan 12, 2025):

http://<my-server-ip>:11434/api/generate via Postman, using POST. I am using the example prompts from the Postman documentation here: https://www.postman.com/postman-student-programs/ollama-api/documentation/suc47x8/ollama-rest-api


@rick-github commented on GitHub (Jan 12, 2025):

For /api/generate the format is {"model":"Mistral-Nemo-12B-Instruct-2407-Q8_0:latest","prompt":"why is the sky blue"}. The format you are using above is for the /api/chat endpoint.

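(A hedged sketch of the difference between the two endpoints, assuming the Python requests library and a server at localhost:11434; the base URL and model name are illustrative.)

import requests

BASE = "http://localhost:11434"  # replace with your server's address
MODEL = "Mistral-Nemo-12B-Instruct-2407-Q8_0:latest"

# /api/generate takes a single prompt string and returns "response".
gen = requests.post(f"{BASE}/api/generate",
                    json={"model": MODEL,
                          "prompt": "why is the sky blue",
                          "stream": False})
print(gen.json()["response"])

# /api/chat takes a list of role/content messages and returns "message".
chat = requests.post(f"{BASE}/api/chat",
                     json={"model": MODEL,
                           "messages": [
                               {"role": "system", "content": "you are a salty pirate"},
                               {"role": "user", "content": "why is the sky blue"}],
                           "stream": False})
print(chat.json()["message"]["content"])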

@gl2007 commented on GitHub (Jan 13, 2025):

> For /api/generate the format is {"model":"Mistral-Nemo-12B-Instruct-2407-Q8_0:latest","prompt":"why is the sky blue"}. The format you are using above is for the /api/chat endpoint.

TY so much! Not sure how I missed that. It's working now!

However, if that is the syntax, how do I pass the system prompt and the prompts for all the subsequent requests? Can't I have "session behavior"?


@rick-github commented on GitHub (Jan 13, 2025):

The ollama server is stateless. If you want to maintain conversation history, you accumulate the prompts and responses in the client and append them to messages[] when you send the request to the API.

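(A rough sketch of that client-side pattern with /api/chat, assuming the Python requests library; the helper and variable names are only for illustration, not part of any Ollama client API.)

import requests

URL = "http://localhost:11434/api/chat"   # adjust to your server
MODEL = "Mistral-Nemo-12B-Instruct-2407-Q8_0:latest"

# The client keeps the whole history; the server sees it fresh on every call.
history = [{"role": "system", "content": "you are a salty pirate"}]

def ask(user_text):
    history.append({"role": "user", "content": user_text})
    r = requests.post(URL, json={"model": MODEL,
                                 "messages": history,
                                 "stream": False})
    reply = r.json()["message"]
    history.append(reply)  # append the assistant turn as well
    return reply["content"]

print(ask("why is the sky blue"))
print(ask("and why is it red at sunset"))  # second turn carries the first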

@pdevine commented on GitHub (Jan 13, 2025):

It's definitely harder to do with /api/generate. You probably want to start off with /api/chat initially.

I'll go ahead and close the issue, but feel free to keep commenting.


@gl2007 commented on GitHub (Jan 13, 2025):

> It's definitely harder to do with /api/generate. You probably want to start off with /api/chat initially.
>
> I'll go ahead and close the issue, but feel free to keep commenting.

Agreed. However, does that mean the chat endpoint maintains state? Do I have to do anything for it to maintain state, and if I pass in a new system prompt, does that start a new "session"?


@rick-github commented on GitHub (Jan 13, 2025):

No, the ollama server is stateless.


@pdevine commented on GitHub (Jan 14, 2025):

Just keep adding each of the messages back to each request (both the user and assistant messages) to have a conversation.


@gl2007 commented on GitHub (Jan 14, 2025):

OK, that works for me, but I'm wondering whether making it stateful with an optional session-based approach would be worthwhile. It would save every client consuming this API from writing the same code.

Reference: github-starred/ollama#67447