[GH-ISSUE #1695] How to make the model stop generating response when using via API? #956

Closed
opened 2026-04-12 10:39:19 -05:00 by GiteaMirror · 4 comments

Originally created by @EliasPereirah on GitHub (Dec 24, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1695

When using the CLI I can press Ctrl+C to stop generation, but how do I do that via the API?
Can anyone help me with this?


@BruceMacD commented on GitHub (Dec 24, 2023):

Hi @EliasPereirah, when the connection is closed Ollama should stop generating. That is what happens when you press Ctrl+C in the CLI.

Here is a quick example in Python of closing a connection, which should stop Ollama from generating more on the back end:

```python
import requests

url = 'http://localhost:11434/api/generate'

data = {
    'model': 'llama2',
    'prompt': 'Why is the sky blue?'
}

# stream=True makes requests return once the headers arrive instead of
# waiting for the whole response body to be generated
response = requests.post(url, json=data, stream=True)

# Close the connection rather than reading through the streamed response, which stops Ollama from continuing to generate
response.close()
```
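
For illustration, here is a slightly fuller sketch of the same idea that reads part of the stream and then drops the connection. It assumes the default streaming behaviour of `/api/generate`, where each line is a JSON object with `response` and `done` fields, and it uses only the standard library. The helper names `iter_tokens` and `generate_until` and the `max_chars` cutoff are made up for this example, not part of Ollama.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def iter_tokens(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield the text fragment from each newline-delimited JSON chunk."""
    for raw in lines:
        if not raw.strip():
            continue  # skip blank keep-alive lines
        chunk = json.loads(raw)
        if chunk.get("done"):
            break  # the final chunk marks the end of the stream
        yield chunk.get("response", "")


def generate_until(prompt: str, max_chars: int,
                   url: str = "http://localhost:11434/api/generate",
                   model: str = "llama2") -> str:
    """Hypothetical helper: stop reading once roughly max_chars have arrived."""
    body = json.dumps({"model": model, "prompt": prompt}).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    text = ""
    # Leaving the `with` block closes the connection, including on the early
    # break, and the closed connection is what tells the server to stop.
    with urllib.request.urlopen(req) as resp:
        for token in iter_tokens(resp):
            text += token
            if len(text) >= max_chars:
                break
    return text
```

Calling `generate_until("Why is the sky blue?", 200)` against a running server should return a little over 200 characters at most, since the check happens after each chunk is appended.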

@EliasPereirah commented on GitHub (Dec 24, 2023):

> Hi @EliasPereirah, when the connection is closed Ollama should stop generating. That is what happens when you press Ctrl+C in the CLI.
>
> Here is a quick example in Python of closing a connection, which should stop Ollama from generating more on the back end:
>
> ```python
> import requests
>
> url = 'http://localhost:11434/api/generate'
>
> data = {
>     'model': 'llama2',
>     'prompt': 'Why is the sky blue?'
> }
>
> # stream=True makes requests return once the headers arrive instead of
> # waiting for the whole response body to be generated
> response = requests.post(url, json=data, stream=True)
>
> # Close the connection rather than reading through the streamed response, which stops Ollama from continuing to generate
> response.close()
> ```

Thank you very much, but I don't think that's exactly what I want.
Let me explain my use case; maybe it will be clearer.
I'm building my own interface to the Ollama API, and sometimes the model starts to hallucinate. In that case I want a button on the web interface that I can click to stop the answer from being generated, so I can ask a new question. Having two responses running at the same time would be too heavy for my machine.


@rgaidot commented on GitHub (Dec 28, 2023):

Have you set up `signal` on your `fetch` with `AbortController`?

```js
const controller = new AbortController();

const stream = await fetch('<...>', {
    method: 'POST',
    headers: {<...>},
    body: JSON.stringify({<...>}),
    signal: controller.signal,
});
```

Now you can call `controller.abort();` to stop the request.
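
If the web interface talks to Ollama through a Python backend rather than calling it directly from the browser, roughly the same stop-button pattern could look like the sketch below. This is only an illustration under stated assumptions: `CancellableGeneration`, `cancel`, and `tokens` are hypothetical names, and the stream is assumed to be newline-delimited JSON with `response` and `done` fields.

```python
import json
import threading
from typing import Callable, Iterable, Iterator


class CancellableGeneration:
    """Hypothetical wrapper that lets another thread stop a streamed reply."""

    def __init__(self, lines: Iterable[bytes], close: Callable[[], None]) -> None:
        # `lines` yields NDJSON byte lines; `close` drops the HTTP connection.
        self._lines = lines
        self._close = close
        self._stop = threading.Event()

    def cancel(self) -> None:
        """Meant to be called from the stop-button handler."""
        self._stop.set()

    def tokens(self) -> Iterator[str]:
        try:
            for raw in self._lines:
                if self._stop.is_set():
                    break  # the user asked to stop: abandon the stream
                if not raw.strip():
                    continue
                chunk = json.loads(raw)
                if chunk.get("done"):
                    break
                yield chunk.get("response", "")
        finally:
            # Closing the connection is what signals the server to stop.
            self._close()
```

With `requests` this could be wired up as `CancellableGeneration(r.iter_lines(), r.close)` for a response `r` opened with `stream=True`. Note that the stop flag is only checked when a new chunk arrives, so cancellation takes effect on the next chunk rather than instantly. Calling `cancel()` on the previous generation before starting a new one would also avoid having two responses running at once.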


@pdevine commented on GitHub (Jan 3, 2024):

@EliasPereirah I think this should answer your question, so I'm going to go ahead and close. Feel free to reopen it if there's still an issue!

Reference: github-starred/ollama#956