[GH-ISSUE #9813] Properly Interrupting Inference in Ollama #32182

Closed
opened 2026-04-22 13:13:26 -05:00 by GiteaMirror · 4 comments

Originally created by @20246688 on GitHub (Mar 17, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9813

How can I properly interrupt the inference in Ollama when I notice the output doesn't match the expected content during streaming?

GiteaMirror added the feature request label 2026-04-22 13:13:26 -05:00

@rick-github commented on GitHub (Mar 17, 2025):

Close the connection.


@20246688 commented on GitHub (Mar 18, 2025):

> Close the connection.

Do you mean client._client.aclose()?


@rick-github commented on GitHub (Mar 18, 2025):

Depends on your client. If you are using curl, press ^C to stop the program, thereby closing the connection. If you are using the ollama python client, close the connection:

```python
#!/usr/bin/env python3

import ollama

# Stream a chat response; abort the inference by closing the stream
# if the reasoning model doesn't open with a <think> tag.
response = ollama.Client().chat(
    model='deepseek-r1:7b',
    messages=[{"role": "user", "content": "2+2=?"}],
    stream=True,
)
i = 0
for r in response:
  c = r['message']['content']
  if i == 0 and c != '<think>':
    response.close()  # closing the stream tells the server to stop
    print("reasoning model didn't start with <think>")
    break
  print(c, end='', flush=True)
  i += 1
print()
```

Whatever your client uses, closing the connection will tell the ollama server that the client no longer needs the output and the server will stop the inference.
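Why closing works can be illustrated with a plain Python generator (a sketch, not the ollama client itself): calling `.close()` raises `GeneratorExit` inside the generator, which runs its cleanup code. That is the same mechanism the HTTP client library uses to tear down the connection, and the dropped connection is what signals the server to cancel the inference.

```python
# Sketch: closing a generator-backed stream triggers its cleanup,
# analogous to how response.close() lets the HTTP library tear down
# the connection. Names here are illustrative only.
def fake_stream(state):
    try:
        for i in range(1000):
            yield f"token-{i}"
    finally:
        # Runs when the consumer calls .close(), like connection teardown.
        state['closed'] = True

state = {'closed': False}
stream = fake_stream(state)
print(next(stream))   # consume one chunk: token-0
stream.close()        # abandon the rest, as response.close() does
print(state['closed'])  # True: cleanup ran immediately
```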


@20246688 commented on GitHub (Mar 18, 2025):

For example:

```python
async def async_chat(message, stop_event):
    client = None
    try:
        client = AsyncClient(host=HOST)
        response = await client.chat(
            model=MODEL,
            messages=[{'role': 'user', 'content': message}],
            stream=True,
        )

        complete_message = ''
        is_first_content = True

        async for part in response:
            if stop_event.is_set():
                return 'Error'

            content = part['message']['content']

            if is_first_content:
                if not content.startswith('E'):
                    complete_message = 'Error'
                    print(complete_message)
                    stop_event.set()
                    return complete_message

                is_first_content = False

            print(content, end='', flush=True)
            complete_message += content

        return complete_message

    except Exception:
        return 'Error'

    finally:
        if client:
            await client._client.aclose()
```

I’m currently using client._client.aclose() to close the connection, but I’m not sure if it’s the correct way to do it.

Reference: github-starred/ollama#32182