[GH-ISSUE #11337] Progressive streaming output versus static output #53990

Closed
opened 2026-04-29 05:03:37 -05:00 by GiteaMirror · 2 comments

Originally created by @JonRiv on GitHub (Jul 9, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11337

At the moment the interactive CLI streams the output progressively: chunks of text are displayed on screen as they are generated. An option to display all of the generated output at once, rather than progressively, would be appreciated.

```bash
OLLAMA_TERMINAL_OUTPUT_DISPLAY=streaming  # would accept an additional "static" option
```

Having a look at the code, I believe the logic could be implemented in `progress/progress.go`, in the `render()` function: instead of dumping the available output to the terminal as it comes, it could be cached until the inference is finished.
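
For illustration, here is a minimal Go sketch of the buffering idea, assuming a "static" mode selected by the proposed environment variable. This is hypothetical code, not the actual `progress` package API; it only shows the shape of caching chunks until generation finishes.

```go
package main

// Illustrative sketch only: buffer streamed chunks and print them in one go
// when generation finishes, selected by a hypothetical
// OLLAMA_TERMINAL_OUTPUT_DISPLAY environment variable.

import (
	"fmt"
	"os"
	"strings"
)

// chunkSink receives generated text chunks and decides when to print them.
type chunkSink struct {
	static bool            // true: hold output until Done() is called
	buf    strings.Builder // accumulated output in static mode
}

func newChunkSink() *chunkSink {
	// "streaming" (or anything else) keeps the current progressive behaviour.
	return &chunkSink{static: os.Getenv("OLLAMA_TERMINAL_OUTPUT_DISPLAY") == "static"}
}

// Write is called for every chunk as it arrives from the model.
func (s *chunkSink) Write(chunk string) {
	if s.static {
		s.buf.WriteString(chunk) // cache instead of rendering immediately
		return
	}
	fmt.Print(chunk) // current behaviour: progressive streaming
}

// Done is called once inference has finished.
func (s *chunkSink) Done() {
	if s.static {
		fmt.Print(s.buf.String()) // dump everything at once
	}
	fmt.Println()
}

func main() {
	sink := newChunkSink()
	for _, chunk := range []string{"Hello", ", ", "world", "!"} {
		sink.Write(chunk)
	}
	sink.Done()
}
```

In the real CLI the equivalent switch would have to live wherever the response chunks are actually rendered.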

Would this be of interest? I can have a go at a prototype PR.

GiteaMirror added the feature request label 2026-04-29 05:03:37 -05:00

@rick-github commented on GitHub (Jul 9, 2025):

If you don't want streaming output, you can use a small script with the API:

```python
#!/usr/bin/env python3

import ollama
import argparse
import sys
try:
  import readline  # optional: line editing and history in interactive mode
except ImportError:
  pass

parser = argparse.ArgumentParser()
parser.add_argument("model")
parser.add_argument("prompts", nargs='*')
args = parser.parse_args()

client = ollama.Client()
userprompt = ">>> " if sys.stdin.isatty() else ""

def chat(messages, prompt):
  messages.append({"role": "user", "content": prompt})
  # stream=False returns the complete response in one go
  response = client.chat(model=args.model, messages=messages, stream=False)
  print(response['message']['content'])
  messages.append({"role": "assistant", "content": response['message']['content']})
  return messages

messages = []
for prompt in args.prompts:
  messages = chat(messages, prompt)
if len(args.prompts) == 0:
  while True:
    try:
      prompt = input(userprompt)
    except (EOFError, KeyboardInterrupt):  # Ctrl-D / Ctrl-C exits
      print()
      break
    if prompt == "/bye":
      break
    messages = chat(messages, prompt)
```
```console
$ ./ollama-nostream.py qwen2.5:0.5b hello
Hello! How can I assist you today?
$ ./ollama-nostream.py qwen2.5:0.5b
>>> hello
Hello! How can I assist you today?
>>> /bye
$
```

@JonRiv commented on GitHub (Jul 10, 2025):

Thank you @rick-github for a working solution :)
I believe an option could still be valuable, so I'll let the issue progress.
