[GH-ISSUE #1474] subprocess or pexpect rather than the API #47307

Closed
opened 2026-04-28 03:34:05 -05:00 by GiteaMirror · 2 comments

Originally created by @MikeyBeez on GitHub (Dec 11, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1474

I find that Ollama is fast enough, but the API is very slow. I've been trying to use something like subprocess. The program runs, but waiting for the output is torturously slow:

import subprocess

def run_ollama(model_name):
    # Build the Ollama command
    ollama_command = f"ollama run {model_name}"

    # Start Ollama as a subprocess
    process = subprocess.Popen(ollama_command, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True, shell=True)

    # Enter the interactive loop
    while True:
        # Get user input for the prompt
        user_input = input("Enter prompt (type 'exit' to end): ")

        # Check if the user wants to exit
        if user_input.lower() == 'exit':
            break

        # Send the user input to Ollama
        process.stdin.write(user_input + '\n')
        process.stdin.flush()

        # Read and print the output from Ollama
        output, error = process.communicate()
        print("Ollama Output:", output.strip())
        print("Ollama Error:", error.strip())

    # Close the subprocess
    process.stdin.close()
    process.stdout.close()
    process.stderr.close()
    process.terminate()

if __name__ == "__main__":
    # Get the model name from the command line arguments
    import sys
    if len(sys.argv) != 2:
        print("Usage: python script.py <model_name>")
        sys.exit(1)

    model_name = sys.argv[1]

    # Run Ollama with the specified model
    run_ollama(model_name)

Attempts to stream the output as it is being created have failed.  Even using the pexpect module fails, I believe because of the animated prompt.  Is there a way to run this as a subprocess and get the results back word by word?      
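
For what it's worth, process.communicate() waits for the child process to exit before returning anything, so the loop above can never stream. Here is a minimal sketch of reading stdout incrementally instead, assuming the one-shot form ollama run <model> "<prompt>" writes tokens to stdout as they are generated:

import subprocess
import sys

def stream_ollama(model_name, prompt):
    # Run ollama non-interactively with the prompt as an argument;
    # stdout goes to a pipe rather than a TTY
    process = subprocess.Popen(
        ["ollama", "run", model_name, prompt],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
    )

    # Read one character at a time so output appears as soon as it
    # arrives, instead of blocking in communicate() until exit
    while True:
        chunk = process.stdout.read(1)
        if chunk == "":  # EOF: generation finished
            break
        sys.stdout.write(chunk)
        sys.stdout.flush()

    process.wait()

if __name__ == "__main__":
    stream_ollama(sys.argv[1], sys.argv[2])

With stdout redirected to a pipe there is no TTY, so the animated prompt should not appear, though output may still arrive in buffered chunks rather than strictly word by word.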

@mxyng commented on GitHub (Dec 11, 2023):

I'm not sure how you're using the API, but if you're printing the generated output with print, you'll need to flush on each call, e.g. print(response, end='', flush=True); otherwise print will buffer.

Also make sure you set stream=True so each token is returned as soon as it's generated.

The performance difference between subprocess vs. API should be negligible.
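
A minimal sketch of that approach against the HTTP API, assuming the server is listening on the default http://localhost:11434 and the requests package is installed:

import json
import requests

def generate(model, prompt):
    # Ask the server to stream the generation token by token
    with requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": True},
        stream=True,
    ) as resp:
        resp.raise_for_status()
        # While streaming, the server sends one JSON object per line
        for line in resp.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)
            # Flush each fragment immediately so print does not buffer
            print(chunk.get("response", ""), end="", flush=True)
            if chunk.get("done"):
                break
    print()

Each streamed line is a JSON object whose response field holds the next fragment of text; done marks the end of generation.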


@MikeyBeez commented on GitHub (Dec 11, 2023):

Thanks.
