[GH-ISSUE #3474] ollama process exit but llama.cpp process remains as a zombie process #48650

Closed
opened 2026-04-28 08:59:20 -05:00 by GiteaMirror · 6 comments

Originally created by @mofanke on GitHub (Apr 3, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3474

Originally assigned to: @dhiltgen on GitHub.

### What is the issue?

![image](https://github.com/ollama/ollama/assets/54242816/d27540e7-8155-4ff1-8c77-11911c761d6b)

then I killed the ollama process

![image](https://github.com/ollama/ollama/assets/54242816/f0dd96c7-e11f-481d-91be-7a2aa03a8b13)

### What did you expect to see?

The llama.cpp process should exit when the ollama process exits.

### Steps to reproduce

_No response_

### Are there any recent changes that introduced the issue?

_No response_

### OS

_No response_

### Architecture

_No response_

### Platform

_No response_

### Ollama version

_No response_

### GPU

_No response_

### GPU info

_No response_

### CPU

_No response_

### Other software

_No response_

GiteaMirror added the bug label 2026-04-28 08:59:20 -05:00

@dhiltgen commented on GitHub (Apr 12, 2024):

How did you shut down the ollama server? If you send a SIGTERM we should gracefully shut down the subprocess. If you're able to repro, can you share the steps and/or the server log with OLLAMA_DEBUG=1 set?
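
A minimal sketch of that flow on Linux/macOS, assuming the `nix` crate (with its `signal` feature) for sending the signal; the binary path and timing are illustrative:

```rust
use std::process::Command;

use nix::sys::signal::{kill, Signal};
use nix::unistd::Pid;

fn main() -> std::io::Result<()> {
    // Start `ollama serve` with verbose logging so the server log can be shared.
    let mut server = Command::new("ollama")
        .arg("serve")
        .env("OLLAMA_DEBUG", "1")
        .spawn()?;

    // ... run requests against http://localhost:11434 here ...

    // Graceful stop: SIGTERM (not SIGKILL) gives ollama the chance to shut down
    // its llama.cpp subprocess; then reap the server so nothing is left behind.
    let _ = kill(Pid::from_raw(server.id() as i32), Signal::SIGTERM);
    server.wait()?;
    Ok(())
}
```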


@dhiltgen commented on GitHub (Apr 15, 2024):

We fixed a few bugs related to signaling of the subprocess recently. Please give 0.1.32 a try and see if that resolves the problem you're seeing.


@dhiltgen commented on GitHub (Apr 28, 2024):

The latest release [0.1.33](https://github.com/ollama/ollama/releases) (available in pre-release) further refines our subprocess handling. Please give it a try, and if you're still having problems with zombies let us know.


@Skarian commented on GitHub (Jun 13, 2024):

> If you send a SIGTERM we should gracefully shut down the subprocess

What is the guidance on graceful shutdown of the Windows CLI?

As I understand it, there is no SIGTERM on Windows. Currently, when I kill ollama.exe I tend to get a zombie ollama_llama_server.exe that is using significant VRAM and that I must manually kill via Task Manager.

I am attempting to embed the ollama Windows CLI into another application, so this must be done programmatically for my use case (i.e. the end user won't be pressing Ctrl+C).

Any advice or guidance would be greatly appreciated!


@dhiltgen commented on GitHub (Jun 13, 2024):

@Skarian you can use the code we use to shut down the server in the tray app as inspiration; it is located [here](https://github.com/ollama/ollama/blob/main/app/lifecycle/server_windows.go#L22-L68).


@Skarian commented on GitHub (Jun 13, 2024):

> @Skarian you can use the code we use to shut down the server in the tray app as inspiration; it is located [here](https://github.com/ollama/ollama/blob/main/app/lifecycle/server_windows.go#L22-L68)

Thank you so much for the quick response; this was extremely helpful.

For anyone else who is interested in porting this logic for use against the CLI on Windows, I've made a quick script in Rust.

Working like a charm 👍🏾

**main.rs**

```rust
use command_group::{CommandGroup, GroupChild};
use reqwest::Client;
use std::process::{Command, Stdio};

#[tokio::main]
async fn main() {
    // Start server
    println!("Ollama server started...");
    let mut child = start_server();

    // Send a request to server to spin up runners and get VRAM allocated
    println!("Engaging ollama runners and waiting thirty seconds...");
    perform_activities().await;
    wait_thirty_secs().await;

    // Gracefully shutdown the server
    child.kill().unwrap();
}

fn start_server() -> GroupChild {
    let mut command = Command::new("./ollama/ollama.exe");

    command
        .arg("serve")
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .group_spawn()
        .unwrap()
}

async fn perform_activities() {
    let client = Client::new();
    tokio::spawn(async move {
        let _ = client
            .post("http://localhost:11434/api/generate")
            .body(r#"{"model": "phi3", "prompt": "Why is the sky blue?"}"#)
            .header("Content-Type", "application/json")
            .send()
            .await;
    });
    // Optional: short delay to reduce the risk of the runtime shutting down too quickly
    tokio::time::sleep(tokio::time::Duration::from_millis(100)).await;
}

async fn wait_thirty_secs() {
    tokio::time::sleep(tokio::time::Duration::from_secs(30)).await;
}
```

**Cargo.toml**

```toml
[package]
name = "embedded_ollama_start_and_kill"
version = "0.1.0"
edition = "2021"

[dependencies]
reqwest = "0.12.4"
tokio = { version = "1", features = ["full"] }
lazy_static = "1.4"
command-group = { version = "5.0.1", features = ["tokio"] }
```
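
The detail that matters for the zombie problem is `group_spawn()` from the command-group crate: as I understand it, the child is started in a Windows Job Object (a process group on Unix), so `child.kill()` takes down ollama.exe together with any ollama_llama_server.exe runners it spawned, rather than only the top-level process.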
Reference: github-starred/ollama#48650