[GH-ISSUE #14761] Ollama desktop app respawns ollama.exe serve after manual process termination on Windows #56055

Closed
opened 2026-04-29 10:12:15 -05:00 by GiteaMirror · 1 comment

Originally created by @crow8417 on GitHub (Mar 10, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14761

What is the issue?

Summary

On Windows, terminating ollama.exe serve does not keep Ollama offline if the Ollama desktop application is still running. The server process is automatically respawned by ollama app.exe.

Environment

  • OS: Windows
  • Ollama installed locally via desktop app
  • Observed executable paths:
    • C:\Users\sasha\AppData\Local\Programs\Ollama\ollama.exe
    • C:\Users\sasha\AppData\Local\Programs\Ollama\ollama app.exe

Observed behavior

  1. ollama.exe serve was running normally.
  2. I terminated the server process with PowerShell (see the sketch after this list).
  3. A new ollama.exe serve process appeared shortly afterward.
  4. There was no obvious Windows service named ollama responsible for the restart.
  5. Inspecting the respawned process showed its parent process was ollama app.exe.
  6. The respawn stopped only after terminating both:
    • ollama.exe
    • ollama app.exe
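
A minimal PowerShell sketch of these steps (process and service names are the ones observed above; the five-second wait is an arbitrary illustration, not a claim about how quickly the respawn happens):

# Step 2: terminate the running server process
Stop-Process -Name "ollama" -Force

# Step 4: confirm there is no Windows service that could be restarting it
Get-Service -Name "*ollama*" -ErrorAction SilentlyContinue

# Steps 3 and 5: wait briefly, then list any new ollama.exe process
# together with its parent (observed here to be "ollama app.exe")
Start-Sleep -Seconds 5
Get-CimInstance Win32_Process -Filter "Name = 'ollama.exe'" | ForEach-Object {
    $parent = Get-CimInstance Win32_Process -Filter "ProcessId = $($_.ParentProcessId)"
    [PSCustomObject]@{
        Pid         = $_.ProcessId
        CommandLine = $_.CommandLine
        ParentName  = $parent.Name
    }
}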

Expected behavior

If the user manually terminates ollama.exe serve, either it should be made clear that the desktop app will respawn it, or there should be an obvious, user-controllable setting for that behavior.

Actual behavior

ollama app.exe relaunches ollama.exe serve automatically, which makes it difficult to intentionally take Ollama offline by stopping only the server process.

Evidence

  • Respawned process command line:
    • C:\Users\sasha\AppData\Local\Programs\Ollama\ollama.exe serve
  • Parent process of respawned server:
    • ollama app.exe

Impact

This makes troubleshooting and controlled field testing harder because stopping the server alone is not sufficient to disable local model availability.

Workaround

Terminate both processes (see the PowerShell sketch after this list):

  • ollama.exe
  • ollama app.exe
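
In PowerShell, a sketch of that workaround; killing the desktop app first means there is nothing left to respawn the server:

# Kill the desktop app first so it cannot respawn the server
Stop-Process -Name "ollama app" -Force -ErrorAction SilentlyContinue
# Then kill the server itself
Stop-Process -Name "ollama" -Force -ErrorAction SilentlyContinue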

Attachment: Ollama Respawn Incident Log.txt (https://github.com/user-attachments/files/25868689/Ollama.Respawn.Incident.Log.txt)

Relevant log output

app.log tail

time=2026-03-09T19:39:05.505+08:00 level=INFO source=app_windows.go:282 msg="starting Ollama" app=C:\Users\sasha\AppData\Local\Programs\Ollama version=0.17.7 OS=Windows/10.0.26200
time=2026-03-09T19:39:05.507+08:00 level=INFO source=app.go:239 msg="initialized tools registry" tool_count=0
time=2026-03-09T19:39:05.515+08:00 level=INFO source=app.go:254 msg="starting ollama server"
time=2026-03-09T19:39:05.515+08:00 level=INFO source=app.go:285 msg="starting ui server" port=61705
time=2026-03-09T19:39:08.533+08:00 level=INFO source=updater.go:296 msg="beginning update checker" interval=1h0m0s

time=2026-03-10T12:06:44.574+08:00 level=INFO source=app_windows.go:282 msg="starting Ollama" app=C:\Users\sasha\AppData\Local\Programs\Ollama version=0.17.7 OS=Windows/10.0.26200
time=2026-03-10T12:06:44.966+08:00 level=INFO source=eventloop.go:328 msg="sent focus request to existing instance"
time=2026-03-10T12:06:44.966+08:00 level=INFO source=app_windows.go:79 msg="existing instance found, exiting"

time=2026-03-10T12:06:45.189+08:00 level=INFO source=ui.go:159 msg="configuring ollama proxy" target=http://127.0.0.1:11434
time=2026-03-10T12:06:45.246+08:00 level=INFO source=ui.go:241 msg=site.serveHTTP http.method=GET http.path=/api/v1/cloud http.status=200 version=0.17.7
time=2026-03-10T12:06:45.248+08:00 level=INFO source=ui.go:241 msg=site.serveHTTP http.method=GET http.path=/api/v1/settings http.status=200 version=0.17.7
time=2026-03-10T12:06:45.251+08:00 level=INFO source=ui.go:241 msg=site.serveHTTP http.method=GET http.path=/api/v1/chats http.status=200 version=0.17.7
time=2026-03-10T12:06:45.273+08:00 level=INFO source=server.go:362 msg="Matched inference compute"="{Library:CUDA Variant: Compute:8.6 Driver:13.2 Name:CUDA0 VRAM:12.0 GiB}"
time=2026-03-10T12:06:45.273+08:00 level=INFO source=server.go:373 msg="Matched default context length" default_num_ctx=4096

time=2026-03-10T12:06:50.568+08:00 level=WARN source=ui.go:1567 msg="failed to check upstream digest" error="Head \"https://ollama.com/v2/library/llama3/manifests/8b\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)" model=llama3:8b
time=2026-03-10T17:06:17.338+08:00 level=WARN source=ui.go:1567 msg="failed to check upstream digest" error="Head \"https://ollama.com/v2/library/codellama/manifests/7b\": context deadline exceeded" model=codellama:7b
time=2026-03-10T18:06:41.695+08:00 level=WARN source=ui.go:1567 msg="failed to check upstream digest" error="registry returned status 404" model=hf.co/bartowski/DeepSeek-Coder-V2-Lite-Instruct-GGUF:Q4_K_S

time=2026-03-10T18:06:33.065+08:00 level=INFO source=app_windows.go:282 msg="starting Ollama" app=C:\Users\sasha\AppData\Local\Programs\Ollama version=0.17.7 OS=Windows/10.0.26200
time=2026-03-10T18:06:33.785+08:00 level=INFO source=eventloop.go:328 msg="sent focus request to existing instance"
time=2026-03-10T18:06:33.785+08:00 level=INFO source=app_windows.go:79 msg="existing instance found, exiting"

time=2026-03-10T18:59:11.972+08:00 level=WARN source=server_windows.go:144 msg="failed to kill ollama process" pid=29720 err="exit status 1"
time=2026-03-10T19:45:08.532+08:00 level=ERROR source=server.go:201 msg="ollama exited" err="exit status 0xffffffff"
time=2026-03-10T19:45:48.253+08:00 level=ERROR source=server.go:201 msg="ollama exited" err="exit status 0xffffffff"

server.log tail

print_info: file size   = 4.33 GiB (4.64 BPW)
[GIN] 2026/03/10 - 19:43:49 | 200 | 9.2189ms | 127.0.0.1 | GET "/api/tags"

print_info: arch             = llama
print_info: n_ctx_train      = 8192
print_info: n_embd           = 4096
print_info: n_layer          = 32
print_info: model type       = 8B
print_info: model params     = 8.03 B
print_info: general.name     = Meta-Llama-3-8B-Instruct
print_info: n_vocab          = 128256

load_tensors: loading model tensors, this can take a while... (mmap = false)
load_tensors: offloading 32 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 33/33 layers to GPU
load_tensors:        CUDA0 model buffer size = 4155.99 MiB
load_tensors:    CUDA_Host model buffer size = 281.81 MiB

llama_context: n_ctx         = 8192
llama_context: n_batch       = 512
llama_kv_cache: CUDA0 KV buffer size = 1024.00 MiB
llama_context: Flash Attention was auto, set to enabled
llama_context: CUDA0 compute buffer size = 258.50 MiB

time=2026-03-10T19:43:50.980+08:00 level=INFO source=server.go:1388 msg="llama runner started in 2.18 seconds"
time=2026-03-10T19:43:50.980+08:00 level=INFO source=sched.go:565 msg="loaded runners" count=1
time=2026-03-10T19:43:50.980+08:00 level=INFO source=server.go:1350 msg="waiting for llama runner to start responding"
[GIN] 2026/03/10 - 19:43:53 | 200 | 5.3425132s | 127.0.0.1 | POST "/v1/chat/completions"
[GIN] 2026/03/10 - 19:44:20 | 200 | 10.3365ms | 127.0.0.1 | GET "/api/tags"
[GIN] 2026/03/10 - 19:44:51 | 200 | 9.4448ms | 127.0.0.1 | GET "/api/tags"

time=2026-03-10T19:45:09.582+08:00 level=INFO source=routes.go:1658 msg="server config" env="map[OLLAMA_CONTEXT_LENGTH:32768 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:C:\\Users\\sasha\\.ollama\\models OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost ... vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_VULKAN:false]"
time=2026-03-10T19:45:09.583+08:00 level=INFO source=routes.go:1660 msg="Ollama cloud disabled: false"
time=2026-03-10T19:45:09.617+08:00 level=INFO source=routes.go:1713 msg="Listening on [::]:11434 (version 0.17.7)"
time=2026-03-10T19:45:09.619+08:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-03-10T19:45:10.442+08:00 level=INFO source=types.go:42 msg="inference compute" id=GPU-13c2dca2-3747-fefd-49a7-ff5d74d532b9 library=CUDA compute=8.6 name=CUDA0 description="NVIDIA GeForce RTX 3060" total="12.0 GiB" available="10.2 GiB"
time=2026-03-10T19:45:10.442+08:00 level=INFO source=routes.go:1763 msg="vram-based default context" total_vram="12.0 GiB" default_num_ctx=4096
[GIN] 2026/03/10 - 19:45:22 | 200 | 10.3567ms | 127.0.0.1 | GET "/api/tags"

time=2026-03-10T19:45:49.562+08:00 level=INFO source=routes.go:1658 msg="server config" env="map[OLLAMA_CONTEXT_LENGTH:32768 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:C:\\Users\\sasha\\.ollama\\models OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost ... vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_VULKAN:false]"
time=2026-03-10T19:45:49.563+08:00 level=INFO source=routes.go:1660 msg="Ollama cloud disabled: false"
time=2026-03-10T19:45:49.725+08:00 level=INFO source=routes.go:1713 msg="Listening on [::]:11434 (version 0.17.7)"
time=2026-03-10T19:45:49.727+08:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-03-10T19:45:51.464+08:00 level=INFO source=types.go:42 msg="inference compute" id=GPU-13c2dca2-3747-fefd-49a7-ff5d74d532b9 library=CUDA compute=8.6 name=CUDA0 description="NVIDIA GeForce RTX 3060" total="12.0 GiB" available="10.2 GiB"
[GIN] 2026/03/10 - 19:45:53 | 200 | 145.9882ms | 127.0.0.1 | GET "/api/tags"
[GIN] 2026/03/10 - 19:46:24 | 200 | 12.4295ms | 127.0.0.1 | GET "/api/tags"

The app log shows repeated startups in which a new instance detects the existing one and exits:

msg="starting Ollama"
msg="sent focus request to existing instance"
msg="existing instance found, exiting"

The app log also shows a failed termination attempt and abnormal exits:

msg="failed to kill ollama process" pid=29720 err="exit status 1"
msg="ollama exited" err="exit status 0xffffffff"

The server log shows the server coming back up and listening again at 19:45:09 and again at 19:45:49:

msg="Listening on [::]:11434 (version 0.17.7)"

OS

Windows

GPU

Nvidia

CPU

AMD

Ollama version

0.17.7

GiteaMirror added the bug label 2026-04-29 10:12:15 -05:00

@rick-github commented on GitHub (Mar 10, 2026):

Click on the Ollama icon in the systray and choose "Quit Ollama".


Reference: github-starred/ollama#56055