[GH-ISSUE #14364] Model stuck in 'Stopping...' state indefinitely with active connections #55846

Closed
opened 2026-04-29 09:48:12 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @JRMeyer on GitHub (Feb 22, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14364

Description

ollama ps shows a model permanently stuck in the Stopping... state. The model never finishes unloading and never becomes available again. The original Ollama process continues to hold the port, so launchd restart attempts fail with bind: address already in use.

What triggered it

Two independent Python batch processes (8 workers total) were sending concurrent chat requests to the same model. The second batch connected while the first was already active. Shortly after, the model entered Stopping... and never recovered.
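
A minimal way to approximate that load pattern from the shell (the actual clients were Python scripts; this sketch assumes the default endpoint at 127.0.0.1:11434 and the gpt-oss:120b model from this report):

#!/usr/bin/env bash
# One "batch": 4 workers looping concurrent /api/chat requests.
# Run the script twice to mirror the two-batch, 8-worker setup.
MODEL="gpt-oss:120b"
URL="http://127.0.0.1:11434/api/chat"

worker() {
  while true; do
    curl -s "$URL" \
      -d "{\"model\":\"$MODEL\",\"messages\":[{\"role\":\"user\",\"content\":\"ping $RANDOM\"}],\"stream\":false}" \
      > /dev/null
  done
}

for i in 1 2 3 4; do worker & done
wait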

Observed behavior

$ ollama ps
NAME            ID              SIZE     PROCESSOR    CONTEXT    UNTIL
gpt-oss:120b    a951a23b46a1    89 GB    100% GPU     131072     Stopping...

This state persists indefinitely (tested for 10+ minutes). The original Ollama process (PID 37884) still holds port 11434 with established connections from both batch processes. Meanwhile, launchd keeps trying to restart Ollama but fails:

Error: listen tcp 127.0.0.1:11434: bind: address already in use

(repeated hundreds of times in /opt/homebrew/var/log/ollama.log)

The batch processes stay alive but are blocked waiting on Ollama responses.
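
To confirm the bind conflict, lsof shows which PID still owns the listener and which client connections are still established (a quick diagnostic sketch; PIDs and ports will differ per machine):

lsof -nP -iTCP:11434 -sTCP:LISTEN        # should show the original PID (37884 here)
lsof -nP -iTCP:11434 -sTCP:ESTABLISHED   # the batch clients' open connections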

Expected behavior

The model should either:

  • Stay loaded and serve requests, or
  • Unload cleanly and reload, without getting stuck in a permanent Stopping... state

Steps to reproduce

  1. Start Ollama with OLLAMA_NUM_PARALLEL=8
  2. Launch a batch process with 4 concurrent workers sending chat requests
  3. Wait for model to load and begin serving
  4. Launch a second batch process with 4 more concurrent workers to the same model
  5. Observe ollama ps — model enters Stopping... and never recovers

Environment

  • Ollama version: 0.13.3
  • OS: macOS 26.1 (build 25B78)
  • Hardware: Mac Studio, Apple M4 Max, 128 GB unified memory
  • Model: gpt-oss:120b (65 GB weights, 89 GB loaded)
  • Ollama config:
    • OLLAMA_NUM_PARALLEL=8
    • OLLAMA_KV_CACHE_TYPE=q8_0
    • OLLAMA_FLASH_ATTENTION=1
  • Managed by: launchd (homebrew.mxcl.ollama)

Workaround

Kill Ollama manually and restart. The model reloads fine on a fresh start.

Author
Owner

@JRMeyer commented on GitHub (Feb 22, 2026):

Investigation Findings

After deeper investigation, the situation is more nuanced than originally reported. Here are the full findings:


1. OLLAMA_NUM_PARALLEL was actually 1, not 8

The running Ollama instance (PID 37884, uptime 19+ hours) was started from an older version of the launchd plist that did not include OLLAMA_NUM_PARALLEL. The startup log confirms:

routes.go:1554 msg="server config" env="map[...OLLAMA_NUM_PARALLEL:1...]"

So despite the plist now specifying OLLAMA_NUM_PARALLEL=8, the live process was running with the default of 1 parallel slot. Two batch processes (8 workers total) were hitting a single-slot instance.
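
For anyone checking their own instance: the effective configuration is printed once in the "server config" line at startup, so it can be read back from the log even after the plist has changed (log path from this setup):

grep 'server config' /opt/homebrew/var/log/ollama.log | tail -1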

2. API is partially responsive

Read-only endpoints respond immediately while in "Stopping..." state:

  • GET / → "Ollama is running"
  • GET /api/ps → returns model info
  • GET /api/tags → returns model list
  • GET /api/version → returns 0.13.3
  • POST /api/chat → hangs indefinitely (TCP connects, 0 bytes of response); a probe sketch follows the list
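
A rough curl probe along these lines reproduces the pattern; --max-time keeps the hanging /api/chat call from blocking the loop:

for ep in / /api/ps /api/tags /api/version; do
  echo "== GET $ep"
  curl -s --max-time 5 "http://127.0.0.1:11434$ep"; echo
done

echo "== POST /api/chat (times out while stuck)"
curl -s --max-time 10 http://127.0.0.1:11434/api/chat \
  -d '{"model":"gpt-oss:120b","messages":[{"role":"user","content":"hi"}],"stream":false}'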

3. expires_at is continuously refreshed

Despite showing "Stopping...", the expires_at field in /api/ps is refreshed to current wall-clock time every ~2 seconds:

10:53:44 → 2026-02-22T10:53:44.287
10:53:46 → 2026-02-22T10:53:46.338
10:53:48 → 2026-02-22T10:53:48.387

Something is actively refreshing the keep-alive timer.
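
The refresh is easy to watch with a small poll of /api/ps (sketch assumes jq is installed; drop the jq pipe to see the raw JSON):

while true; do
  printf '%s ' "$(date '+%H:%M:%S')"
  curl -s http://127.0.0.1:11434/api/ps | jq -r '.models[0].expires_at'
  sleep 2
done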

4. Runner process is alive and using CPU

The runner child process (ollama runner --ollama-engine --model ... --port 55342) is in RN (running) state with fluctuating CPU (0.2-21.2%) and 86.3 GB RSS. It's not hung at the OS level. Its internal TCP connections to the Ollama main process are dynamically changing between observations.
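
Sampling the runner a few times makes the fluctuation visible (ps keywords are macOS/BSD style; the [o] in the grep pattern just keeps grep from matching itself):

for i in 1 2 3 4 5; do
  ps -axo pid,stat,%cpu,rss,command | grep '[o]llama runner'
  sleep 2
done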

5. Runner was restarted at least once

Startup log shows initial runner on port 49397, but the current runner listens on port 55342. The model was unloaded/reloaded at some point during the 19-hour session — before the "Stopping..." state occurred.

6. One client died, leaving a stale connection

Two batch processes were started. Batch B died immediately after startup (never completed health check). Ollama main still has a CLOSED socket on fd 20 to port 55489, which was the dead client's connection. Batch A (4 workers, 4 ESTABLISHED connections) is alive but frozen — its log stopped growing and all threads are sleeping at 0% CPU, waiting on Ollama.
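
The socket states can be listed directly from the main process's TCP descriptors, for example (PID from this incident; adjust as needed):

lsof -a -nP -p 37884 -iTCP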

7. launchd has lost track of the process

active count = 0           # launchd thinks nothing is running
state = spawn scheduled    # trying to spawn another
runs = 5,652               # total spawn attempts
last exit code = 1

PID 37884 has PPID=1 but launchd's active count = 0. launchd continuously respawns new instances (~1 every 6 seconds due to KeepAlive=true), all failing with bind: address already in use. This has been going on for hours, adding ~5,852 error lines to the log.
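
The launchd figures above can be read back with launchctl print (Homebrew services normally run in the per-user GUI domain; adjust the domain if Ollama runs as a system daemon):

launchctl print gui/$(id -u)/homebrew.mxcl.ollama | grep -E 'active count|state|runs|last exit code'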

8. Log file is 99.99% crash noise

The 180,768-line, 6.6 MB log contains:

  • ~15,903 panic: $HOME is not defined entries (historical, from before the plist had HOME set)
  • 7 lines of actual operational log from the running instance
  • ~5,852 bind: address already in use errors (ongoing launchd respawn failures)

There is zero log output from Ollama between startup and the current state — no scheduler decisions, no model load/unload messages, no error about the "Stopping..." transition.

9. System resources are fine

  • Memory: 86 GB used by model, 1 GB free + 18 GB inactive, 0 swap
  • Disk: 755 GB available
  • Load: 0.97
  • No resource exhaustion

Environment (corrected)

  • Ollama version: 0.13.3 (latest available: 0.16.3, 3 versions behind)
  • Effective OLLAMA_NUM_PARALLEL: 1 (not 8 as originally reported)
  • Everything else unchanged from original report

Summary of observable state

The model is in a limbo where:

  1. The runner process is alive and doing something (variable CPU, changing sockets)
  2. Read-only API works, inference API hangs
  3. Keep-alive timer is actively refreshed
  4. ollama ps reports "Stopping..." but the model never finishes stopping
  5. One client has a stale CLOSED connection from a process that died
  6. The surviving client's 4 connections are established but getting no responses
Author
Owner

@rick-github commented on GitHub (Feb 22, 2026):

The Stopping state indicates that the server wanted to unload the model. This usually happens on a model change or when model parameters (e.g., num_ctx) are changed by the client. Since the state persists, the unload didn't complete, indicating the model runner got wedged. Just killing the runner (use ps w -u ollama | grep runner to identify it) should restore function.
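
A minimal version of that recovery step, matching by the runner's full command line instead of ps | grep:

pkill -f 'ollama runner'        # send SIGTERM to the wedged runner only
pkill -9 -f 'ollama runner'     # escalate only if it ignores SIGTERM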

One reason for an unload not to complete is that the runner is still processing a request. Since it's still using CPU, this seems the likely reason. It could be that the model ran out of context space, or that the parallel requests caused context corruption, leaving the model runner incoherent and stuck in a loop.

There have been previous instances of this bug that have been fixed, so upgrading to a more recent release may help.
