[GH-ISSUE #2675] Health check shows Healthy while site is Offline #7002

Closed
opened 2026-04-25 15:59:37 -05:00 by GiteaMirror · 2 comments

Originally created by @karol-exe on GitHub (Mar 19, 2026).
Original GitHub issue: https://github.com/fosrl/pangolin/issues/2675

Describe the Bug

Bug: Health check shows Healthy while Site is Offline
Product: Pangolin
Edition: Community Edition
Version: v1.16.2 (self‑hosted)
Deployment: Docker on Linux (Debian)
Summary
There is an inconsistency between the Site status and the Target health check.
When a Site is shown as Offline in Manage Sites, the corresponding backend target in the proxy still reports Health Check = Healthy, which allows traffic to be routed to an Offline site and breaks deterministic failover.
Screenshots

Manage Sites — site pve-garaz is Offline.
(screenshot attached)
Proxy → Targets — the same backend (pve-garaz → http://10.48.1.48:5000) shows Health Check = Healthy.
(screenshot attached)

Additional Notes / Hypothesis
It looks like health checks are evaluated independently of Site connectivity, and the routing layer doesn’t reconcile Site Offline status with target health. A precedence rule like “Site status overrides target health” (or a suppression flag) would prevent this inconsistency.
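A minimal TypeScript sketch of the suggested precedence rule. It assumes a site exposes an online/offline status and each target carries its last reported health-check result. The types `Site`, `Target`, `HealthResult` and the function `effectiveHealth` are hypothetical and do not reflect Pangolin's actual schema or code.

```typescript
// Hypothetical types for illustration; Pangolin's real schema may differ.
type SiteStatus = "online" | "offline";
type HealthResult = "healthy" | "unhealthy" | "unknown";

interface Site {
  name: string;
  status: SiteStatus; // derived from tunnel/Newt connectivity
}

interface Target {
  url: string;
  siteName: string;
  cachedHealth: HealthResult; // last result reported by the health check, possibly stale
}

// Precedence rule: an offline site overrides whatever the health check last reported.
function effectiveHealth(site: Site, target: Target): HealthResult {
  if (site.status === "offline") {
    return "unhealthy";
  }
  return target.cachedHealth;
}

// Example using the site and target from this issue.
const garaz: Site = { name: "pve-garaz", status: "offline" };
const garazTarget: Target = {
  url: "http://10.48.1.48:5000",
  siteName: "pve-garaz",
  cachedHealth: "healthy",
};
console.log(effectiveHealth(garaz, garazTarget)); // prints: unhealthy
```

With `pve-garaz` offline, a target whose cached result is still `healthy` resolves to `unhealthy`, so the stale value never reaches routing decisions.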


Environment

  • OS Type & Version: Debian 6.12.74-2 (2026-03-08) x86_64
  • Pangolin Version: v1.16.2 (self‑hosted)
  • Gerbil Version: 1.3.0
  • Traefik Version: 3.6
  • Newt Version: v1.10.1, v1.10.2

To Reproduce

Steps to Reproduce

Have two targets configured for a route (primary + fallback) with explicit priorities, e.g.:

pve-garaz → http://10.48.1.48:5000 (Priority: 100)
pangolin host → http://217.182.78.173:8080 (Priority: 200)

Ensure the site pve-garaz goes Offline (visible in Manage Sites).
Open Proxy → Targets for the route.
Observe that pve-garaz target still displays Healthy.

Actual Behavior

Site status: Offline in Manage Sites.
Target health: Healthy in Proxy → Targets.
Routing logic keeps considering the Offline site as eligible (since it’s marked Healthy), which can override the intended priority‑based failover.

At minimum, the UI should clearly indicate that a target is “healthy but suppressed due to Site Offline” so operators are not misled.
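One way the UI could express this, sketched in TypeScript under the assumption that the view knows both the site's connectivity and the cached health result (`null` meaning no result has been reported yet). `displayStatus` and the label strings are hypothetical, not Pangolin's actual UI code.

```typescript
// Hypothetical display labels for the Proxy → Targets view.
type DisplayStatus =
  | "Healthy"
  | "Unhealthy"
  | "Unknown"
  | "Healthy (suppressed: site offline)";

// siteOnline: connectivity of the target's site.
// cachedHealthy: last health-check result, or null if none has been reported yet.
function displayStatus(siteOnline: boolean, cachedHealthy: boolean | null): DisplayStatus {
  if (!siteOnline) {
    // Keep the stale "healthy" result visible, but mark it as suppressed.
    return cachedHealthy === true ? "Healthy (suppressed: site offline)" : "Unhealthy";
  }
  if (cachedHealthy === null) {
    return "Unknown";
  }
  return cachedHealthy ? "Healthy" : "Unhealthy";
}

console.log(displayStatus(false, true)); // prints: Healthy (suppressed: site offline)
```

Keeping the suppressed state distinct from plain `Unhealthy` tells operators that the check itself last passed, but the site is offline.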

Why this matters

Presents contradictory information in the UI (Offline vs. Healthy).
Breaks deterministic, priority‑based routing and failover.
Can route traffic to a non‑reachable environment.

Expected Behavior

If a Site is Offline, its targets should:

be marked Unhealthy, or
be excluded from routing decisions regardless of health‑check result.
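The second option (excluding targets on offline sites from routing), combined with priority-based failover, could look like the TypeScript sketch below. It assumes a lower priority value is preferred, since the issue's example labels 100 as the primary and 200 as the fallback. That ordering and the names `RouteTarget`, `isEligible`, and `selectTarget` are assumptions for illustration, not Pangolin's routing code.

```typescript
// Hypothetical shape of a route target; not Pangolin's actual model.
interface RouteTarget {
  name: string;
  url: string;
  priority: number; // assumption: lower value is preferred
  siteOnline: boolean; // site connectivity as shown in Manage Sites
  healthy: boolean; // last health-check result, possibly stale
}

// A target is eligible only if its site is online AND its health check passes.
function isEligible(t: RouteTarget): boolean {
  return t.siteOnline && t.healthy;
}

// Pick the preferred eligible target, or undefined if none is eligible.
function selectTarget(targets: RouteTarget[]): RouteTarget | undefined {
  let best: RouteTarget | undefined;
  for (const t of targets) {
    if (!isEligible(t)) continue;
    if (best === undefined || t.priority < best.priority) {
      best = t;
    }
  }
  return best;
}

const targets: RouteTarget[] = [
  { name: "pve-garaz", url: "http://10.48.1.48:5000", priority: 100, siteOnline: false, healthy: true },
  { name: "pangolin host", url: "http://217.182.78.173:8080", priority: 200, siteOnline: true, healthy: true },
];

console.log(selectTarget(targets)?.name); // prints: pangolin host
```

With the two targets from the reproduction steps, `pve-garaz` is skipped while its site is offline even though its cached result is healthy, and the pangolin host target is selected as the fallback.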


@Akhilesh29 commented on GitHub (Mar 19, 2026):

Root cause analysis (not a bug report, just adding context):
After digging into the codebase, the issue comes down to two completely independent systems that never reconcile with each other:

-->Site online/offline status: tracked in Pangolin's control plane based on tunnel/Newt connectivity
-->Target health check: runs separately inside Newt and reports back via the hcHealth field in the targetHealthCheck table

There's no logic that bridges these two, so when a site goes offline, the health check simply retains its last cached Healthy result, because nothing tells it to invalidate or override that result.

Suggested fix direction: when a site's status transitions to Offline, either:

-->force hcHealth = false on all associated targets, or
-->add a precedence rule in the routing layer: site offline → treat the target as unhealthy regardless of the cached health result
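The first option could be sketched as a transition handler. This TypeScript example uses an in-memory stand-in for the targetHealthCheck table; `HealthStore`, `onSiteStatusChange`, and `report` are hypothetical, and `hcHealth` is modeled as a boolean only to mirror "force hcHealth = false" above (the real column type may differ). It also ignores healthy reports for targets on an offline site, so a late result from the checker cannot undo the invalidation.

```typescript
// In-memory stand-in for the targetHealthCheck table (hypothetical shape).
interface TargetHealthCheckRow {
  targetId: number;
  siteId: number;
  hcHealth: boolean; // boolean here only to mirror "hcHealth = false"
}

type SiteStatus = "online" | "offline";

class HealthStore {
  private rows: TargetHealthCheckRow[] = [];
  private siteStatus = new Map<number, SiteStatus>();

  upsert(row: TargetHealthCheckRow): void {
    const i = this.rows.findIndex((r) => r.targetId === row.targetId);
    if (i >= 0) {
      this.rows[i] = { ...row };
    } else {
      this.rows.push({ ...row });
    }
  }

  // Called whenever a site's connectivity changes; returns how many targets were invalidated.
  onSiteStatusChange(siteId: number, next: SiteStatus): number {
    const prev = this.siteStatus.get(siteId);
    this.siteStatus.set(siteId, next);
    if (next !== "offline" || prev === "offline") {
      return 0; // only act on a transition into offline
    }
    let invalidated = 0;
    for (const row of this.rows) {
      if (row.siteId === siteId && row.hcHealth) {
        row.hcHealth = false;
        invalidated++;
      }
    }
    return invalidated;
  }

  // Called when the health checker reports a result for a target.
  report(targetId: number, healthy: boolean): void {
    const row = this.rows.find((r) => r.targetId === targetId);
    if (row === undefined) return;
    // Ignore late "healthy" results for targets whose site is offline.
    if (healthy && this.siteStatus.get(row.siteId) === "offline") return;
    row.hcHealth = healthy;
  }

  health(targetId: number): boolean | undefined {
    return this.rows.find((r) => r.targetId === targetId)?.hcHealth;
  }
}

const store = new HealthStore();
store.upsert({ targetId: 1, siteId: 10, hcHealth: true }); // e.g. the pve-garaz target
store.onSiteStatusChange(10, "offline");
console.log(store.health(1)); // prints: false
```

Invalidating only on the online-to-offline transition keeps the handler idempotent: repeated offline notifications return 0, and the next real health report after the site reconnects restores the target.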

This would fix both the misleading UI and the broken failover behavior.

Would the maintainers be okay with this approach? Happy to raise a PR for it. @oschwartz10612 @miloschwartz


@AstralDestiny commented on GitHub (Mar 21, 2026):

Sounds good.

Reference: github-starred/pangolin#7002