mirror of
https://github.com/fosrl/pangolin.git
synced 2026-05-21 09:21:15 -05:00
[GH-ISSUE #2700] Health check status not invalidated when Newt site goes offline #4153
Originally created by @strausmann on GitHub (Mar 24, 2026).
Original GitHub issue: https://github.com/fosrl/pangolin/issues/2700
Description
When a Newt agent disconnects (site goes offline), the health check status of all targets routed through that site remains "healthy" in the dashboard. Pangolin correctly detects the site as offline, but does not invalidate the cached health check results for targets on that site.
This causes Pangolin to continue routing traffic to targets through a dead tunnel, resulting in timeouts for users.
Steps to Reproduce
docker stop pangolin-newt)
Expected Behavior
When a site goes offline, all targets routed through that site should immediately transition to "unhealthy" or "unknown" status. Pangolin should not route traffic to targets on offline sites.
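The expected routing behavior can be sketched as a filter over targets. This is a minimal illustration only: the `SiteRow` and `TargetRow` shapes and the `routableTargets` helper are hypothetical names, not Pangolin's actual schema or code.

```typescript
// Hypothetical shapes for illustration; not Pangolin's actual schema.
type Health = "healthy" | "unhealthy" | "unknown";

interface SiteRow {
    siteId: number;
    online: boolean;
}

interface TargetRow {
    targetId: number;
    siteId: number;
    hcHealth: Health;
}

// Keep only targets whose site is online and whose last reported
// health check is not "unhealthy". A stored "healthy" value is
// ignored when the site itself is offline.
function routableTargets(targets: TargetRow[], sites: SiteRow[]): TargetRow[] {
    return targets.filter(
        (t) =>
            t.hcHealth !== "unhealthy" &&
            sites.some((s) => s.siteId === t.siteId && s.online)
    );
}

const exampleSites: SiteRow[] = [
    { siteId: 1, online: true },
    { siteId: 2, online: false }
];
const exampleTargets: TargetRow[] = [
    { targetId: 10, siteId: 1, hcHealth: "healthy" },
    { targetId: 20, siteId: 2, hcHealth: "healthy" } // stale value, site is offline
];

const routable = routableTargets(exampleTargets, exampleSites);
console.log(routable.map((t) => t.targetId));
```

In this sketch, target 20 is excluded even though its stored status is still "healthy", because its site is offline.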
Actual Behavior
Root Cause Analysis
Based on log analysis:
`newt/disconnecting` message type throws an exception instead of triggering state cleanup:
Environment
Suggested Fix
When a Newt disconnect is detected:
- Set the health check status of all targets on that site to "unknown" or "unhealthy"
- Handle the `newt/disconnecting` message type (currently throws exception)
@strausmann commented on GitHub (Mar 24, 2026):
Code Analysis — Root Cause Identified
After analyzing the source code, the root cause is clear:
Health Check Flow
Health checks run on the Newt agent (remote), not on the Pangolin server. The Newt performs HTTP checks against the target and reports status back via WebSocket to `server/routers/target/handleHealthcheckStatusMessage.ts`, which updates `targetHealthCheck.hcHealth` in the database.
The Bug: Three disconnect paths, none invalidate HC status
When a Newt disconnects, three code paths handle the cleanup — but none of them reset target health check status:
Mitigation in Traefik Config (partial)
`server/lib/traefik/getTraefikConfig.ts` (L500) does filter out targets from offline sites when generating Traefik config — but only if at least one other site for that resource is online. This means:
The Dashboard Problem
Even with the Traefik mitigation, the dashboard always shows the stale DB value. Users see green "healthy" badges for targets on an offline site, which is misleading.
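One way to avoid the misleading badge, independent of any database change, would be to derive the displayed status from the stored value together with the site's online flag. The sketch below assumes a hypothetical `displayHealth` helper; it is not part of Pangolin's dashboard code.

```typescript
type Health = "healthy" | "unhealthy" | "unknown";

// Derive the badge value from the stored status plus the site's online flag.
// When the site is offline, the stored value cannot be trusted, so the
// badge falls back to "unknown".
function displayHealth(stored: Health, siteOnline: boolean): Health {
    return siteOnline ? stored : "unknown";
}

const badgeOffline = displayHealth("healthy", false);
const badgeOnline = displayHealth("healthy", true);
console.log(badgeOffline, badgeOnline);
```

This only changes what is shown. The stale value would still be in the database, so it complements rather than replaces a reset on disconnect.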
Suggested Fix
In each of the three disconnect handlers, add a query to reset health check status:
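```typescript
// Needed if the handler does not already import these helpers:
import { eq, inArray } from "drizzle-orm";

// After setting sites.online = false:
// reset the health status of every target that belongs to the disconnected site.
await db
    .update(targetHealthCheck)
    .set({ hcHealth: "unknown" })
    .where(
        inArray(
            targetHealthCheck.targetId,
            // Subquery: IDs of all targets routed through this site
            db
                .select({ id: targets.targetId })
                .from(targets)
                .where(eq(targets.siteId, siteId))
        )
    );
```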
This ensures targets transition to "unknown" immediately when their Newt disconnects, and naturally recover to "healthy" when the Newt reconnects and health checks resume.
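The disconnect and recovery cycle described above can be modeled in memory to check the intended behavior. This is a sketch under stated assumptions: `HealthStore`, `resetSite`, and `report` are hypothetical stand-ins for the `targetHealthCheck` table and the handlers, not real Pangolin APIs.

```typescript
type Health = "healthy" | "unhealthy" | "unknown";

interface TargetRow {
    targetId: number;
    siteId: number;
    hcHealth: Health;
}

// In-memory stand-in for the targetHealthCheck table (hypothetical).
class HealthStore {
    private rows: TargetRow[] = [];

    add(row: TargetRow): void {
        // Store a copy so callers cannot mutate internal state.
        this.rows.push({
            targetId: row.targetId,
            siteId: row.siteId,
            hcHealth: row.hcHealth
        });
    }

    // Mirrors the suggested UPDATE: reset every target on the site.
    // Returns the number of rows whose status actually changed.
    resetSite(siteId: number): number {
        let changed = 0;
        this.rows.forEach((row) => {
            if (row.siteId === siteId && row.hcHealth !== "unknown") {
                row.hcHealth = "unknown";
                changed++;
            }
        });
        return changed;
    }

    // Mirrors a health check report arriving from a reconnected Newt.
    report(targetId: number, health: Health): void {
        this.rows.forEach((row) => {
            if (row.targetId === targetId) row.hcHealth = health;
        });
    }

    get(targetId: number): Health | undefined {
        const matches = this.rows.filter((r) => r.targetId === targetId);
        return matches.length > 0 ? matches[0].hcHealth : undefined;
    }
}

const store = new HealthStore();
store.add({ targetId: 10, siteId: 1, hcHealth: "healthy" });
store.add({ targetId: 20, siteId: 2, hcHealth: "healthy" });

const resetCount = store.resetSite(2); // site 2 goes offline
const afterDisconnect = store.get(20);
store.report(20, "healthy"); // Newt reconnects and reports again
const afterReconnect = store.get(20);
const otherSite = store.get(10); // a target on site 1 is never touched
```

Targets on other sites are unaffected by the reset, which matches the scoping of the subquery on `targets.siteId`.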