Application Halts After 15 Failed Ping Attempts #13

Closed
opened 2025-11-19 07:11:49 -06:00 by GiteaMirror · 9 comments

Originally created by @miles7191 on GitHub (Mar 24, 2025).

![Image](https://github.com/user-attachments/assets/ae7f1f73-8eb9-415d-8745-2637067dc679)

If the newt connector cannot reach the host after 15 attempts, the application stops trying to contact the server. The limit was recently increased from 5 attempts to 15, but that does not address disconnections that last longer than 15 attempts.

A few suggestions to rectify the problem:

  1. Remove the limit entirely and let the application attempt to ping forever.
  2. Remove the limit and increase the delay as ping attempts fail, e.g. pings 1-5 use a 10-second delay, pings 6-10 a 30-second delay, pings 11-15 a 60-second delay, and so on.
  3. When the limit is reached, exit the application. This lets the failed connection be managed by the wrapper (systemd, Docker, etc.).

Why this is an issue and how to recreate it:

When a newt connector is behind an unreliable network connection, the maximum attempt count can easily be hit. In my use case, the server running the newt connector is connected via an AT&T wireless connection that experiences many dropouts.


@miloschwartz commented on GitHub (Mar 24, 2025):

Newt should continue to ping in the background despite the logging. Does it not reconnect despite the server being up? In other words, are you forced to restart Newt to force it to reconnect to an online server after the 15 logged pings are up?


@miles7191 commented on GitHub (Mar 25, 2025):

That isn't the behavior I have noticed. As I haven't deployed this to production, I let the clients sit last week. By the end of the week, 4 out of 20 sites were offline. The site I referenced for this issue experienced a disconnect at 8am, and I manually restarted the service 6 hours later when I noticed it had zombied. When the service was restarted, it instantly connected.


@oschwartz10612 commented on GitHub (Mar 25, 2025):

Hm, okay. I will try to test and patch as soon as possible so that it continues to ping.


@miles7191 commented on GitHub (Mar 27, 2025):

![Image](https://github.com/user-attachments/assets/9c4c0874-259b-43a7-9806-374fb95d85f9)
I am now seeing 11 out of 20 sites offline within a 36-hour period. I restarted the Pangolin stack to see if they would reconnect, and only the clients that were showing as connected initially managed to reconnect. Are there any logs I could pull to help narrow down the problem?


@carsten-re commented on GitHub (Mar 29, 2025):

Hi,
I had the same issue today. The Pangolin server was unresponsive and I had to force-reboot the Azure instance. My two (out of two) newt Docker sites stopped working and I had to restart the newt container.
Here is my latest log before I restarted the newt site (IP and domain are obfuscated):

```
ERROR: 2025/03/29 08:45:09 Ping failed: use of closed network connection
ERROR: 2025/03/29 08:45:09 Failed to connect: failed to get token: failed to request new token: Post "https://host.domain.tld/api/v1/auth/newt/get-token": dial tcp 1.2.3.4:443: connect: connection refused. Retrying in 10s...
ERROR: 2025/03/29 08:45:19 Failed to connect: failed to get token: failed to request new token: Post "https://host.domain.tld/api/v1/auth/newt/get-token": dial tcp 1.2.3.4:443: connect: connection refused. Retrying in 10s...
ERROR: 2025/03/29 08:45:29 Failed to connect: failed to get token: failed to request new token: Post "https://host.domain.tld/api/v1/auth/newt/get-token": EOF. Retrying in 10s...
INFO: 2025/03/29 08:45:41 Sent registration message
INFO: 2025/03/29 08:45:41 Received registration message
INFO: 2025/03/29 08:45:41 Already connected! But I will send a ping anyway...
INFO: 2025/03/29 08:45:41 Ping attempt 1 of 15
INFO: 2025/03/29 08:45:41 Pinging 100.89.128.1
WARN: 2025/03/29 08:45:51 Ping attempt 1 failed: failed to read ICMP packet: i/o timeout
INFO: 2025/03/29 08:45:53 Ping attempt 2 of 15
INFO: 2025/03/29 08:45:53 Pinging 100.89.128.1
INFO: 2025/03/29 08:45:53 Ping latency: 45.049434ms
```

A healthcheck or something like an automatic restart would be fine, no?


@AleksCee commented on GitHub (Mar 29, 2025):

I'm trying the restart now with a little cronjob running every 5 minutes:

```
if [ "$(docker compose logs --since 6m newt | grep -c 'all ping attempts failed after 15 tries')" -ne 0 ]; then
        docker compose restart newt
fi
```
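If Newt adopted suggestion 3 from the issue description (exit once the limit is hit), a cronjob like this would be unnecessary: a standard Docker restart policy would bring the container back up on its own. A hypothetical Compose fragment, assuming a service named `newt` (image name and service name are illustrative, not taken from this thread):

```yaml
services:
  newt:
    image: fosrl/newt        # image name assumed; use whatever you deploy
    restart: unless-stopped  # Docker restarts the container whenever it exits
```

Note that Docker's `restart` policy only fires when the process exits, which is exactly why exiting on repeated failure (rather than idling) plays well with container supervisors.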

@pizzaandcheese commented on GitHub (Sep 26, 2025):

This is still happening for me.

I am using Newt inside a Podman container. I have some containers that use a mixture of host networking and Docker networking, and both encounter the same issue. But it doesn't seem to affect all my containers, as some come up perfectly fine after Pangolin reboots.


@oschwartz10612 commented on GitHub (Sep 26, 2025):

Just checking: are you on the latest newt?

Is this a fresh install, or does it die randomly after some time?


@pizzaandcheese commented on GitHub (Oct 9, 2025):

Most are on 1.5.1, with a few on 1.4.2.

None of mine are fresh, and they seem to drop the connection whenever Gerbil/Pangolin restarts.


Reference: github-starred/newt#13