[GH-ISSUE #238] Newt ends up using 100% resources after DNS failure on network #2057

Closed
opened 2026-05-03 05:45:44 -05:00 by GiteaMirror · 5 comments

Originally created by @KusAdama on GitHub (Feb 17, 2026).
Original GitHub issue: https://github.com/fosrl/newt/issues/238

Describe the Bug

Within a few days I partially lost connectivity: new connections could not be made because the network was blocking my chosen DNS resolver.

While debugging what was and wasn't working (already established connections persisted, so the Newt-provided networks inside my network remained reachable from outside because they already existed), I restarted the DNS service on the router and changed the DNS server several times.

After that, Newt could not recover and stayed stuck, consuming all CPU resources. The behavior was the same on 2 machines; I only noticed because both machines spun their fans up loudly.

All the WireGuard tunnels reconnected automatically by themselves; only the Newt instances got stuck like this, and each one needed a manual restart.

In all the logs I can see only these two lines (repeated roughly 10 times):

ERROR: 2026/02/15 22:23:26 Failed to resolve endpoint: DNS lookup failed: lookup sub.domain.xyz on 127.0.0.11:53: read udp 127.0.0.1:37338->127.0.0.11:53: i/o timeout
INFO: 2026/02/15 22:23:26 Connecting to endpoint: sub.domain.xyz

And nothing more, while Newt sits at 100% resource usage.

Or:

INFO: 2026/02/17 20:44:27 Connecting to endpoint: sub.domain.xyz
ERROR: 2026/02/17 20:44:27 Failed to resolve endpoint: DNS lookup failed: lookup sub.domain.xyz on 127.0.0.11:53: server misbehaving

And nothing more, while Newt sits at 100% resource usage.

(screenshot attached)

One machine has 2 networks configured with Newt; the second machine has one.

It seems to me that Newt either stops logging, or there is some counter of attempts to be made and it stops making them after a while?
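
For illustration only (a hypothetical sketch, not the Newt source; `resolveWithBackoff` and its timeout values are made up): a resolution loop that waits with exponential backoff between failed lookups would keep retrying indefinitely without spinning the CPU while DNS is unreachable.

```go
package dnsretry

import (
	"context"
	"log"
	"net"
	"time"
)

// resolveWithBackoff retries a DNS lookup with exponential backoff instead of
// retrying in a tight loop. Each individual lookup is bounded by a timeout so
// a hung resolver cannot stall the loop indefinitely.
func resolveWithBackoff(ctx context.Context, host string) (net.IP, error) {
	backoff := time.Second
	const maxBackoff = 60 * time.Second

	for {
		lookupCtx, cancel := context.WithTimeout(ctx, 5*time.Second)
		ips, err := net.DefaultResolver.LookupIP(lookupCtx, "ip", host)
		cancel()

		if err == nil && len(ips) > 0 {
			return ips[0], nil
		}
		log.Printf("Failed to resolve endpoint %s: %v (retrying in %s)", host, err, backoff)

		// Sleep before retrying; bail out if the caller cancels.
		select {
		case <-ctx.Done():
			return nil, ctx.Err()
		case <-time.After(backoff):
		}
		if backoff < maxBackoff {
			backoff *= 2
		}
	}
}
```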

Environment

  • OS Type & Version: 2x Synology DSM (uname -a: 4.4.302+ x86_64 GNU/Linux)
  • Newt Version: 1.9.0 in Docker

To Reproduce

I don't know precisely how to trigger it. While switching between working and non-working DNS states within my network, it happened twice, on both machines.

Expected Behavior

Newt should reconnect automatically, not stall and consume all CPU resources.

GiteaMirror added the stale label 2026-05-03 05:45:44 -05:00

@github-actions[bot] commented on GitHub (Mar 4, 2026):

This issue has been automatically marked as stale due to 14 days of inactivity. It will be closed in 14 days if no further activity occurs.


@github-actions[bot] commented on GitHub (Mar 22, 2026):

This issue has been automatically marked as stale due to 14 days of inactivity. It will be closed in 14 days if no further activity occurs.


@strausmann commented on GitHub (Mar 27, 2026):

We have additional data points that may be related to this issue. In our environment, we observed 234% CPU on a Newt instance that was also experiencing a TCP connection leak (details in #268).

The high CPU correlates with:

  • 14,447 accumulated TCP connections (mostly FIN-WAIT-2)
  • 3,590 open file descriptors
  • Continuous DNS resolution attempts (related to PR #277)

After restarting Newt, CPU immediately dropped to <3%. This suggests the CPU spike is not caused by DNS failures alone, but also by the goroutine overhead of managing thousands of leaked TCP connections in proxy/manager.go (handleTCPProxy).

In the current code, handleTCPProxy uses net.Dial("tcp", targetAddr) without a timeout and then calls io.Copy without any read/write deadline. If the remote end holds the connection open (common with SMTP, SSH, or any long-lived protocol), the goroutines and file descriptors accumulate indefinitely.
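
For illustration only (a hypothetical sketch under the assumptions above; the function name, timeout values, and structure are not taken from proxy/manager.go): bounding the dial and refreshing a read deadline on each copy direction would let idle or dead peers eventually release their connections instead of accumulating goroutines and file descriptors.

```go
package proxyguard

import (
	"log"
	"net"
	"time"
)

// handleTCPProxyGuarded is a guarded variant of a TCP proxy handler: the dial
// is bounded by a timeout, and each copy direction refreshes a read deadline,
// so a peer that goes silent eventually releases the connection.
func handleTCPProxyGuarded(client net.Conn, targetAddr string) {
	defer client.Close()

	const dialTimeout = 10 * time.Second
	const idleTimeout = 5 * time.Minute

	// Bound the dial so an unreachable target cannot pin the goroutine.
	target, err := net.DialTimeout("tcp", targetAddr, dialTimeout)
	if err != nil {
		log.Printf("dial %s failed: %v", targetAddr, err)
		return
	}
	defer target.Close()

	copyWithIdleTimeout := func(dst, src net.Conn) {
		buf := make([]byte, 32*1024)
		for {
			// Refresh the deadline before every read; a peer that stays
			// silent for longer than idleTimeout ends the loop.
			src.SetReadDeadline(time.Now().Add(idleTimeout))
			n, rerr := src.Read(buf)
			if n > 0 {
				if _, werr := dst.Write(buf[:n]); werr != nil {
					return
				}
			}
			if rerr != nil {
				return
			}
		}
	}

	done := make(chan struct{}, 2)
	go func() { copyWithIdleTimeout(target, client); done <- struct{}{} }()
	go func() { copyWithIdleTimeout(client, target); done <- struct{}{} }()
	// When one direction finishes, return; the deferred Closes unblock the
	// other goroutine's pending read so nothing lingers.
	<-done
}
```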

This is a separate but compounding issue to the DNS-related CPU spike you reported — both contribute to resource exhaustion under failure conditions.


@github-actions[bot] commented on GitHub (Apr 11, 2026):

This issue has been automatically marked as stale due to 14 days of inactivity. It will be closed in 14 days if no further activity occurs.


@github-actions[bot] commented on GitHub (Apr 25, 2026):

This issue has been automatically closed due to inactivity. If you believe this is still relevant, please open a new issue with up-to-date information.

Reference: github-starred/newt#2057