Mirror of https://github.com/fosrl/newt.git (synced 2026-05-07 08:28:25 -05:00)
[GH-ISSUE #237] Client connectivity not working on Raspbian (aarch64) #2056
Originally created by @hanjo on GitHub (Feb 17, 2026).
Original GitHub issue: https://github.com/fosrl/newt/issues/237
Originally assigned to: @LaurenceJJones on GitHub.
Describe the Bug
Hi,
I have been trying to get the "Zero-Trust Private Access" functionality to work for some time and eventually figured out that my newt is what causes the connection to fail. I believe this may have to do with the architecture of my host, which is ARM64 (`aarch64`). This host has a special network setup (it is part of a DMZ), but I never had issues with "Web-based Public Access", and I got "Zero-Trust Private Access" working in no time on an `x86_64` machine.
On the `x86_64` host I can see this line on startup of newt:
This line is missing from the log on the `aarch64` host, and when I try to connect the app, I instead get:
and the app never completes the connection attempt. This makes me believe that some code may be missing for my architecture, or that there is some other issue with setting up the required WireGuard tunnel.
Environment
To Reproduce
This is my `docker-compose.yml` on the Raspberry Pi 4 Model B Rev 1.5:
Expected Behavior
Same behavior as on `x86_64`, obviously 🙂
@github-actions[bot] commented on GitHub (Mar 4, 2026):
This issue has been automatically marked as stale due to 14 days of inactivity. It will be closed in 14 days if no further activity occurs.
@hanjo commented on GitHub (Mar 4, 2026):
In the meantime, Pangolin v1.16.2 and Newt v1.10.2 were released, but I still see the same issue. Any chance to look into this?
Thanks!
@LaurenceJJones commented on GitHub (Mar 7, 2026):
Could you set the environment variable `LOG_LEVEL=DEBUG` and provide the full logs so we can dive deeper? (Just make sure to strip any sensitive information from the output.)
@hanjo commented on GitHub (Mar 7, 2026):
Sure, here you go:
@LaurenceJJones commented on GitHub (Mar 9, 2026):
Thanks for the detailed debug logs @hanjo - they were very helpful in tracking this down.
My current theory is that this is a race condition in Pangolin, not a newt client issue. The `handleGetConfigMessage` handler has two conditions under which it silently drops requests without sending a response:
Endpoint not set yet (handleGetConfigMessage.ts:59-64):
if (!existingSite.endpoint) {
    logger.debug(`In newt get config: existing site has no endpoint, skipping`);
    return; // No response sent - client times out
}
Hole punch too old (handleGetConfigMessage.ts:70-75):
if (existingSite.lastHolePunch && now - existingSite.lastHolePunch > 5) {
    logger.warn(`Site last hole punch is too old, skipping`);
    return; // No response sent - client times out
}
Why might it only affect ARM64/aarch64?
The `endpoint` and `lastHolePunch` fields are only set when the hole punch succeeds via Gerbil (updateHolePunch.ts:321-328). On ARM64/Raspberry Pi, one of the following may be happening by the time the config request is handled:
- The 10 retry attempts (20 seconds) have been exhausted, OR
- The `lastHolePunch` timestamp is already more than 5 seconds old
Evidence from Logs
Could it be anything else?
Yes, in theory the DMZ the device sits behind might be too strict, but is the `x86` device in the same DMZ?
Current ideas to fix:
Option A: Increase the 5-second window
The 5-second `lastHolePunch` requirement is very strict. Increasing it to 30 seconds would accommodate slower ARM64 initialization, but it could also introduce a mismatch if not handled correctly (most stateful firewalls hold routes for 30 seconds at most).
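To make the timing concrete, here is a minimal TypeScript sketch of the two guards quoted above as a pure function, with the staleness window as a parameter so the current 5-second rule and Option A's 30 seconds can be compared. It assumes `lastHolePunch` is a Unix timestamp in seconds (inferred from the 5-second window); `SiteState`, `getConfigSkipReason` and `maxAgeSeconds` are hypothetical names, not Pangolin code.

```typescript
// Hypothetical shape of the site record: only the two fields the guards read.
interface SiteState {
  endpoint: string | null;
  lastHolePunch: number | null; // assumed: Unix time in seconds
}

type SkipReason = "no-endpoint" | "stale-hole-punch" | null;

// Mirrors the two early-return guards, with the window made configurable
// (5 s today, 30 s under Option A). Returns null when config would be sent.
function getConfigSkipReason(
  site: SiteState,
  nowSeconds: number,
  maxAgeSeconds = 5,
): SkipReason {
  if (!site.endpoint) {
    return "no-endpoint";
  }
  if (site.lastHolePunch && nowSeconds - site.lastHolePunch > maxAgeSeconds) {
    return "stale-hole-punch";
  }
  return null;
}

// Hole punch recorded at t=1000, config request handled 12 s later.
const site: SiteState = { endpoint: "203.0.113.10:51820", lastHolePunch: 1000 };
console.log(getConfigSkipReason(site, 1012)); // "stale-hole-punch"
console.log(getConfigSkipReason(site, 1012, 30)); // null
```

A request arriving 12 seconds after the hole punch is dropped under the 5-second rule but would be served with a 30-second window.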
Option B: Return an error instead of silent drop
Send an explicit error response so the client knows to retry:
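A minimal sketch of what Option B could look like, assuming the handler builds a reply object for the caller to send instead of returning early. `GetConfigReply`, the error codes and `retryAfterSeconds` are hypothetical; the 2-second retry hint only mirrors the "10 attempts in 20 seconds" mentioned above.

```typescript
interface SiteState {
  endpoint: string | null;
  lastHolePunch: number | null; // assumed: Unix time in seconds
}

// Hypothetical reply shapes, for illustration only (not Pangolin's wire format).
type GetConfigReply =
  | { ok: true; endpoint: string }
  | { ok: false; error: "NO_ENDPOINT" | "STALE_HOLE_PUNCH"; retryAfterSeconds: number };

// Every branch now yields a reply, so the client is never left waiting for a timeout.
function handleGetConfig(site: SiteState, nowSeconds: number): GetConfigReply {
  if (!site.endpoint) {
    return { ok: false, error: "NO_ENDPOINT", retryAfterSeconds: 2 };
  }
  if (site.lastHolePunch && nowSeconds - site.lastHolePunch > 5) {
    return { ok: false, error: "STALE_HOLE_PUNCH", retryAfterSeconds: 2 };
  }
  return { ok: true, endpoint: site.endpoint };
}

const reply = handleGetConfig({ endpoint: null, lastHolePunch: null }, 1000);
console.log(JSON.stringify(reply)); // {"ok":false,"error":"NO_ENDPOINT","retryAfterSeconds":2}
```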
Option C: Client-side resilience
Have newt wait for hole punch confirmation before requesting config, or implement longer/infinite retries for get-config.
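Option C could look roughly like a client-side retry loop with capped exponential backoff. This is written in TypeScript to match the snippets above and is not newt's actual code; `getConfigWithRetry` and `requestConfig` are hypothetical, with `requestConfig` standing in for whatever sends get-config and rejects on timeout.

```typescript
// Retries a get-config request with capped exponential backoff.
async function getConfigWithRetry<T>(
  requestConfig: () => Promise<T>,
  maxAttempts = 10,
  baseDelayMs = 500,
  maxDelayMs = 10_000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await requestConfig();
    } catch (err) {
      lastError = err;
      if (attempt === maxAttempts) break;
      // Delay doubles after each failure but never exceeds maxDelayMs.
      const delay = Math.min(baseDelayMs * 2 ** (attempt - 1), maxDelayMs);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw new Error(`get-config failed after ${maxAttempts} attempts: ${String(lastError)}`);
}

// Usage: a fake request that times out twice before succeeding.
let calls = 0;
getConfigWithRetry(async () => {
  calls++;
  if (calls < 3) throw new Error("timeout");
  return { endpoint: "203.0.113.10:51820" };
}, 10, 10).then((cfg) => console.log(calls, cfg.endpoint)); // 3 203.0.113.10:51820
```

The cap keeps a slow-starting site from backing off for too long between attempts; "infinite retries" would simply replace the attempt limit with an unbounded loop.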
@hanjo commented on GitHub (Mar 9, 2026):
Wow, very interesting. I wouldn't have expected the device's performance to be a factor. The x86_64 machine I'm running newt on has an Intel Core i5-9500T, which should be about 5-6 times as fast. A more relevant measure is probably the SSD compared to the SD card, which has a huge influence on I/O.
Not sure which option makes the most sense, but if you need me to test a development version, let me know. I'm running everything in Docker, if that makes any difference.
@github-actions[bot] commented on GitHub (Mar 24, 2026):
This issue has been automatically marked as stale due to 14 days of inactivity. It will be closed in 14 days if no further activity occurs.
@hanjo commented on GitHub (Mar 24, 2026):
Any news on this issue @LaurenceJJones ?
@github-actions[bot] commented on GitHub (Apr 8, 2026):
This issue has been automatically marked as stale due to 14 days of inactivity. It will be closed in 14 days if no further activity occurs.
@hanjo commented on GitHub (Apr 8, 2026):
It seems this is still not working with Pangolin v1.17.0 and Newt v1.11.0 either.
@dpurnam commented on GitHub (Apr 10, 2026):
+1 (AMD based newt sites work fine with private resources but not with ARM based newt-sites)
@hanjo commented on GitHub (Apr 28, 2026):
I just updated to newt v1.12.0 and it seems the situation has not changed (i.e. the issue is still the same).
@LaurenceJJones commented on GitHub (Apr 28, 2026):
Yes, apologies. @oschwartz10612 and I have a ticket to dive into this, but we had a few setbacks with the Pangolin release.
@hanjo commented on GitHub (Apr 28, 2026):
No worries, I can understand this isn't the most pressing thing to look into. Just trying to keep the bot from closing this issue, before it is resolved 🙃
Thanks for your great work!
@LaurenceJJones commented on GitHub (Apr 28, 2026):
I've added a tag so the bot won't close this issue or mark it as stale 👍🏻
@oschwartz10612 commented on GitHub (Apr 28, 2026):
I was actually testing on an arm32v7 Raspberry Pi yesterday and it was working fine. We also have newts deployed on arm64 EC2 Graviton instances, and those work too. It must be a specific race condition, as Loz suggests. Needs more investigation.
@hanjo commented on GitHub (Apr 30, 2026):
Thanks for looking into this. I now believe my bug report is a false positive. After re-reading the entire discussion, and given your statement that it works for you, I checked my configuration once again. In my DMZ I block outbound Internet connectivity and have explicitly allowed tcp/udp 443 and udp 51820 to Pangolin. My other newt is in a different network segment without these restrictions. I ran tcpdump and noticed that newt also tries to connect to Pangolin on udp 21820, which I previously believed only clients (as in "user devices") needed. So I allowed this in the firewall and voilà, it works.
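For anyone auditing a restricted segment, the outbound flows mentioned in this thread can be checked against a firewall allowlist. This is a minimal sketch: the port list reflects only what was reported here (not an authoritative list from the Pangolin documentation), and `REQUIRED_NEWT_EGRESS` and `missingEgress` are hypothetical names.

```typescript
type Proto = "tcp" | "udp";

interface Flow {
  proto: Proto;
  port: number;
  note: string;
}

// Outbound flows from the newt host to Pangolin, as reported in this thread.
const REQUIRED_NEWT_EGRESS: Flow[] = [
  { proto: "tcp", port: 443, note: "allowed in the original DMZ rules" },
  { proto: "udp", port: 443, note: "allowed in the original DMZ rules" },
  { proto: "udp", port: 51820, note: "WireGuard tunnel" },
  { proto: "udp", port: 21820, note: "coordination port, needed by newt as well as clients" },
];

// Returns the required flows that the allowlist does not cover.
function missingEgress(allowed: Array<{ proto: Proto; port: number }>): Flow[] {
  const key = (f: { proto: Proto; port: number }) => `${f.proto}/${f.port}`;
  const allowedKeys = new Set(allowed.map(key));
  return REQUIRED_NEWT_EGRESS.filter((f) => !allowedKeys.has(key(f)));
}

// The original DMZ allowlist: tcp/udp 443 and udp 51820.
const dmzRules: Array<{ proto: Proto; port: number }> = [
  { proto: "tcp", port: 443 },
  { proto: "udp", port: 443 },
  { proto: "udp", port: 51820 },
];
console.log(missingEgress(dmzRules).map((f) => `${f.proto}/${f.port}`)); // [ 'udp/21820' ]
```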
Bottom line: it was a misconfiguration on my side; my apologies for having wasted your time.
I'll leave this issue open in case you want to use it to trace that race condition, but from my point of view feel free to close it 👍
@LaurenceJJones commented on GitHub (Apr 30, 2026):
Ahhh! Makes perfect sense. Maybe we should clarify this a bit more in the documentation then: even though we say "Clients", it is the coordination port, so both sides (a client and Newt itself) need to be able to communicate with it.
@hanjo commented on GitHub (Apr 30, 2026):
Better documentation can never hurt, can it? 🙃 I think port 21820 is only mentioned twice in the documentation. Maybe it would be helpful to mention the communication requirements on the newt documentation page at https://docs.pangolin.net/manage/sites/install-site#running-newt