mirror of
https://github.com/fosrl/olm.git
synced 2026-05-06 10:47:55 -05:00
[GH-ISSUE #72] Holepunching unreliable if there's network overlap #315
Originally created by @asardaes on GitHub (Jan 2, 2026).
Original GitHub issue: https://github.com/fosrl/olm/issues/72
Originally assigned to: @oschwartz10612 on GitHub.
Describe the Bug
It seems like Olm can't decide between holepunching and relaying when there's network overlap:
172.16.15.133 is the Newt site's private IP, see more below.
Environment
To Reproduce
I did a maybe-unusual experiment. I have 2 VMs in my VPS and they're both in the same subnet, one for Pangolin and one for a Newt site. Pangolin has a public domain, but I tried to connect the Newt site through the internal subnet by manually adding an entry to its /etc/hosts:
Newt is not running inside a container.
I then defined a private resource in the Newt VM. The client machine ended up being a container with Olm that's not on the VPS and is not running in host network mode. The Docker daemon on Olm's host has a pretty large IP pool (TrueNAS default): 172.17.0.0/12, which spans 172.16.0.0 through 172.31.255.255. As seen in the logs, the holepunch tried to use the VPS private IP, which obviously cannot work from outside the VPS. Since that IP was also valid in the Olm container's network, it looked like it could work, but it didn't.
Expected Behavior
My guess is that Olm should completely ignore private IP ranges when attempting to holepunch.
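To make the suggestion concrete, here is a minimal sketch in Go of what such a filter could look like. It is not taken from Olm's source: poolContains, isHolepunchCandidate and filterEndpoints are hypothetical names, port 51820 is just a placeholder, and 203.0.113.7 is a documentation address standing in for a real public endpoint. The first helper only illustrates why the overlap happens.

```go
package main

import (
	"fmt"
	"net/netip"
)

// poolContains reports whether ip falls inside the address pool given in
// CIDR form. Masked() normalizes a pool such as 172.17.0.0/12 to its real
// network, 172.16.0.0/12.
func poolContains(pool, ip string) (bool, error) {
	p, err := netip.ParsePrefix(pool)
	if err != nil {
		return false, err
	}
	a, err := netip.ParseAddr(ip)
	if err != nil {
		return false, err
	}
	return p.Masked().Contains(a), nil
}

// isHolepunchCandidate (hypothetical) rejects addresses that are only
// meaningful inside the peer's own network: private, loopback, link-local
// and unspecified addresses.
func isHolepunchCandidate(addr netip.Addr) bool {
	addr = addr.Unmap()
	return addr.IsValid() &&
		!addr.IsPrivate() &&
		!addr.IsLoopback() &&
		!addr.IsLinkLocalUnicast() &&
		!addr.IsUnspecified()
}

// filterEndpoints (hypothetical) keeps only the "ip:port" endpoints that pass
// isHolepunchCandidate. Endpoints that fail to parse are dropped.
func filterEndpoints(endpoints []string) []string {
	var kept []string
	for _, e := range endpoints {
		ap, err := netip.ParseAddrPort(e)
		if err != nil {
			continue
		}
		if isHolepunchCandidate(ap.Addr()) {
			kept = append(kept, e)
		}
	}
	return kept
}

func main() {
	overlap, err := poolContains("172.17.0.0/12", "172.16.15.133")
	if err != nil {
		panic(err)
	}
	fmt.Println("site IP inside Docker pool:", overlap)
	fmt.Println(filterEndpoints([]string{"172.16.15.133:51820", "203.0.113.7:51820"}))
}
```

A blanket filter like this would also rule out direct connections between peers that really do share a LAN, so it may be better as an opt-in setting than as a hard rule.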
@TerrifiedBug commented on GitHub (Feb 10, 2026):
I ran into this as well.
I have two OLM clients (v1.4.1) connecting via Pangolin. The holepunch was constantly flapping: it would connect direct for about 2–3 seconds, drop, fall back to relay, detect that direct worked again, switch back, drop again. Endless cycle.
Root cause
The root cause in my case was that the site’s public IP was also configured as a Pangolin private resource. OLM added a /32 host route for that IP through the tunnel.
When holepunch failed (which was expected, since no firewall rules were open for it) and fell back to relay, the holepunch monitor kept testing connectivity. However, those probe packets were now being routed through the OLM tunnel itself because of that static route. From OLM’s perspective, the probes appeared to succeed.
OLM then switched back to direct, the real connection immediately died, it fell back to relay again, the probes “succeeded” through the tunnel, and the whole cycle repeated endlessly.
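The loop above suggests a sanity check before trusting a probe result: if the peer's address is covered by a route that was installed for the tunnel, a successful probe says nothing about direct reachability. Below is a rough Go sketch of that check. probeBypassesTunnel is a hypothetical helper, not something from OLM, and both addresses are placeholders (198.51.100.10 stands in for the site's public IP). It only compares prefixes and does not model the kernel's real route selection.

```go
package main

import (
	"fmt"
	"net/netip"
)

// probeBypassesTunnel (hypothetical) reports whether a packet sent to peer
// would avoid every route installed for the tunnel. If it returns false, a
// successful probe may simply have travelled through the tunnel itself.
func probeBypassesTunnel(peer string, tunnelRoutes []string) (bool, error) {
	addr, err := netip.ParseAddr(peer)
	if err != nil {
		return false, fmt.Errorf("parse peer %q: %w", peer, err)
	}
	for _, r := range tunnelRoutes {
		prefix, err := netip.ParsePrefix(r)
		if err != nil {
			return false, fmt.Errorf("parse route %q: %w", r, err)
		}
		if prefix.Contains(addr) {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	// Placeholder routes: a tunnel subnet plus the /32 host route that was
	// added because the public IP was configured as a private resource.
	routes := []string{"100.90.128.0/24", "198.51.100.10/32"}

	ok, err := probeBypassesTunnel("198.51.100.10", routes)
	if err != nil {
		panic(err)
	}
	fmt.Println("probe result can be trusted:", ok)
}
```

With the /32 host route in place the check reports that the probe cannot be trusted, which matches the flapping I saw.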
I tried --disable-holepunch (CLI flag), DISABLE_HOLEPUNCH=true (env var), and confirmed via olm -show-config that disable-holepunch = true [file] was saved, but it made no difference. It kept flapping.
I dug into the source and found a related issue. In olm/olm.go, the OnTokenUpdate callback unconditionally starts the holepunch manager:
There’s no check for the holepunch config here. The flag only sets "relay": true in the registration message (around line 416), but the client-side holepunch monitor keeps running regardless. It keeps testing, sees the peer is reachable, calls sendUnRelay(), switches to direct, the connection dies, and the cycle repeats. The disable-holepunch flag really should guard the holepunch manager, because right now it is effectively a no-op on the client side.
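For illustration, the kind of guard I mean could look roughly like the sketch below. None of this is OLM's actual code: Config, holepunchManager, client and onTokenUpdate are simplified, hypothetical stand-ins, and the real callback may well have other work to do when a token arrives.

```go
package main

import "fmt"

// Config is a hypothetical stand-in that carries only the setting under
// discussion.
type Config struct {
	DisableHolepunch bool
}

// holepunchManager is a hypothetical stand-in for the component that probes
// peers and switches between relay and direct.
type holepunchManager struct {
	running bool
}

// Start marks the manager as running. A real manager would begin probing.
func (m *holepunchManager) Start() { m.running = true }

// client ties the config and the manager together for this sketch.
type client struct {
	cfg Config
	hp  *holepunchManager
}

// onTokenUpdate sketches a guarded callback: the manager is only started
// when holepunching has not been disabled, so a relay-only client never
// probes and never tries to leave the relay.
func (c *client) onTokenUpdate(token string) {
	if c.cfg.DisableHolepunch {
		return
	}
	c.hp.Start()
}

func main() {
	relayOnly := &client{cfg: Config{DisableHolepunch: true}, hp: &holepunchManager{}}
	relayOnly.onTokenUpdate("example-token")
	fmt.Println("manager running with holepunch disabled:", relayOnly.hp.running)

	normal := &client{cfg: Config{}, hp: &holepunchManager{}}
	normal.onTokenUpdate("example-token")
	fmt.Println("manager running with holepunch enabled:", normal.hp.running)
}
```

Checking the flag at the point where the manager is started means one setting controls both the registration message and the client-side monitor.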
Workaround
I fixed this by creating a dummy interface on the server with a private IP and pointing the Pangolin private resource at that instead of the public IP:
ip link add dummy0 type dummy
ip addr add 10.99.99.1/32 dev dummy0
ip link set dummy0 up
This stopped OLM from hijacking the public IP route.
@ercoppa commented on GitHub (Feb 28, 2026):
I can confirm the issue.