Olm fails to establish direct connection due to UDP port conflict during NAT holepunching on macOS #8

Closed
opened 2025-11-19 07:04:02 -06:00 by GiteaMirror · 1 comment
Owner

Originally created by @danohn on GitHub (Aug 29, 2025).

Description

When using the --holepunch flag on macOS, olm frequently fails to establish a direct peer-to-peer connection and falls back to relay mode due to a race condition between the holepunch goroutine and WireGuard device initialization.

Current Behavior

  1. Olm starts UDP holepunching on a randomly selected port (e.g., 55704)
  2. The holepunch goroutine holds this port open to maintain NAT mappings
  3. WireGuard attempts to bind to the same port for the tunnel
  4. Binding fails with error: Unable to update bind: listen udp4 :55704: bind: address already in use
  5. Connection falls back to relay mode after 4 seconds
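
The bind failure in step 4 can be reproduced outside olm. The following is a minimal sketch, not olm's code: `bindConflict` is a hypothetical helper, and plain `net.ListenUDP` sockets stand in for both the holepunch socket and WireGuard's bind. It assumes the Go standard library's default UDP socket options, under which a second bind to a port that is still held fails.

```go
package main

import (
	"fmt"
	"net"
)

// bindConflict opens a UDP socket on an OS-chosen port (standing in for the
// holepunch socket), then tries to bind a second socket to the same port
// while the first is still open (standing in for WireGuard's bind).
// setupErr reports a failure to open the first socket; bindErr is the
// result of the second bind attempt.
func bindConflict() (setupErr, bindErr error) {
	first, err := net.ListenUDP("udp4", &net.UDPAddr{Port: 0})
	if err != nil {
		return err, nil
	}
	defer first.Close()

	// Reuse the port the OS assigned to the first socket.
	port := first.LocalAddr().(*net.UDPAddr).Port

	second, err := net.ListenUDP("udp4", &net.UDPAddr{Port: port})
	if err == nil {
		second.Close()
	}
	return nil, err
}

func main() {
	setupErr, bindErr := bindConflict()
	if setupErr != nil {
		fmt.Println("setup failed:", setupErr)
		return
	}
	fmt.Println("second bind:", bindErr)
}
```

As long as the first socket is open, the second bind is expected to return an error of the same form as the one in the logs.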

Expected Behavior

WireGuard should successfully bind to the port after holepunching completes, enabling direct peer-to-peer connectivity without relay.

Root Cause

There's insufficient delay between close(stopHolepunch) and WireGuard's dev.Up() call. The current 10ms delay is not enough for macOS to release the UDP port, causing WireGuard to fail when attempting to bind.

Logs

ERROR: wireguard: 2025/08/29 12:38:01 Unable to update bind: listen udp4 :55704: bind: address already in use
DEBUG: wireguard: 2025/08/29 12:38:01 Interface state was Down, requested Up, now Down
ERROR: 2025/08/29 12:38:01 Failed to bring up WireGuard device: listen udp4 :55704: bind: address already in use

Environment

  • OS: macOS 15.6.1 (tested on MacBook Pro with M1 Pro)
  • Olm version: 1.1.0
  • Using --holepunch flag

Solution

Increase the delay after closing the holepunch channel from 10ms to 500ms so that the OS has time to release the port before WireGuard attempts to bind.
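
A fixed delay is the fix proposed here. As a point of comparison only, the sketch below shows a different technique: wait until the holepunch goroutine has actually closed its socket, then retry the bind a few times. None of this is olm's code. `holepunch`, `rebindAfterHolepunch`, the retry count, and the 50ms pause are hypothetical, and `net.ListenUDP` stands in for WireGuard's bind.

```go
package main

import (
	"fmt"
	"net"
	"sync"
	"time"
)

// holepunch is a hypothetical stand-in for the holepunch goroutine: it keeps
// the socket open until stop is closed, then closes the socket and only
// afterwards marks itself done (deferred calls run in reverse order).
func holepunch(conn *net.UDPConn, stop <-chan struct{}, wg *sync.WaitGroup) {
	defer wg.Done()
	defer conn.Close()
	<-stop
}

// rebindAfterHolepunch stops the holepunch goroutine, waits until it has
// closed its socket, then binds the same port again with a short retry loop.
// It returns the port that was reused and the final bind error, if any.
func rebindAfterHolepunch() (int, error) {
	conn, err := net.ListenUDP("udp4", &net.UDPAddr{Port: 0})
	if err != nil {
		return 0, err
	}
	port := conn.LocalAddr().(*net.UDPAddr).Port

	stop := make(chan struct{})
	var wg sync.WaitGroup
	wg.Add(1)
	go holepunch(conn, stop, &wg)

	close(stop) // signal the goroutine to stop
	wg.Wait()   // block until the socket has been closed

	// Retry a few times in case the OS has not released the port yet.
	var tunnel *net.UDPConn
	for attempt := 0; attempt < 5; attempt++ {
		tunnel, err = net.ListenUDP("udp4", &net.UDPAddr{Port: port})
		if err == nil {
			break
		}
		time.Sleep(50 * time.Millisecond)
	}
	if err != nil {
		return port, err
	}
	defer tunnel.Close()
	return port, nil
}

func main() {
	port, err := rebindAfterHolepunch()
	fmt.Println("port:", port, "rebind error:", err)
}
```

The design choice is to wait on an explicit completion signal instead of guessing how long the release takes; the retry loop only covers any remaining lag in the OS.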

Reproduction Steps

  1. Run sudo -E olm --holepunch on macOS
  2. Observe logs showing port binding failure
  3. Note fallback to relay mode instead of direct connection

Impact

  • Higher latency due to unnecessary relay usage
  • Increased bandwidth costs through relay server
  • Degraded performance when direct connectivity should be possible
Author
Owner

@oschwartz10612 commented on GitHub (Oct 1, 2025):

I think fixed in #17?

Reference: github-starred/olm#8