[GH-ISSUE #108] iOS/OLM client: UDP hole punch fails on IPv6-only/NAT64 mobile networks #509

Open
opened 2026-04-29 17:02:45 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @SystemFuchs on GitHub (Mar 25, 2026).
Original GitHub issue: https://github.com/fosrl/olm/issues/108

Originally assigned to: @oschwartz10612 on GitHub.

Describe the Bug

The Pangolin iOS client (OLM) fails to register on IPv6-only mobile networks (e.g., T-Mobile Germany 5G with NAT64/DNS64). The WebSocket connection (TCP) establishes correctly over IPv6, but the UDP hole punch never sends a single packet, causing the client to be stuck in "Registering" state indefinitely.

The root cause appears to be that the hole punch code resolves base_endpoint to an IPv4 address and opens an AF_INET UDP socket, which cannot function on an IPv6-only network. On NAT64 networks there is no native IPv4 stack available; in this client, only the TCP (WebSocket) connection, which goes through the system networking stack, benefits from Happy Eyeballs and NAT64 translation.

Actual Behavior

The iOS client connects the WebSocket (TCP) successfully over IPv6 but gets stuck at "Registering" because the UDP hole punch never completes.

Diagnostic Evidence

1. WebSocket connects, but server receives no hole punch

Pangolin server logs show the WebSocket is established, but the hole punch timestamp is never updated:

[info]: Establishing websocket connection
[info]: Client added to tracking - OLM ID: 72jspdj84knpeu4
[info]: WebSocket connection fully established and ready - OLM ID: 72jspdj84knpeu4
[info]: Handling register olm message!
[info]: Public key mismatch. Updating public key and clearing session info...
[warn]: Client last hole punch is too old and we have sites to send; skipping this register
[warn]: Client last hole punch is too old and we have sites to send; skipping this register
[warn]: Client last hole punch is too old and we have sites to send; skipping this register
... (repeats every 2 seconds indefinitely)

2. tcpdump confirms zero UDP packets from iPhone

Full packet capture on the server's eth0 interface with tcpdump -i eth0 -n 'udp' shows no UDP packets from any T-Mobile IPv6 prefix or NAT64 address while the iPhone is attempting to connect. All captured UDP traffic belongs to other (IPv4) sites that are functioning correctly.

A separate capture filtered for IPv6 UDP (tcpdump -i eth0 -n 'ip6 and udp') shows 0 packets from the iPhone, while UDP from other IPv6 sources (e.g., another Hetzner VPS) arrives correctly.

3. Server infrastructure is correctly configured

  • Hetzner Cloud Firewall: UDP 51820 open for both "Any IPv4" and "Any IPv6"
  • ip6tables: default policy ACCEPT, no DROP rules
  • docker-proxy: listening on both 0.0.0.0:51820 and [::]:51820
  • IPv6 UDP from other hosts arrives correctly (verified with test from another VPS)

4. tcpdump of the WebSocket (TCP) over IPv6

The TCP connection from the iPhone's IPv6 address works perfectly — full TLS handshake, HTTP 101 upgrade, and the client sends 732-byte keepalive frames every 2 seconds. The server ACKs on TCP level but sends no application data back (because registration is blocked by missing hole punch).

5. WiFi (IPv4) works perfectly

Same iPhone, same app, same Pangolin server — switching from 5G to WiFi (which provides IPv4) immediately connects all sites.

Analysis

The sequence on an IPv6-only/NAT64 network:

  1. iOS resolves pangolin.mydomain → gets both A and AAAA records
  2. TCP/WebSocket: iOS networking stack (NSURLSession/Network.framework) uses Happy Eyeballs, prefers IPv6, connects via AAAA record → works
  3. UDP hole punch: The OLM code appears to resolve base_endpoint to the A record (IPv4) and opens an AF_INET UDP socket → fails silently on IPv6-only network (no native IPv4 stack available)

On NAT64 networks, the correct approach is to use getaddrinfo() with AF_UNSPEC and let the OS synthesize a NAT64 IPv6 address (e.g., 64:ff9b::4d2a:147a) for IPv4-only destinations. Alternatively, the code should attempt IPv6 UDP first when an AAAA record is available.
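To make the example address above concrete, here is a minimal sketch of how an IPv4 address is embedded in a /96 NAT64 prefix. It assumes the well-known prefix 64:ff9b::/96 (a carrier may use its own prefix instead), and the function names are hypothetical. It is written in Python for illustration only and is not taken from the OLM client.

```python
import ipaddress

# Assumption: the well-known NAT64 prefix. A carrier may use a different one.
WELL_KNOWN_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def synthesize_nat64(ipv4, prefix=WELL_KNOWN_PREFIX):
    """Embed an IPv4 address in the low 32 bits of a /96 NAT64 prefix."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    v4 = ipaddress.IPv4Address(ipv4)
    return str(ipaddress.IPv6Address(int(prefix.network_address) | int(v4)))


def extract_ipv4(ipv6, prefix=WELL_KNOWN_PREFIX):
    """Recover the embedded IPv4 address from a synthesized NAT64 address."""
    addr = ipaddress.IPv6Address(ipv6)
    if addr not in prefix:
        raise ValueError(f"{ipv6} is not inside {prefix}")
    return str(ipaddress.IPv4Address(int(addr) & 0xFFFFFFFF))


if __name__ == "__main__":
    # The example address from the analysis above.
    print(extract_ipv4("64:ff9b::4d2a:147a"))
    print(synthesize_nat64("77.42.20.122"))
```

The last four bytes of the synthesized address (4d 2a 14 7a) are simply the four octets of the IPv4 address in hexadecimal, which is why a UDP socket has to be AF_INET6 to reach it.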

Note on AAAA records making it worse: With an AAAA record present, DNS64 synthesis does not activate (DNS64 only synthesizes when there is no AAAA record). This means the TCP stack connects via real IPv6, but the UDP stack gets a real IPv4 address it cannot use. Without the AAAA record, DNS64 would at least synthesize a NAT64 address, which might work if the UDP code uses getaddrinfo() properly.
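The DNS64 behaviour described in this note can be modelled in a few lines. This is a simplified sketch, not a real resolver: the function name is hypothetical, the well-known prefix 64:ff9b::/96 is assumed, and 2001:db8::1 is a documentation address standing in for the server's real AAAA record.

```python
import ipaddress

# Assumption: the resolver uses the well-known NAT64 prefix 64:ff9b::/96.
NAT64_PREFIX = int(ipaddress.IPv6Address("64:ff9b::"))


def dns64_aaaa_answer(a_records, aaaa_records):
    """Model the AAAA answer a DNS64 resolver returns for one hostname."""
    if aaaa_records:
        # Real AAAA records exist: they are passed through, nothing is synthesized.
        return list(aaaa_records)
    # No AAAA records: synthesize one IPv6 address per A record.
    return [
        str(ipaddress.IPv6Address(NAT64_PREFIX | int(ipaddress.IPv4Address(a))))
        for a in a_records
    ]


if __name__ == "__main__":
    # Dual-stack host (the reporter's setup): only the real AAAA comes back.
    print(dns64_aaaa_answer(["77.42.20.122"], ["2001:db8::1"]))
    # IPv4-only host: a NAT64 address is synthesized.
    print(dns64_aaaa_answer(["77.42.20.122"], []))
```

In the dual-stack case, the only IPv4-reachable form of the server is the plain A record, which an AF_INET socket cannot use on an IPv6-only network.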

Suggested Fix

In the iOS OLM client's hole punch implementation:

  1. Use getaddrinfo() with AF_UNSPEC instead of resolving to IPv4 only
  2. Prefer IPv6 (AF_INET6) when available, fall back to IPv4
  3. Or: use Network.framework's NWConnection with UDP, which handles Happy Eyeballs and NAT64 transparently

This should also fix the issue for other NAT64/DS-Lite networks, which are extremely common in Germany (Vodafone Kabel, 1&1, Telekom mobile).
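Steps 1 and 2 of the suggested fix can be sketched as follows. This is an illustration in Python, whose socket.getaddrinfo wraps the C getaddrinfo(), and not the OLM client's actual code, which is not shown in this issue. The function names are hypothetical. Whether the resolver returns a synthesized NAT64 address depends on the platform and network.

```python
import socket


def order_candidates(infos):
    """Sort getaddrinfo() results so AF_INET6 entries come before AF_INET.

    sorted() is stable, so the resolver's order is kept within each family.
    """
    return sorted(infos, key=lambda info: 0 if info[0] == socket.AF_INET6 else 1)


def resolve_udp(host, port):
    """Resolve host for UDP without pinning the address family (AF_UNSPEC)."""
    infos = socket.getaddrinfo(
        host, port, family=socket.AF_UNSPEC, type=socket.SOCK_DGRAM
    )
    return order_candidates(infos)


def send_hole_punch(host, port, payload):
    """Try each candidate in order; return the family that accepted the datagram."""
    last_error = None
    for family, socktype, proto, _canonname, sockaddr in resolve_udp(host, port):
        try:
            # The socket family always matches the resolved address family.
            with socket.socket(family, socktype, proto) as sock:
                sock.sendto(payload, sockaddr)
                return family
        except OSError as exc:
            # For example "network unreachable" for IPv4 on an IPv6-only network.
            last_error = exc
    raise OSError(f"no usable address for {host}:{port}") from last_error
```

The key design point is that the socket is created from each resolved entry's own family rather than from a hard-coded AF_INET, so an unusable IPv4 candidate produces an error that triggers the fallback instead of failing silently.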

Impact

This affects all users on IPv6-only/NAT64 mobile networks, which includes:

| Provider | Network Type | Affected |
|----------|--------------|----------|
| T-Mobile DE | NAT64 (5G/LTE) | Yes |
| Vodafone DE (Kabel) | DS-Lite / CGNAT | Yes |
| o2/Telefónica DE | DS-Lite | Yes |
| 1&1 | DS-Lite | Yes |

This is the majority of German mobile and cable internet users. The issue likely also affects other countries where carriers have deployed IPv6-only with NAT64.

Workaround

Currently none — the only option is to use WiFi with IPv4 connectivity. Removing the AAAA record from DNS does not help because the hole punch code appears to force AF_INET regardless.

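![Image](https://github.com/user-attachments/assets/a3621a93-b430-40a4-8a3d-154648cc4a63)
![Image](https://github.com/user-attachments/assets/0c78e223-e45c-4886-b091-c02fb75b969d)
![Image](https://github.com/user-attachments/assets/5ce71df4-99f3-4dea-add1-b1d14468a238)
![Image](https://github.com/user-attachments/assets/0b9f3737-e86a-499b-a31e-9f3c87e14495)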

Environment

  • Pangolin iOS: 0.6.2
  • Pangolin Server: running on Hetzner VPS (Debian), dual-stack (IPv4 + IPv6) -> Docker, all images latest
  • Gerbil: latest (from fosrl/gerbil image)
  • Mobile Network: T-Mobile Germany, 5G, IPv6-only with NAT64/DNS64
  • DNS: host has both A and AAAA records

To Reproduce

  1. Set up Pangolin server on a dual-stack VPS with both A and AAAA DNS records
  2. Configure sites/resources via Newt (working correctly)
  3. Connect the Pangolin iOS app via WiFi (IPv4) → works, all sites show "Connected"
  4. Switch to mobile data (5G, IPv6-only/NAT64 network) → app shows "Registering", sites never connect

Expected Behavior

The iOS client should complete registration and connect to all sites regardless of whether the client is on an IPv4 or IPv6-only network.

GiteaMirror added the "bug" and "needs investigating" labels 2026-04-29 17:02:45 -05:00
Author
Owner

@SystemFuchs commented on GitHub (Mar 30, 2026):

Additional information:

I'm facing the same issue when using the Pangolin client on my MacBook over a NAT64 (5G/LTE) connection via a mobile router.
If I change the router's APN to an IPv4-based APN (which sadly is not always possible), the client connects fine.

Reference: github-starred/olm#509