TL;DR
- In-network tunneling turns the entire underlay into a PMTUD-blackhole for the overlay
- The lack of a clear solution probably has a lot to do with ambiguity as to the role of tunnel interfaces
- Existing workarounds boil down to either “throw bigger MTU at the underlay” or “claw back MTU from the overlay”
Verbose
In-network tunneling / IP-overlay Breaks PMTUD
- There is no working mechanism for underlay PMTU information to make it into the overlay
- The entire underlay becomes a PMTUD blackhole for the overlay
How Can This Be?
- Hosts and routers have different PMTUD obligations
- Routers only have to generate PTB (Packet Too Big) messages; they do not have to maintain path-MTU state
- Hosts have to implement and maintain path-MTU state
- Devices doing IP-overlay encapsulation are both forwarding datagrams (like a router) and generating datagrams (like a host), and (usually) fulfilling the PMTUD obligations of neither role
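To make the two roles concrete, here is a minimal sketch of the split described above. All names (`PacketTooBig`, `router_forward`, `Host`) are hypothetical, and the model assumes every datagram has DF set (or is IPv6), so an oversized datagram is always dropped rather than fragmented.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PacketTooBig:
    """Simplified PTB: who receives it, which destination it concerns, and the MTU that failed."""
    to: str         # source of the dropped datagram
    about_dst: str  # destination of the dropped datagram
    mtu: int        # MTU of the link that could not carry it


def router_forward(src: str, dst: str, size: int, egress_mtu: int) -> Optional[PacketTooBig]:
    """Router role: stateless. Forward the datagram, or drop it and report with a PTB."""
    if size <= egress_mtu:
        return None  # forwarded, nothing to report
    return PacketTooBig(to=src, about_dst=dst, mtu=egress_mtu)


@dataclass
class Host:
    """Host role: keeps per-destination path-MTU state and updates it from PTBs."""
    link_mtu: int
    path_mtu: Dict[str, int] = field(default_factory=dict)

    def mtu_for(self, dst: str) -> int:
        # Fall back to the local link MTU until a PTB says otherwise.
        return self.path_mtu.get(dst, self.link_mtu)

    def on_ptb(self, ptb: PacketTooBig) -> None:
        # Only ever lower the estimate in this sketch.
        self.path_mtu[ptb.about_dst] = min(self.mtu_for(ptb.about_dst), ptb.mtu)


host = Host(link_mtu=1500)
ptb = router_forward("10.0.0.1", "10.0.0.2", size=1500, egress_mtu=1400)
if ptb is not None:
    host.on_ptb(ptb)
print(host.mtu_for("10.0.0.2"))  # 1400
```

A tunnel endpoint that implements neither `on_ptb` nor anything like `router_forward` toward the overlay is the gap the bullets above describe.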
Overlay/Underlay/Tunnel Interfaces
- What happens when Eth-b receives a PTB to Tu-a?
- Does it send the PTB to Tu-a?
- Does Tu-a translate the PTB into the overlay?
- Does Tu-a implement path MTU?
- Status quo: basically nothing
- Tu-a isn’t maintaining its own path-MTU state
- Tu-a isn’t translating the PTB into the overlay
Why nothing?
- If Tu-a receives a PTB from the underlay, the PTB itself doesn’t contain enough information to successfully translate it into the overlay
- Tu-a typically doesn’t implement path-MTU because … (?)
- Everybody wants to buy “wire-speed” overlay capabilities, and path-MTU handling is not feasible(?) to implement in-ASIC?
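The "not enough information" point can be checked with simple arithmetic. This is a sketch under stated assumptions: an ICMPv4 PTB that quotes only the minimum RFC 792 requires (the offending IP header plus the first 8 bytes of its payload), IPv4 headers without options, and GRE or VXLAN as example encapsulations. Many routers quote more than the minimum, so this is the worst case rather than the universal one. The function name is hypothetical.

```python
INNER_IPV4_HEADER = 20         # inner IPv4 header, no options
ICMPV4_MIN_QUOTED_PAYLOAD = 8  # RFC 792: first 64 bits of the offending datagram's payload

# Bytes that sit between the outer IP header and the inner IP header (assumed encapsulations).
ENCAP_HEADER = {
    "gre": 4,                # base GRE header, no key/sequence/checksum
    "vxlan": 8 + 8 + 14,     # UDP + VXLAN + inner Ethernet
}

# To build a conforming PTB for the overlay sender, Tu-a would need to quote
# the inner IP header plus 8 bytes of the inner payload.
NEEDED_FOR_INNER_PTB = INNER_IPV4_HEADER + 8


def inner_bytes_visible(quoted_payload: int, encap_header: int) -> int:
    """Bytes of the inner IP packet that survive in the underlay PTB's quoted datagram."""
    return max(0, quoted_payload - encap_header)


for encap, header_len in ENCAP_HEADER.items():
    visible = inner_bytes_visible(ICMPV4_MIN_QUOTED_PAYLOAD, header_len)
    print(encap, visible, visible >= NEEDED_FOR_INNER_PTB)
# gre 4 False
# vxlan 0 False
```

With a minimal quote, GRE leaves 4 bytes of the inner IP header visible, which does not even reach the inner source address at byte offset 12.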
Current Workarounds
- Overlay providers can declare much smaller MTUs on the overlay network services than on the underlay services and avoid the problem altogether
- Overlay consumers can just decrease their interface MTUs for the same result
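Both workarounds are the same subtraction seen from opposite sides. The sketch below assumes an IPv4 underlay without IP options and two example encapsulations. The function names and the overhead table are illustrative, and the smallest underlay MTU along the path has to be known out of band, since PMTUD cannot discover it here.

```python
# Per-packet overhead added by the encapsulation, in bytes (IPv4 underlay assumed).
ENCAP_OVERHEAD = {
    "gre_ipv4": 20 + 4,               # outer IPv4 + base GRE
    "vxlan_ipv4": 20 + 8 + 8 + 14,    # outer IPv4 + UDP + VXLAN + inner Ethernet
}


def overlay_mtu(underlay_mtu: int, encap: str) -> int:
    """'Claw back MTU from the overlay': largest overlay IP MTU that fits the underlay."""
    return underlay_mtu - ENCAP_OVERHEAD[encap]


def required_underlay_mtu(wanted_overlay_mtu: int, encap: str) -> int:
    """'Throw bigger MTU at the underlay': underlay IP MTU needed for a given overlay MTU."""
    return wanted_overlay_mtu + ENCAP_OVERHEAD[encap]


print(overlay_mtu(1500, "gre_ipv4"))               # 1476
print(overlay_mtu(1500, "vxlan_ipv4"))             # 1450
print(required_underlay_mtu(1500, "vxlan_ipv4"))   # 1550
```

Either value only holds if every underlay hop honors the assumed MTU. A single smaller link brings back the black hole.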