Path MTU Discovery
The normative definitions, from the RFCs. Path MTU discovery (PMTUD) can refer to any of several related strategies an IP host can use (and IP routers may have to facilitate) in order to recover from IP MTU mismatches between sender and receiver (and the path between them).
PMTUD RFC 1191
- First RFC-defined mechanism for allowing two nodes on separate subnets to discover the “effective” MTU of the path between them.
- Has different functional requirements for “routers” and “hosts”
- Routers are required to generate an ICMP message when they receive a packet that has the don’t-fragment (DF) bit set to “1” and that is larger than the IP MTU of the egress interface. This ICMP “packet too big” message (formally, in ICMPv4, a Destination Unreachable message with code “Fragmentation Needed and DF Set”; “Packet Too Big” is the ICMPv6 name):
- Declares the original source/destination IP-address fields of the packet that was too big
- Declares the actual IP MTU of the egress interface on which encapsulation of the too-big packet had failed
- Quotes the leading bytes of the too-big packet: its IP header plus at least the first 8 bytes of its payload (per RFC 792)
- Hosts are required to respond to ICMP packet-too-big messages by creating a “path MTU” entry for the IP interface on which the packet-too-big (PTB) message was received
- The path-MTU entry supersedes the configured interface MTU
- The path-MTU entry records the IP MTU value specified in the PTB message as applicable to the destination address of the too-big packet quoted in the PTB message.
- Solves for all IP traffic from host A to host B
- As long as those hosts are on separate subnets
- And as long as every router in between the hosts generates (and forwards) the ICMP PTB messages
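To make the PTB mechanics concrete, here is a minimal Python sketch (the function name and the hand-crafted packet are mine, not from any RFC) that parses an ICMPv4 Destination Unreachable / Fragmentation Needed message and extracts the two fields a host needs for its path-MTU entry: the Next-Hop MTU that RFC 1191 defines in bytes 6-7 of the ICMP header, and the destination address of the quoted original packet.

```python
import struct

def parse_frag_needed(icmp: bytes):
    """Parse an ICMPv4 Fragmentation Needed message (Type 3, Code 4)
    and return (next_hop_mtu, original_destination_address).

    RFC 1191 redefines bytes 6-7 of the ICMP header as "Next-Hop MTU";
    the quoted original IP header begins at offset 8.
    """
    icmp_type, icmp_code, _csum, _unused, next_hop_mtu = struct.unpack_from(
        "!BBHHH", icmp, 0)
    if icmp_type != 3 or icmp_code != 4:
        raise ValueError("not a Fragmentation Needed message")
    # Destination address is bytes 16-19 of the quoted IP header.
    orig_dst = icmp[8 + 16 : 8 + 20]
    return next_hop_mtu, ".".join(str(b) for b in orig_dst)

# A hand-crafted example: PTB for a packet to 192.0.2.7, next-hop MTU 1400.
orig_ip_hdr = struct.pack("!BBHHHBBH4s4s",
                          0x45, 0, 1500, 0, 0x4000,  # ver/IHL, TOS, len, ID, DF
                          64, 6, 0,                  # TTL, proto=TCP, checksum
                          bytes([203, 0, 113, 9]),   # source
                          bytes([192, 0, 2, 7]))     # destination
ptb = struct.pack("!BBHHH", 3, 4, 0, 0, 1400) + orig_ip_hdr + b"\x00" * 8
print(parse_frag_needed(ptb))  # → (1400, '192.0.2.7')
```

A receiving host would use these two values to create (or lower) its path-MTU entry for that destination.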
Packetization Layer PMTUD (PLPMTUD) RFC 4821
- Created to solve for “PMTUD blackholes”, in which the ICMP messages required by RFC1191 PMTUD are not delivered to the hosts
- Many network devices filter ICMP PTB messages
- Many routers don’t generate ICMP PTB messages
- Especially SDN routers
- Uses packetization layer (typically TCP) to infer path MTU
- Changes the behavior of a TCP endpoint, manipulating TCP segment sizes to trigger packet loss and correlate it with a specific segment size.
- Only solves for TCP traffic
- Or, at best, for traffic between hosts on which there is at least one TCP socket
- RFC4821 acknowledges that it would be “optimal” for a PLPMTUD mechanism to instruct the downstack IP interface to update IP path MTU for the endpoint in question, using a “shared cache in the IP layer”, but doesn’t address the issue any further.
- Only works for “packetization layers” that have sufficiently rigorous state awareness (explicit acknowledgements of transmissions, in particular.)
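The core of the mechanism can be sketched as a search over probe sizes. This is a simplification under assumed parameters (real implementations probe inside an established connection, rate-limit probes, and wait out acknowledgement timers); the `probe` stand-in simulates a path that silently drops anything over its MTU, which is exactly the blackhole case where ICMP-based PMTUD fails.

```python
def plpmtud_search(path_mtu: int, base: int = 1024, ceiling: int = 9000) -> int:
    """Sketch of an RFC 4821-style probe search (simplified).

    probe(size) stands in for sending a size-byte segment with DF set
    and watching whether the packetization layer acknowledges it.
    """
    def probe(size: int) -> bool:
        return size <= path_mtu  # simulated: no ICMP needed, only loss/ack

    lo, hi = base, ceiling          # lo = largest size known to work
    if not probe(lo):
        raise RuntimeError("even the base PLPMTU is black-holed")
    if probe(hi):
        return hi                   # the whole ceiling fits
    while hi - lo > 1:              # binary-search the (lo, hi] interval
        mid = (lo + hi) // 2
        if probe(mid):
            lo = mid                # mid was acknowledged: raise the floor
        else:
            hi = mid                # mid was lost: lower the ceiling
    return lo                       # largest size that was acknowledged

print(plpmtud_search(1400))  # → 1400
```

Note that the search converges on loss/acknowledgement evidence alone, which is why PLPMTUD needs a packetization layer with explicit acknowledgements.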
Datagram PLPMTUD (DPLPMTUD) RFC 8899 (https://datatracker.ietf.org/doc/html/rfc8899)
- PLPMTUD, but for datagram protocols (datagram protocols don’t have built-in mechanisms for tracking state)
- Has to be implemented at the application layer (the application provides the required state-awareness)
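Since datagrams carry no acknowledgements of their own, the application has to supply the state itself. A minimal sketch of that state (class and method names are mine; `send` is a caller-provided callable assumed to transmit a probe datagram padded to the given size, with DF set):

```python
import itertools

class DplpmtudProber:
    """Sketch of the application-layer state DPLPMTUD (RFC 8899) needs:
    probes are numbered, and the peer's confirmations are tracked."""

    def __init__(self, send, base=1200):
        self.send = send
        self.plpmtu = base            # current validated PLPMTU
        self.seq = itertools.count()
        self.outstanding = {}         # seq -> probe size awaiting ack

    def probe(self, size):
        seq = next(self.seq)
        self.outstanding[seq] = size
        self.send(seq, size)          # transmit padded probe, DF set

    def on_ack(self, seq):
        size = self.outstanding.pop(seq, None)
        if size and size > self.plpmtu:
            self.plpmtu = size        # probe confirmed: raise the PLPMTU

    def on_timeout(self, seq):
        self.outstanding.pop(seq, None)  # probe lost: keep current PLPMTU

# Simulated use against a path that delivers probes up to 1472 bytes:
prober = DplpmtudProber(send=lambda seq, size: None)
for size in (1400, 1472, 1500):
    prober.probe(size)
for seq, size in list(prober.outstanding.items()):
    (prober.on_ack if size <= 1472 else prober.on_timeout)(seq)
print(prober.plpmtu)  # → 1472
```

The probe/ack bookkeeping here is precisely the “sufficiently rigorous state awareness” that TCP provides for free and that datagram applications must build themselves.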
Stack Adjacency
(A term of art I invented for this analysis.)
The relationship between two interfaces directly above/below each other in the network stack
- A layer-N interface is notated “If:L(n)”
- The interface immediately above “If:L(n)” in the stack is notated “If:L(n+1)”
- If:L(n) is “downstack-adjacent” to If:L(n+1)
- If:L(n+1) is “upstack-adjacent” to If:L(n)
Tunnel Endpoints
(My own distillation of the meaning attributed to these terms by the collective subconscious.)
A Tunnel Endpoint (TEP) encapsulates packets from an overlay network into packets to be transmitted on an underlay network for transport. (And vice-versa.)
- “TEP” does get explicitly defined in the RFCs defining VXLAN and Geneve (and probably others.)
- The Geneve RFC (8926) includes the following language
- “As the ultimate consumer of any tunnel metadata, tunnel endpoints have the highest level of requirements for parsing and interpreting tunnel headers”
- TEP interfaces are logically “between” overlay and underlay interfaces. Whether that neighboring relationship is vertical or _horizontal_ is… hard to pin down.
- Let’s call them “diagonal” neighbors. (Since they’re a bit of both.)
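Whatever we call the adjacency, its practical consequence is arithmetic: the TEP’s encapsulation headers consume part of the underlay MTU, so the overlay’s usable MTU shrinks by the total overhead. A quick sketch using the VXLAN frame format from RFC 7348 (the function name is mine):

```python
# Per-layer encapsulation overhead, in bytes, for a VXLAN TEP
# carrying inner Ethernet frames over an IPv4 underlay (RFC 7348):
OUTER_IP4 = 20   # outer IPv4 header, no options
OUTER_UDP = 8    # outer UDP header
VXLAN_HDR = 8    # VXLAN header
INNER_ETH = 14   # inner Ethernet header (no VLAN tag)

def overlay_ip_mtu(underlay_ip_mtu: int) -> int:
    """Largest inner IP packet a VXLAN TEP can carry without the
    encapsulated packet exceeding the underlay's IP MTU."""
    return underlay_ip_mtu - (OUTER_IP4 + OUTER_UDP + VXLAN_HDR + INNER_ETH)

print(overlay_ip_mtu(1500))  # → 1450
```

This is the familiar 50-byte VXLAN overhead: a 1500-byte underlay IP MTU leaves 1450 bytes for overlay IP packets, which is exactly the kind of mismatch that makes PMTUD matter for tunnels.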
MTU, Generalized
(I’m taking the liberty of adding some different variations of what we might mean by MTU.)
_Interface MTU_
The largest message that a message-oriented protocol interface will attempt to transmit to its downstack-adjacent interface. (This is what we’re usually talking about when we describe a network device’s “MTU”.)
_Effective MTU_
The largest message that can be successfully transmitted between two endpoints. (This is described in RFC 1191.)
_Path MTU_
The largest message that an interface will attempt to send to a specific destination. (This is described in RFC 1191.)
_Protocol MTU_
The largest message permitted by the semantics of the interface’s native protocol. (This is a term of art that I made up here.)
M_x_U
Some more “maximum ___ unit” concepts to help make a few points.
Maximum Payload Unit (MPU)
(This is a straight-up Menckenism.)
The maximum size payload that a message-oriented protocol interface can transmit in a single message.
- Cannot be greater than interface-MTU minus protocol encapsulation overhead
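A sketch of the arithmetic (the function name is mine): the MPU is simply the interface MTU minus the header bytes the protocol itself adds, and the computation chains downstack.

```python
def mpu(interface_mtu: int, overhead: int) -> int:
    """MPU = interface MTU minus the protocol's encapsulation overhead."""
    return interface_mtu - overhead

# IPv4 with no options, over a standard 1500-byte Ethernet IP MTU:
print(mpu(1500, 20))  # → 1480 (largest IPv4 payload)
# TCP over that IPv4, with another 20 bytes of option-less TCP header:
print(mpu(1480, 20))  # → 1460 (the familiar Ethernet-path TCP MSS)
```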
Maximum Receive Unit (MRU)
(This is not a Menckenism, per se. MRU is formally defined as an LCP configuration option in the Point-to-Point Protocol (PPP, RFC 1661). But I am decontextualizing it for use in this discussion.)
The maximum-size message that a message-oriented protocol interface will accept from its downstack-adjacent interface
- Should not be higher than the protocol-MTU
- Is not a prescriptive element of the IP or Ethernet protocol standards (but maybe it should be); it is a descriptive property of any message-oriented protocol interface