Design elements

The initial design iteration included the following elements:

  • Two workload-hosting sites (sites 1/2)
    • Sites selected for their “goldilocks” geo-proximity (~11ms RTT latency)
      • Far enough away to host resilient workloads for each other during most extreme failure modes.
      • Close enough together that they can both run active workloads
      • Site-level failure induces “degraded capacity” instead of “enterprise-wide failure” and monolithic recovery
  • Three VRFs for workload-hosting subnets (network security zones 1/2/3)
  • All zones are present at each site.
  • A single HA FW pair is present in each DC, with a “leg” in each security zone
  • Inter-data-center connectivity is provisioned for zone “A” only
  • eBGP sessions are established between the firewall and each of the three workload VRFs at each site

Graphical depiction of design

This design is depicted in the following diagram

image

Routing behavior of design

With eBGP sessions between the firewalls and zone VRFs at each site, as well as the zone-1 VRFs at each site peering to a zone-1 “WAN” vrf, each VRF and firewall has a predictable path to every other VRF, with AS-path-length providing a sufficient metric. For traffic that is local to a single site, the path is optimal. For traffic moving into/out of either zone 2 or 3 and moving between sites, the path is decidely sub-optimal.

Gaps / Limitations

This design iteration was sub-optimal in the following regards:

  • Inter-site/intra-zone traffic for zones 2 and 3
    • Traversed firewalls at site 1 and at site 2
      • Desired behavior is to traverse no firewalls
    • Had a transport path that included zone 1
      • Desired behavior was for traffic to only be present in workload-hosting security zones if the traffic’s source or destination was present in that zone

These gaps are illustrated in the following figure, illustrating the path taken for traffic from hosts in Site-1/Zone-3 to Site-2/Zone-3.

image

Retrospection

The initial design iteration fulfilled the core functional requirements, but at the expensive of sub-optimal pathing. Subsequent design iterations were expected to close that gap.