TL/DR
Please don’t put network/application load-balancers (ALBs) in front of your workload if the LB is just going to be providing clunkier solutions for problems that have already been solved by the workload’s hosting platform.
The actual “short” Take
How we got here
Network load-balancers (ALB/SLB, even GLB) only exist in the first place to solve (work around) capacity scalability limitations of network-delivered applications. In doing that they also (incidentally) gave us additional knobs to twist on the resiliency characteristics of those same applications.
“Historically” (am I showing my age?) network-delivered applications were factored as one or more executables, running on dedicated operating-system instances, which ran on dedicated compute hardware. The physical limits of the hardware platform and the logical limits of the operating sytem established an upper boundary of capacity available to a single instance of an application.
The problem-to-solve at the time was:
How do we raise the capacity ceiling for these applications?
Potential solutions that took to long, or would introduce unacceptable capacity overhead included:
- Wait for compute hardware capacity to increase (PCI-E displacing PCI; compute nodes with more CPU/RAM sockets, etc..)
- Build an all-active clustering capability into the application itself
If the capacity overhead can be kept low enough, all-active clustering is a much better strategy than waiting for future generations of compute hardware and operating systems. And, the first problem to solve when trying to evolve from a single-instance architecture to an all-active/multi-instance architecture is that of workload distribution. That is:
How we make 10 little things look and feel like one big thing?”
It turned out that the network services we were using to direct traffic/connections to a single instance of an application could be manipulated with shockingly low overhead to make those “10 little things” almost act like one big thing. I’ll spare us all the mechanics of ALBs over the years, but its that “almost” that I want to talk about.
A network-load-balancer can do shockingly well at distributing workload in this scenario, but what it can’t do is provide application-layer state synchronization across application instances. That means that the maximum granularity for workload distribution is a TCP-session (or six-tuple “flow”). And if one of ten application-servers fails, all active connections to that server fail with it. The big compromise that settled on the IT world back then was this:
Not having state sychronization between application instances, and having to deal with the ‘occasional’ elephant-flow is a fair price to pay for not having to wait for new hardware and OS iterations every time an application hits a capacity ceiling
Now, to be fair, some applications/services can’t tolerate even partial loss of state when a single isntance fails, and some are just too bursty at per-flow granularity to distribute workload effectively. Those are the things that eventually did end up with native all-active clustering, and that you generally wouldn’t think about putting an ALB in front-of.
And now?
“So what, Mencken?” “We all know that, Mencken!” What’s my point? It’s that we no longer live in a world where every application has a dedicated OS instance and presents to our networks as a single host/node.
Micro-services application architecture emphasizes/encourages stateless behavior. If an application is truly stateless, network-load-balanced applications now lose individual transactions if an application instance fails, instead of entire connections/sessions.
If your application is running in a container context (not a VM, or dedicated physical host), you’re probably using Kubernetes to orchestrate those containers, and Kubernetes has a native “ingress” function/operator that is a network load-balancer (but only serving applications on containers in the Kubernetes cluster’s “container network” overlay.) And the Kubernetes ingress function/operator has much richer telemetry for the state/health of individual containers (because it’s directly integrated with the Kubernetes control-plane) than a traditional network load-balancer ever would.
Kubernetes-hosted applications are already presenting to the network as either a unique FQDN or unique URL within a Kubernetes-cluster-dedicated FQDN. Please be sure you actually have a reason before you stick another ALB in front of them!
Don’t be afraid to point your clients at the FQDNs/URLs that your Kubernetes cluster publishes them to. (They’re already doing a better job of workload distribution an external ALB will be able to.)