Previously, we discussed the five things to consider when setting up health checks for the Classic AWS Elastic Load Balancer (ELB), as well as the basics of how health checks work. The conversation wouldn’t be complete, however, without a quick note on Autoscaling, which adds another layer of complexity to this picture.
If you choose to use the ELB health check with your Autoscaling group, you create a dilemma whenever your application’s dependencies have issues. Your health check now covers far more than infrastructure, so a code bug or a brief dependency blip could take you down: once the ELB health check marks an instance unhealthy, Autoscaling terminates and replaces it.
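Before changing anything, it can help to know which groups are exposed. The sketch below assumes the AWS SDK for Python (boto3): it pages through `describe_auto_scaling_groups` and returns the names of groups whose `HealthCheckType` is `ELB`. The function name is hypothetical, and the client is passed in so the logic can be tried without AWS credentials.

```python
from typing import Any, Dict, List


def groups_using_elb_health_check(client: Any) -> List[str]:
    """Return names of Auto Scaling groups whose HealthCheckType is "ELB".

    `client` is assumed to behave like boto3.client("autoscaling"); it is
    injected so the function can be exercised with a stand-in object.
    """
    names: List[str] = []
    kwargs: Dict[str, Any] = {}
    while True:
        page = client.describe_auto_scaling_groups(**kwargs)
        for group in page.get("AutoScalingGroups", []):
            # Groups using "ELB" will replace instances that fail app-level checks.
            if group.get("HealthCheckType") == "ELB":
                names.append(group["AutoScalingGroupName"])
        token = page.get("NextToken")
        if not token:
            return names
        # Request the next page of results.
        kwargs["NextToken"] = token
```

Against a real account you would call it as `groups_using_elb_health_check(boto3.client("autoscaling"))`.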
Within EC2, replacements can happen very quickly; the rate of replacement depends on your health check and Autoscaling group configuration. That is great if the failure is isolated to one instance. However, if your Autoscaling group uses the ELB health check and the problem lies in your application or in a dependency, you could end up with both an outage and a large jump in cost. If something is wrong with the application, its health check, or a remote dependency (a database, for example), every instance fails the check whether or not it was just launched. All of your instances could therefore fail at once and enter a continuous replacement cycle. When this series was written, EC2 billed a minimum of one hour for every successful instance launch, so this churn could drive your EC2 costs up substantially even if each instance ran for only seconds or minutes. Per-second billing with a one-minute minimum now applies to many instances (Linux instances, for example), which reduces the cost penalty considerably, but the outage remains.
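To see how a billing minimum turns churn into cost, here is a rough back-of-the-envelope estimate. All inputs are hypothetical: the group size, how long the outage lasts, how many minutes each launch, fail, and replace cycle takes, and the hourly price. It assumes every instance is replaced once per cycle, each cycle is shorter than an hour, and each launch is billed for at least `min_billed_seconds`.

```python
import math


def churn_cost(
    group_size: int,
    outage_minutes: float,
    minutes_per_cycle: float,
    hourly_price: float,
    min_billed_seconds: int,
) -> float:
    """Rough cost of replacement churn during an application-level outage.

    Assumes (hypothetically) that every instance fails its ELB health check
    and is replaced once per `minutes_per_cycle`, and that each launch is
    billed for max(actual runtime, `min_billed_seconds`).
    """
    if minutes_per_cycle <= 0:
        raise ValueError("minutes_per_cycle must be positive")
    # Number of replacement rounds during the outage (rounded up).
    cycles = math.ceil(outage_minutes / minutes_per_cycle)
    launches = group_size * cycles
    # Each short-lived instance is billed for at least the minimum.
    billed_seconds_each = max(minutes_per_cycle * 60, min_billed_seconds)
    return launches * billed_seconds_each / 3600 * hourly_price
```

For example, with a hypothetical 10-instance group at $0.10 per hour, a 2-hour outage, and a 5-minute replacement cycle, a one-hour minimum gives 240 launches and roughly $24, versus roughly $2 with per-second billing and a 60-second minimum, the same as running those 10 instances steadily for 2 hours.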
To prevent this risky scenario, stop using the ELB health check to fail instances through Autoscaling and configure the Autoscaling group to use the EC2 health check instead. The EC2 health check marks an instance “unhealthy” only for infrastructure or parent-host problems, so Autoscaling replaces an instance only when the infrastructure itself has failed. Your microservice can then run a decision tree that determines when an issue should be remediated by instance replacement, and either report that to Autoscaling directly or take other appropriate action itself. With this split, the EC2 health check handles downed infrastructure, the ELB health check keeps problematic instances out of rotation, and your microservice takes appropriate action while gathering more data to help your team diagnose and fix application-level issues.
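A minimal sketch of this split, again assuming boto3’s Auto Scaling client: `use_ec2_health_check` switches a group to `HealthCheckType="EC2"` with `update_auto_scaling_group`, and `report_if_needed` calls `set_instance_health` only when a decision tree says replacement could help. The `HealthSignals` fields, the threshold of three local failures, and the decision rules are illustrative assumptions, not a prescribed policy. The key idea is that a shared dependency outage never triggers replacement, because a fresh instance would fail the same way.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class HealthSignals:
    """Hypothetical self-checks a service might gather about its own host."""
    local_process_ok: bool           # e.g. the app answers on localhost
    disk_writable: bool              # e.g. a temp file can be created
    dependency_ok: bool              # e.g. the database answered a cheap query
    consecutive_local_failures: int  # failed local checks in a row


def use_ec2_health_check(client: Any, group_name: str) -> None:
    """Let Auto Scaling replace instances only for EC2-level failures."""
    client.update_auto_scaling_group(
        AutoScalingGroupName=group_name, HealthCheckType="EC2"
    )


def should_replace(s: HealthSignals, local_failure_threshold: int = 3) -> bool:
    """Illustrative decision tree: replace only for host-local problems."""
    if not s.dependency_ok:
        return False  # shared outage: a new instance would fail too
    if not s.disk_writable:
        return True   # local resource problem; a fresh host may fix it
    # Tolerate brief blips; replace only after repeated local failures.
    return (not s.local_process_ok
            and s.consecutive_local_failures >= local_failure_threshold)


def report_if_needed(client: Any, instance_id: str,
                     s: HealthSignals) -> Optional[str]:
    """Mark the instance Unhealthy in Auto Scaling if should_replace() agrees."""
    if not should_replace(s):
        return None
    client.set_instance_health(
        InstanceId=instance_id,
        HealthStatus="Unhealthy",
        ShouldRespectGracePeriod=True,
    )
    return instance_id
```

The load balancer keeps its own health check, so instances that fail it still drop out of rotation; only the replacement decision moves into your service.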
To summarize what we’ve learned throughout this blog series, tailored health checks are important for critical production systems running under the “Classic” version of ELB. Take time to develop a health check strategy, design a health check page for your applications, and consider the implications of Autoscaling. Above all, continually iterate on your strategy and implementation to ensure success in AWS.
For more details on health checks for critical production systems running under the “Classic” version of ELB, check out the previous posts in this series, as well as the original article that inspired this discussion: