Our Professional Services team recently received a request from a customer who wanted to build their application from the ground up to take advantage of the many features of AWS.
They had already decided on Docker containers managed by Amazon ECS, an AWS Direct Connect link from their office into AWS, and Ansible to configure the individual ECS instances, and they wanted the entire deployment to be repeatable in virtually any AWS region. All other requirements would be decided along the way, as the application was still being written.
With this rough idea in mind, here were some of the challenges the Datapipe Professional Services team faced and how they were able to overcome them:
Rapid and Repeatable Deployment
Since this was a work in progress, it was essential to be able to quickly destroy and rebuild individual pieces of the environment. For this, we turned to Terraform and our CI/CD pipeline. This gave us the flexibility to review changes before they were applied, since multiple people were working on different parts of the environment at the same time. Additionally, because we made use of AMI maps, we could quickly set up a new pipeline and a remote state file, change the subnet IP scheme, set a few variables, and redeploy the final solution over and over again in as many regions as needed, or add additional deployments in the same region.
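As a rough sketch of the AMI-map and remote-state pattern described above (the region names, AMI IDs, and bucket name are placeholders, not the customer's actual values), the Terraform code might look like this:

```hcl
# Deployment region is a single variable, so the same code runs anywhere.
variable "region" {
  default = "us-east-1"
}

# Map of ECS-optimized AMI IDs per region (placeholder IDs).
variable "ecs_ami" {
  type = "map"
  default = {
    us-east-1 = "ami-00000000"
    us-west-2 = "ami-11111111"
    eu-west-1 = "ami-22222222"
  }
}

provider "aws" {
  region = "${var.region}"
}

# A remote state file keyed per deployment keeps parallel
# environments from stepping on each other.
terraform {
  backend "s3" {
    bucket = "example-terraform-state"
    key    = "app/us-east-1/terraform.tfstate"
    region = "us-east-1"
  }
}
```

Looking up the AMI with `lookup(var.ecs_ami, var.region)` lets a single variable change redeploy the stack in a new region.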
Split-Horizon DNS: Route53 Private Zones and a Dnsmasq Forwarder
The corporate side of the Direct Connect needed to resolve the internal addresses of resources and instances in the VPC. The instances, in turn, needed to resolve the private IP addresses of hosts on the corporate side of the Direct Connect by querying the corporate private DNS servers. We wrote custom scripts that would install and configure Dnsmasq and change the VPC's DHCP option set to point to the Dnsmasq server. The Dnsmasq server then had conditional forwarders for certain domains that forwarded requests to the corporate internal DNS servers.
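The conditional forwarding described above comes down to a few lines of Dnsmasq configuration. The domain name and server IPs below are placeholders, not the customer's actual values:

```
# /etc/dnsmasq.conf (sketch)
# Forward queries for the corporate internal domain to the
# on-premises DNS servers across the Direct Connect.
server=/corp.example.com/10.0.0.2
server=/corp.example.com/10.0.0.3

# Everything else, including the Route53 private zone, resolves
# through the VPC's own resolver.
server=169.254.169.253
```

With the DHCP option set pointing instances at the Dnsmasq server, queries are split by domain without any per-instance resolver changes.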
The corporate DNS servers also had a forwarder configured to send all lookups for the zone hosted in the Route53 private zone to the Dnsmasq server. Since the corporate side of the Direct Connect did not accept dynamic updates to its forwarders, we employed Amazon EC2 instance recovery with custom alarms to ensure that if something went wrong with a Dnsmasq instance, the replacement instance would maintain the same internal IP address.
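A minimal Terraform sketch of that recovery setup, assuming placeholder values for the AMI, subnet, fixed private IP, and region, could pair a system status-check alarm with the EC2 recover action:

```hcl
resource "aws_instance" "dnsmasq" {
  ami           = "ami-00000000"      # placeholder
  instance_type = "t2.small"
  subnet_id     = "subnet-00000000"   # placeholder
  private_ip    = "10.10.1.53"        # fixed IP so corporate forwarders never need updating
}

resource "aws_cloudwatch_metric_alarm" "dnsmasq_recover" {
  alarm_name          = "dnsmasq-autorecover"
  namespace           = "AWS/EC2"
  metric_name         = "StatusCheckFailed_System"
  statistic           = "Minimum"
  period              = "60"
  evaluation_periods  = "2"
  comparison_operator = "GreaterThanThreshold"
  threshold           = "0"

  dimensions = {
    InstanceId = "${aws_instance.dnsmasq.id}"
  }

  # EC2 auto recovery restarts the instance on healthy hardware,
  # preserving its instance ID and private IP address.
  alarm_actions = ["arn:aws:automate:us-east-1:ec2:recover"]
}
```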
Automatic Route53 Updates
First, we created IAM roles, assigned to each instance, that allowed it to update specific types of Route53 records. Then we used custom init/upstart/systemd scripts so that each instance, as it launched or shut down, could add, remove, or update its own Route53 record in both the public and private zones. We also employed special scripts to clean up orphaned Route53 records if a terminated instance did not clean up after itself.
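A sketch of such a scoped instance role in Terraform, assuming a hypothetical hosted zone ID, restricts the instances to record changes in just the zones they need:

```hcl
resource "aws_iam_role" "route53_updater" {
  name = "route53-updater"

  # Let EC2 instances assume this role via an instance profile.
  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "Service": "ec2.amazonaws.com" },
    "Action": "sts:AssumeRole"
  }]
}
EOF
}

resource "aws_iam_role_policy" "route53_updates" {
  name = "route53-record-updates"
  role = "${aws_iam_role.route53_updater.id}"

  # Allow record changes only in one hosted zone (placeholder zone ID).
  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["route53:ChangeResourceRecordSets"],
    "Resource": "arn:aws:route53:::hostedzone/Z0000000000"
  }]
}
EOF
}
```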
ECS Cluster, Autoscaling, Bootstrapping with Ansible
Amazon EC2 Auto Scaling allowed us to maintain the ECS cluster using user data templates passed to the launch configuration via the Terraform code. On launch, each instance would ensure it had the proper version of Docker and other utilities, set its own private zone Route53 DNS entry, then pull down a series of scripts from an S3 bucket and install everything needed to run the Ansible playbooks. The client also managed portions of these scripts, and since the application was being developed live, the scripts changed frequently. By combining S3 and Auto Scaling groups, whenever a change to one of the playbooks needed to be tested or applied, we could simply terminate an instance and allow Auto Scaling to replace it; the replacement would pull down the latest scripts on the fly.
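The launch-configuration and user-data pattern above can be sketched in Terraform as follows. The AMI, instance profile, cluster name, bucket, and subnet are all placeholders, and the `run-ansible.sh` wrapper stands in for the actual bootstrap scripts:

```hcl
data "template_file" "bootstrap" {
  template = <<EOF
#!/bin/bash
# Join the ECS cluster, then fetch the latest bootstrap scripts from S3
# and run the Ansible playbooks.
echo "ECS_CLUSTER=$${cluster_name}" >> /etc/ecs/ecs.config
aws s3 sync s3://$${script_bucket}/bootstrap /opt/bootstrap
/opt/bootstrap/run-ansible.sh
EOF

  vars {
    cluster_name  = "example-cluster"
    script_bucket = "example-bootstrap-scripts"
  }
}

resource "aws_launch_configuration" "ecs" {
  name_prefix          = "ecs-"
  image_id             = "ami-00000000"          # placeholder ECS-optimized AMI
  instance_type        = "m4.large"
  iam_instance_profile = "ecs-instance-profile"  # placeholder
  user_data            = "${data.template_file.bootstrap.rendered}"

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "ecs" {
  name                 = "ecs-cluster-asg"
  launch_configuration = "${aws_launch_configuration.ecs.name}"
  min_size             = 2
  max_size             = 6
  vpc_zone_identifier  = ["subnet-00000000"]     # placeholder
}
```

Because the scripts live in S3 rather than baked into the AMI, terminating an instance is all it takes to roll out an updated playbook.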
IAM roles, IAM Users, and Bucket Policies
Since this was a fast-changing, dynamic environment, we needed to make sure that temporary changes did not end up sticking around. Again we turned to Terraform, which reset all of these entries as part of each terraform plan and terraform apply. It also helped us discover which permanent changes needed to be coded into the Terraform stack.
We also needed to ensure that every resource which supported tags was tagged. Terraform enforced these tagging requirements and reset any tag that was changed out of band.
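As a small illustration of that drift-correction behavior (the AMI ID and tag values here are placeholders), any tag edited outside Terraform shows up as a diff on the next plan and is reverted on apply:

```hcl
resource "aws_instance" "app" {
  ami           = "ami-00000000"   # placeholder
  instance_type = "t2.medium"

  # Tags are declared in code; a manual change to any of these values
  # appears as drift in `terraform plan` and is reset by `terraform apply`.
  tags {
    Name        = "app-server"
    Environment = "dev"
    ManagedBy   = "terraform"
  }
}
```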
Building an application from the ground up can be an arduous process for an organization to handle alone. Pairing with a reliable partner is a good way to reach new levels of success. For more information on all of the services we offer, please visit the Datapipe Professional Services page.