Within Amazon Web Services, EC2 instances can be scaled up easily based on various metrics using the auto-scaling service, and the same service can be used to scale instances down. However, a scale-down sometimes requires mandatory operations on an instance before bringing it down. In that case, it is not possible, or not advisable, to let the auto-scaling service terminate instances directly.
One solution is to expose a custom API from the application that stops all required processes gracefully before termination proceeds:
- Detect the instance to be scaled down
- Make a POST call to stop/suspend all required processes gracefully
- Check the suspend status with a GET call
- Once the suspend status is true, terminate the instance from the auto-scaling group
- Repeat the above steps until the instance count reaches the minimum count specified in the auto-scaling group
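The steps above can be sketched as a single control loop. This is a minimal sketch, not the actual implementation: the helper callables (`suspend`, `get_status`, `terminate`) are hypothetical placeholders injected so the flow can be exercised without real AWS or application endpoints.

```python
# Sketch of the graceful scale-down loop described above.
import time

def needs_scale_down(running_count: int, minimum_count: int) -> bool:
    """Keep scaling down until the group's minimum size is reached."""
    return running_count > minimum_count

def scale_down(group, minimum_count, suspend, get_status, terminate):
    """One pass of the flow: suspend, poll status, then terminate.

    `group` is a mutable list of instance ids; `suspend`, `get_status`,
    and `terminate` are injected callables (hypothetical helpers).
    """
    while needs_scale_down(len(group), minimum_count):
        instance = group[0]               # detection logic is covered below
        suspend(instance)                 # POST: stop processes gracefully
        while not get_status(instance):   # GET: poll suspend status
            time.sleep(1)
        terminate(instance)               # remove from the auto-scaling group
        group.remove(instance)
```

Injecting the three calls as parameters keeps the loop itself testable with stubs.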
With the help of auto-scaling, Lambda, CloudWatch monitoring, and SNS, all of these steps can be completely automated. Let’s go through the steps one by one:
Detect the instance
While selecting the instance from the auto-scaling group for termination, look for the availability zone that has the higher number of instances and pick an instance from it. If an incorrect instance is selected by mistake, the auto-scaling service’s rebalancing algorithm restructures the group and redistributes instances equally across the associated availability zones. Let’s look at an example:
Availability zone 1 is called az-1 and availability zone 2 is called az-2.
If az-1 has two instances and az-2 has one, then an instance from az-1 should be selected, since az-1 has the higher count. If an instance from az-2 is selected for termination by mistake, the following undesired activity happens:
Availability zone - Number of instances

az-1 - 2
az-2 - 0
(1 instance terminated from az-2)

az-1 - 1
az-2 - 1
(Auto-scaling terminates an instance from az-1 and launches a new one in az-2 to maintain balance between the availability zones)
The condition above terminates a healthy instance that we did not want to delete, in addition to the instance we selected, and launches an altogether new instance.
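The detection rule above can be expressed as a small helper. This is a sketch under assumptions: the `(instance_id, availability_zone)` tuple format is hypothetical, not how any AWS API actually returns instances.

```python
# Detection step: always pick a candidate from the availability zone that
# currently holds the most instances, so the auto-scaling rebalancer has no
# reason to touch the remaining instances.
from collections import Counter

def pick_instance_to_terminate(instances):
    """instances: list of (instance_id, availability_zone) tuples.

    Returns the id of an instance in the most populated AZ, or None if the
    list is empty.
    """
    if not instances:
        return None
    az_counts = Counter(az for _, az in instances)
    busiest_az, _ = az_counts.most_common(1)[0]
    for instance_id, az in instances:
        if az == busiest_az:
            return instance_id
```

For the example above, with two instances in az-1 and one in az-2, the helper picks an instance from az-1.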
POST call to suspend processes
A public endpoint is necessary for this API so that processes on the instances can be suspended. The API should accept an authentication key and possibly the internal IP of the instance to be suspended. This way, the instances can stay inside the private subnet of the VPC and still be suspended as needed through the public endpoint.
Check suspend status with GET call
Once the suspend process is initiated, we need to check its progress. In the proof of concept, we use two AWS Lambda functions because of the maximum 60-second execution time restriction for Lambda functions. The first Lambda function determines the instance to be downscaled and makes the POST call for suspension. The second Lambda function checks the suspend status: if it is true, it proceeds to instance termination; otherwise, it invokes itself again to re-check the suspend status through a GET call.
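The second Lambda function might look like the sketch below. The handler shape follows AWS Lambda’s Python convention; the status endpoint, its JSON response field, and the event payload keys are assumptions consistent with the text, not part of the actual proof of concept.

```python
# Decision logic of the second Lambda: terminate if suspended, else re-invoke.
import json

def decide_next_action(suspend_status: bool) -> str:
    """True -> terminate now; False -> re-invoke this Lambda to poll again."""
    return "terminate" if suspend_status else "reinvoke"

def handler(event, context):
    import boto3  # available in the AWS Lambda runtime
    import urllib.request

    # GET call: poll the suspend status endpoint (hypothetical URL/field).
    with urllib.request.urlopen(event["status_url"]) as resp:
        suspended = json.load(resp).get("suspended", False)

    if decide_next_action(suspended) == "terminate":
        boto3.client("autoscaling").terminate_instance_in_auto_scaling_group(
            InstanceId=event["instance_id"],
            ShouldDecrementDesiredCapacity=True,
        )
    else:
        # Suspend still in progress: invoke ourselves again asynchronously,
        # which restarts the 60-second clock.
        boto3.client("lambda").invoke(
            FunctionName=context.function_name,
            InvocationType="Event",
            Payload=json.dumps(event),
        )
```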
Terminate the instance
Once the instance is suspended, AWS provides an API to terminate the instance directly from the auto-scaling group and adjust the number of desired instances accordingly. Please note that this termination option is different from the detach option: with detach, the instance is removed from the corresponding auto-scaling group, but it still runs unless you stop or terminate it.
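With boto3, the two options look like this; a minimal sketch, assuming boto3 is available and credentials are configured:

```python
def terminate_from_group(instance_id: str) -> None:
    import boto3
    asg = boto3.client("autoscaling")
    # Terminates the EC2 instance AND lowers the group's desired capacity,
    # so auto-scaling does not launch a replacement.
    asg.terminate_instance_in_auto_scaling_group(
        InstanceId=instance_id,
        ShouldDecrementDesiredCapacity=True,
    )

def detach_from_group(instance_id: str, group_name: str) -> None:
    import boto3
    asg = boto3.client("autoscaling")
    # Removes the instance from the group only; it keeps running until you
    # stop or terminate it yourself.
    asg.detach_instances(
        InstanceIds=[instance_id],
        AutoScalingGroupName=group_name,
        ShouldDecrementDesiredCapacity=True,
    )
```

`ShouldDecrementDesiredCapacity=True` is what prevents the group from immediately launching a replacement for the instance we just removed.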
After the instance is terminated, check whether the number of instances still running is greater than the minimum specified in the respective auto-scaling group. If it is, repeat the process to scale down all the additional instances launched during high traffic.
We would love to hear your own best practices, tips, and tricks in the comments.