In our last post, we saw how to efficiently scale down EC2 instances using a combination of different AWS services: Lambda, CloudWatch and SNS.
The following method is more practical: it scales EC2 instances down gradually, whereas the earlier method involved an abrupt scale-down action. The steps are:
- A scale-down action is triggered by a CloudWatch alarm.
- An instance is marked for termination.
- The lifecycle hook associated with that Auto Scaling group puts the instance into the Terminating:Wait state and sends a notification to the corresponding SNS topic.
- The Lambda function is invoked once the notification is received on the SNS topic.
- From Lambda, make a POST call to stop/suspend all required processes gracefully.
- Check the suspend status with a GET call.
- Once the suspend status is true, notify the lifecycle hook that the instance can be safely terminated.
- The CloudWatch alarm repeats these steps at each evaluation interval for as long as the metric stays below the specified threshold, so the group scales down one instance at a time.
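The Lambda side of these steps can be sketched roughly as follows. This is a minimal sketch, not the PoC's actual code: `request_suspend` and `is_suspended` are hypothetical callables standing in for your application's own POST and GET endpoints, and the retry count and delay are arbitrary assumptions.

```python
import json
import time


def parse_lifecycle_message(event):
    """Extract the lifecycle details from an SNS-triggered Lambda event.

    Returns None for anything that is not a terminating-instance notification
    (for example, the test notification sent when a hook is created).
    """
    # SNS delivers the Auto Scaling notification as a JSON string in Sns.Message.
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    if message.get("LifecycleTransition") != "autoscaling:EC2_INSTANCE_TERMINATING":
        return None
    return {
        "hook_name": message["LifecycleHookName"],
        "asg_name": message["AutoScalingGroupName"],
        "token": message["LifecycleActionToken"],
        "instance_id": message["EC2InstanceId"],
    }


def drain_instance(request_suspend, is_suspended,
                   attempts=10, delay_seconds=5, sleep=time.sleep):
    """Send the suspend request once, then poll until the status is true.

    request_suspend and is_suspended are hypothetical stand-ins for the
    application's POST and GET calls. Returns True if the instance reported
    itself suspended within the allowed attempts, False otherwise.
    """
    request_suspend()
    for _ in range(attempts):
        if is_suspended():
            return True
        sleep(delay_seconds)
    return False
```

Keep the total polling time below both the Lambda timeout and the lifecycle hook's heartbeat timeout; otherwise the hook's default result is applied before the function finishes.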
Let’s look at each step in more detail:
Create a lifecycle hook
Create IAM role for lifecycle hook
The lifecycle hook needs an IAM role associated with it. The role can be created by following the steps outlined in this article.
Grant required IAM permissions
The IAM user whose keys are used to create the lifecycle hook through the AWS CLI or API needs the iam:PassRole action allowed on the ARN of the role created in the step above.
Also, the IAM role associated with the Lambda function needs permission for autoscaling:CompleteLifecycleAction.
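As a rough illustration, the two permissions could be expressed as policy documents like the ones built below. The helper names are hypothetical, and using `"*"` as the resource for the Auto Scaling action is an assumption made for brevity; you may prefer to scope it to your group's ARN.

```python
import json


def pass_role_policy(role_arn):
    """Policy for the IAM user that creates the hook: allow passing the hook's role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "iam:PassRole", "Resource": role_arn}
        ],
    }


def complete_lifecycle_policy(resource="*"):
    """Policy for the Lambda execution role: allow completing the lifecycle action.

    The "*" default is an assumption; narrow it to a specific group if needed.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "autoscaling:CompleteLifecycleAction",
                "Resource": resource,
            }
        ],
    }


def to_policy_json(policy):
    """Serialize a policy dict to the JSON text expected by IAM."""
    return json.dumps(policy, indent=2)
```

The JSON produced by `to_policy_json` can be pasted into the IAM console or passed to the CLI when attaching the policy.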
Create lifecycle hook using AWS CLI
Replace the uppercase placeholders (HOOK_NAME, ASG_NAME, SNS_ARN and ROLE_ARN) with your own values and run the following command using the AWS CLI:
aws autoscaling put-lifecycle-hook --lifecycle-hook-name HOOK_NAME --auto-scaling-group-name ASG_NAME --lifecycle-transition autoscaling:EC2_INSTANCE_TERMINATING --notification-target-arn SNS_ARN --role-arn ROLE_ARN
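If you would rather create the hook from code, the same call is available in boto3 as `put_lifecycle_hook`. The sketch below assumes boto3 is installed and credentials are configured; the two helper function names are hypothetical.

```python
def lifecycle_hook_params(hook_name, asg_name, sns_arn, role_arn):
    """Build the keyword arguments that mirror the CLI flags above."""
    return {
        "LifecycleHookName": hook_name,
        "AutoScalingGroupName": asg_name,
        "LifecycleTransition": "autoscaling:EC2_INSTANCE_TERMINATING",
        "NotificationTargetARN": sns_arn,
        "RoleARN": role_arn,
    }


def create_lifecycle_hook(params, client=None):
    """Create the hook; a client can be injected, otherwise boto3 is used."""
    if client is None:
        # Imported lazily so lifecycle_hook_params works without boto3 installed.
        import boto3
        client = boto3.client("autoscaling")
    return client.put_lifecycle_hook(**params)
```

Calling `create_lifecycle_hook(lifecycle_hook_params("HOOK_NAME", "ASG_NAME", "SNS_ARN", "ROLE_ARN"))` would then be the equivalent of the CLI command.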
Create separate CloudWatch alarms
Create separate CloudWatch alarms for scaling out and for scaling in. The scale-out alarm triggers when value ≥ threshold; the scale-in alarm triggers when value ≤ threshold.
Having two separate alarms allows us to use +infinity and -infinity bounds in our auto-scaling policies. If both actions are attached to a single alarm, neither bound is available, which results in improper scaling actions.
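To make the two-alarm setup concrete, here is a sketch that builds the arguments for two boto3 `put_metric_alarm` calls. The metric (average CPUUtilization for the group), the 70/30 thresholds, the five-minute period and the alarm names are all assumptions chosen for illustration; substitute whatever metric drives your scaling.

```python
def alarm_params(name, comparison, threshold, policy_arn, asg_name):
    """Build keyword arguments for CloudWatch put_metric_alarm."""
    return {
        "AlarmName": name,
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",  # assumed metric
        "Dimensions": [{"Name": "AutoScalingGroupName", "Value": asg_name}],
        "Statistic": "Average",
        "Period": 300,  # assumed: evaluate over five-minute windows
        "EvaluationPeriods": 1,
        "ComparisonOperator": comparison,
        "Threshold": threshold,
        "AlarmActions": [policy_arn],  # the scaling policy this alarm triggers
    }


def scaling_alarms(asg_name, scale_out_policy_arn, scale_in_policy_arn,
                   high=70.0, low=30.0):
    """Return (scale_out, scale_in) alarm definitions with separate thresholds."""
    scale_out = alarm_params(
        asg_name + "-scale-out", "GreaterThanOrEqualToThreshold",
        high, scale_out_policy_arn, asg_name,
    )
    scale_in = alarm_params(
        asg_name + "-scale-in", "LessThanOrEqualToThreshold",
        low, scale_in_policy_arn, asg_name,
    )
    return scale_out, scale_in
```

Each dictionary can be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`.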
Modification to lambda functions
The biggest advantage of a lifecycle hook is that our Lambda functions become lightweight. The instance-selection logic can be removed, because Lambda is invoked from the SNS topic, which is notified when an instance is being terminated.
The rest of the logic remains the same, except that at the end the Lambda function does not terminate the instance itself; it just notifies the lifecycle hook to continue with the termination process.
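The final notification maps to the Auto Scaling `CompleteLifecycleAction` API. A minimal sketch of that last step, assuming a boto3 `autoscaling` client is passed in and that the hook name, group name and token were read from the SNS message (the function name is hypothetical):

```python
def complete_termination(client, hook_name, asg_name, token, result="CONTINUE"):
    """Tell the lifecycle hook that the instance can now be terminated.

    client is expected to be boto3.client("autoscaling"); result may be
    "CONTINUE" or "ABANDON".
    """
    return client.complete_lifecycle_action(
        LifecycleHookName=hook_name,
        AutoScalingGroupName=asg_name,
        LifecycleActionToken=token,
        LifecycleActionResult=result,
    )
```

Passing the client in as an argument keeps the function easy to exercise with a stub before wiring it to a real Auto Scaling group.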
With this approach, both scale-out and scale-in actions are managed by Auto Scaling and CloudWatch. This enables us to specify which termination policy to use while scaling in.
For example, the ClosestToNextInstanceHour termination policy terminates the instances that are closest to their next billing hour, which makes the scale-in action more economical.
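In the Auto Scaling API this policy is named `ClosestToNextInstanceHour`. A short sketch of setting it on an existing group, assuming a boto3 `autoscaling` client is passed in (the function name is hypothetical):

```python
def set_termination_policy(client, asg_name, policy="ClosestToNextInstanceHour"):
    """Set the termination policy Auto Scaling uses when it scales the group in."""
    return client.update_auto_scaling_group(
        AutoScalingGroupName=asg_name,
        TerminationPolicies=[policy],
    )
```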
How does it work?
We’ve updated the PoC with this approach implemented; hope you enjoy it!
We would love to hear your own best practices, tips and tricks in the comments.