November 15, 2015

Save AWS costs by scheduled start and stop of EC2 instances

Most AWS resources are billed on a per-hour basis, which gives us an opportunity to save cost based on the usage pattern. Dev/Test environments in particular are needed only on working days and during working hours.

Using an AWS Lambda function (Python) in combination with EC2 instance tags, scheduled start and stop can be achieved with a few lines of code.

Advantages of using AWS Lambda:

  • With 128 MB of memory, the AWS Lambda free tier covers 3,200,000 seconds of execution per month, which is more than enough for this function.
  • An IAM role is associated with the Lambda function and used to perform the required tasks, so no separate access keys need to be configured.
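
To see where the 3,200,000 figure comes from, here is a back-of-the-envelope calculation. It assumes a free tier of 400,000 GB-seconds per month (a number not stated in this post) and, as a worst case, that every hourly run uses the full 10-second timeout configured further down.

```python
# Rough free-tier arithmetic. FREE_TIER_GB_SECONDS is an assumption taken
# from the Lambda free tier, not from this post.
FREE_TIER_GB_SECONDS = 400000
MEMORY_GB = 128 / 1024.0            # 128 MB expressed in GB

free_seconds = FREE_TIER_GB_SECONDS / MEMORY_GB

# Worst case: one run per hour for a 31-day month, each hitting a 10 s timeout.
runs_per_month = 24 * 31
worst_case_seconds = runs_per_month * 10

print("free seconds per month: %d" % free_seconds)        # 3200000
print("worst-case usage:       %d" % worst_case_seconds)  # 7440
```

Even the worst case stays well under one percent of the free allowance.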

I used a tag named schedule whose value is a JSON string defining the days and hours at which the instance should be started or stopped.

Sample tag value

{
    "mon": {
        "start": 7,
        "stop": 18
    },
    "tue": {
        "start": 7,
        "stop": 18
    },
    "wed": {
        "start": 7,
        "stop": 18
    },
    "thu": {
        "start": 7,
        "stop": 18
    },
    "fri": {
        "start": 7,
        "stop": 18
    }
}

One interesting scenario where this script is useful: you start the instance manually when required (not by schedule) but want it stopped in the evening by schedule. In this scenario, just leave the value of the start key as an empty string and configure the appropriate stop value for that instance.
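
The matching rule can be sketched as a small pure function. This is only an illustration, not the script itself: the name `action_for` and the sample tag value are made up, and the real function additionally checks the instance state before acting.

```python
import json

def action_for(tag_value, day, hour):
    """Return 'start', 'stop' or None for a schedule tag value.

    day is a lowercase three-letter name ('mon'), hour is 0-23.
    """
    schedule = json.loads(tag_value)
    entry = schedule.get(day, {})      # days missing from the tag never match
    if entry.get('start') == hour:     # an empty string never equals an int hour
        return 'start'
    if entry.get('stop') == hour:
        return 'stop'
    return None

# Hypothetical tag value: started by hand, stopped by schedule at 18:00.
stop_only = '{"mon": {"start": "", "stop": 18}}'

print(action_for(stop_only, 'mon', 7))    # None
print(action_for(stop_only, 'mon', 18))   # stop
print(action_for(stop_only, 'sat', 18))   # None
```

Because the empty string is never equal to an integer hour, the start branch simply never fires for that instance.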

The function checks whether the current day and hour (in GMT) match the start or stop value configured in the tag of each instance. By default it checks all EC2 instances for the tag. If an instance has no tag named “schedule”, the function creates one using the default schedule configured in the script.
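
A minimal sketch of the tag lookup with the fallback to the default schedule is shown here. The helper name `schedule_for`, the shortened default and the sample instances are assumptions for illustration; the instance dictionaries only mimic the shape of entries returned by `describe_instances`.

```python
import json

# Hypothetical shortened default; the script keeps its own in config['default'].
DEFAULT_SCHEDULE = '{"mon": {"start": 7, "stop": 18}}'

def schedule_for(instance, tag_name='schedule', default=DEFAULT_SCHEDULE):
    """Return (schedule_dict, needs_tag) for one describe_instances entry."""
    # The 'Tags' key may be absent when an instance has no tags at all.
    for tag in instance.get('Tags', []):
        if tag['Key'] == tag_name:
            return json.loads(tag['Value']), False
    return json.loads(default), True

tagged = {'InstanceId': 'i-1',
          'Tags': [{'Key': 'schedule', 'Value': '{"tue": {"start": 8, "stop": 17}}'}]}
untagged = {'InstanceId': 'i-2'}

print(schedule_for(tagged))
print(schedule_for(untagged))
```

The second element of the result tells the caller whether the default tag still has to be written to the instance.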

If you want to exclude an instance from being tagged and scheduled, there are two ways to do that: 1) add the instance ID to the exclude variable inside the script, or 2) add a tag named “ignore” to the instance. The value of this tag doesn’t matter.
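
The two exclusion rules can be sketched as one predicate. The name `is_excluded` and the sample instances are made up, and this version matches the tag key exactly, whereas the script tests whether “ignore” appears anywhere in the key.

```python
def is_excluded(instance, exclude_ids):
    """True if the instance is listed in exclude_ids or carries an 'ignore' tag."""
    if instance['InstanceId'] in exclude_ids:
        return True
    # The tag value is irrelevant; only the key is inspected.
    return any(tag['Key'] == 'ignore' for tag in instance.get('Tags', []))

# Hypothetical instances shaped like describe_instances entries.
listed = {'InstanceId': 'i-a4514ff6'}
flagged = {'InstanceId': 'i-2', 'Tags': [{'Key': 'ignore', 'Value': ''}]}
normal = {'InstanceId': 'i-3', 'Tags': [{'Key': 'Name', 'Value': 'web'}]}

print(is_excluded(listed, ['i-a4514ff6']))    # True
print(is_excluded(flagged, ['i-a4514ff6']))   # True
print(is_excluded(normal, ['i-a4514ff6']))    # False
```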

Handling instance[s] behind ELB:

If an instance behind an ELB is stopped, the ELB stops checking its health. Hence, when the instance is powered on again, it does not become available via the ELB automatically. The script checks whether each instance it started as per schedule is attached to any ELB. If it is, the instance is de-registered and registered again.
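
One possible refinement, noted as a TODO in the script, is to re-register only instances that are not already in service. The sketch below assumes `elb` is a boto3 `elb` (Classic Load Balancer) client passed in by the caller; the function name `reregister_if_needed` is made up.

```python
import time

def reregister_if_needed(elb, lb_name, instance_id, pause=3):
    """Deregister and register instance_id on lb_name unless it is InService.

    elb is expected to be a boto3 'elb' (Classic Load Balancer) client.
    Returns True if the instance was re-registered.
    """
    target = [{'InstanceId': instance_id}]
    health = elb.describe_instance_health(LoadBalancerName=lb_name, Instances=target)
    states = [s['State'] for s in health['InstanceStates']]
    if states and all(state == 'InService' for state in states):
        return False                      # already serving traffic, nothing to do
    elb.deregister_instances_from_load_balancer(LoadBalancerName=lb_name, Instances=target)
    time.sleep(pause)                     # short pause between the two calls
    elb.register_instances_with_load_balancer(LoadBalancerName=lb_name, Instances=target)
    return True
```

With this approach the IAM role would also need the elasticloadbalancing:DescribeInstanceHealth permission, which the policy in this post does not include.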

Handling instance[s] in Auto Scaling Group:

When instances in an Auto Scaling group are stopped, the ASG health check identifies them as unhealthy and triggers corrective action by launching replacement instances. To prevent Auto Scaling from creating new instances, certain processes need to be suspended before the instance is stopped: Launch, Terminate, HealthCheck, ReplaceUnhealthy and AZRebalance.
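
The suspension step can be sketched on its own as follows. It assumes `autoscaling` is a boto3 `autoscaling` client passed in by the caller; the helper names `asg_name_of` and `suspend_before_stop` are made up for this illustration.

```python
# Processes to suspend before an instance in an Auto Scaling group is stopped.
SCALING_PROCESSES = ['Launch', 'Terminate', 'HealthCheck', 'ReplaceUnhealthy', 'AZRebalance']

def asg_name_of(instance):
    """Return the Auto Scaling group name from the instance tags, or None."""
    for tag in instance.get('Tags', []):
        if tag['Key'] == 'aws:autoscaling:groupName':
            return tag['Value']
    return None

def suspend_before_stop(autoscaling, instance):
    """Suspend scaling processes for the instance's group, if it has one.

    autoscaling is expected to be a boto3 'autoscaling' client.
    Returns the group name, or None if the instance is not in a group.
    """
    name = asg_name_of(instance)
    if name is not None:
        autoscaling.suspend_processes(AutoScalingGroupName=name,
                                      ScalingProcesses=SCALING_PROCESSES)
    return name
```

Note that the processes stay suspended until they are resumed. Resuming them (for example with the client's `resume_processes` call) is not covered by this post and would need the matching autoscaling:ResumeProcesses permission.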

Lambda function:

import boto3
import json
import time

config = {
    'tag': 'schedule',
    'exclude': ['i-a4514ff6'],
    'default': '{"mon": {"start": 7, "stop": 18},"tue": {"start": 7, "stop": 18},"wed": {"start": 7, "stop": 18},"thu": {"start": 7, "stop": 18},"fri": {"start": 7, "stop": 18}}'
}
sps = ['Launch', 'Terminate', 'HealthCheck', 'ReplaceUnhealthy', 'AZRebalance']


#
# Loop EC2 instances and check if a 'schedule' tag has been set. Next, evaluate value and start/stop instance if needed.
#
def lambda_handler(event, context):
    print "=== Start parsing AWS schedule."
    ec2 = boto3.client('ec2')
    asc = boto3.client('autoscaling')
    response = ec2.describe_instances()

    # Get current day + hour (using GMT)
    hh = int(time.strftime("%H", time.gmtime()))
    day = time.strftime("%a", time.gmtime()).lower()

    started = []
    stopped = []

    exclude_list = config['exclude']

    # Loop reservations/instances.
    for r in response['Reservations']:
        for ins in r['Instances']:
            # Instances without any tags have no 'Tags' key, so fall back to an empty list here.
            if (ins['InstanceId'] not in exclude_list) and (not any('ignore' in t['Key'] for t in ins.get('Tags', []))):
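                try:
                    # Reset per-instance state. Without this, the first instance lacking a
                    # tag hits an unbound name and values can leak from the previous instance.
                    data = None
                    asg = None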
                    for tag in ins['Tags']:
                        if tag['Key'] == 'schedule':
                            data = tag['Value']
                        if tag['Key'] == 'aws:autoscaling:groupName':
                            asg = tag['Value']

                    if data is None:
                        create_schedule_tag(ins, ec2)
                        data = config['default']
                    schedule = json.loads(data)
                    data = None
                    '''
                    TODO: Be smart to find time and start/stop based on the scheduled window instead of just checking
                          at the hour of start or stop.
                    '''
                    try:
                        if hh == schedule[day]['start'] and not ins['State']['Name'] == 'running':
                            print 'Starting instance "%s" ...' % (ins['InstanceId'])
                            ec2.start_instances(InstanceIds=[ins['InstanceId']])
                            # Record the instance only after the start call succeeded.
                            started.append(ins['InstanceId'])
                    except:
                        pass  # catch exception if 'start' is not in schedule.

                    try:
                        if hh == schedule[day]['stop'] and ins['State']['Name'] == 'running':
                            if asg is not None:
                                print 'Suspending autoscaling processes for ASG "%s" before shutting down the instance.' % asg
                                asc.suspend_processes(AutoScalingGroupName=asg, ScalingProcesses=sps)
                            print 'Stopping instance "%s" ...' % (ins['InstanceId'])
                            ec2.stop_instances(InstanceIds=[ins['InstanceId']])
                            # Record the instance only after the stop call succeeded.
                            stopped.append(ins['InstanceId'])
                    except:
                        pass  # catch exception if 'stop' is not in schedule.
                    asg = None
                except KeyError as e:
                    create_schedule_tag(ins, ec2)
                except ValueError as e:
                    # invalid json
                    print 'Invalid value for tag "schedule" on instance "%s", please check!' % (ins['InstanceId'])
            else:
                print "Instance %s is successfully ignored." % ins['InstanceId']

    # Fix ELB configuration
    '''
    TODO: Deregister & register from ELB only if instances are not InService.
    '''
    if len(started) > 0:
        print "Instances have been started... Checking instances in Elastic Load Balancer."
        elb = boto3.client('elb')
        lbd = elb.describe_load_balancers()
        for e in lbd['LoadBalancerDescriptions']:
            for inss in e['Instances']:
                if inss['InstanceId'] in started:
                    print "Deregistering instance %s from ELB %s" % (inss['InstanceId'], e['LoadBalancerName'])
                    elb.deregister_instances_from_load_balancer(LoadBalancerName=e['LoadBalancerName'],
                                                                Instances=[{'InstanceId': inss['InstanceId']}])
                    time.sleep(3)
                    print "Registering instance %s with ELB %s" % (inss['InstanceId'], e['LoadBalancerName'])
                    elb.register_instances_with_load_balancer(LoadBalancerName=e['LoadBalancerName'],
                                                              Instances=[{'InstanceId': inss['InstanceId']}])

    print "=== Finished parsing AWS schedule."


def create_schedule_tag(instance, ec2):
    if instance['InstanceId'] not in config['exclude']:
        try:
            tag_name = config['tag']
            tag_value = config['default']
            print "About to create tag on instance %s with value: %s" % (instance['InstanceId'], tag_value)
            ec2.create_tags(Resources=[instance['InstanceId']], Tags=[{'Key': tag_name, 'Value': tag_value}])
        except Exception as e:
            print e
    else:
        print "Instance %s is successfully ignored." % instance['InstanceId']

if __name__ == '__main__':
    lambda_handler('event', 'handler')

The IAM role associated with the Lambda function needs the following privileges:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "BaseLambdaPolicy",
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:*"
        },
        {
            "Sid": "SchedulerPolicy",
            "Effect": "Allow",
            "Action": [
                "ec2:CreateTags",
                "ec2:DescribeInstances",
                "ec2:StartInstances",
                "ec2:StopInstances",
                "elasticloadbalancing:DeregisterInstancesFromLoadBalancer",
                "elasticloadbalancing:DescribeLoadBalancers",
                "elasticloadbalancing:RegisterInstancesWithLoadBalancer",
                "autoscaling:SuspendProcesses"
            ],
            "Resource": "*"
        }
    ]
}

Lambda configuration:

Configure timeout as 10 seconds.

Add scheduler as event source and configure it to run every hour.
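
If you prefer to script this step instead of clicking through the console, a rough sketch could look like the following. It assumes `events` and `lambda_client` are boto3 `events` and `lambda` clients passed in by the caller; the function name `schedule_hourly`, the rule name and the statement ID are made up.

```python
def schedule_hourly(events, lambda_client, function_name, function_arn,
                    rule_name='ec2-scheduler-hourly'):
    """Create an hourly rule and point it at the Lambda function.

    events and lambda_client are expected to be boto3 'events' and 'lambda'
    clients; rule_name and the statement id are made-up names.
    """
    rule = events.put_rule(Name=rule_name,
                           ScheduleExpression='rate(1 hour)',
                           State='ENABLED')
    # Allow the rule to invoke the function.
    lambda_client.add_permission(FunctionName=function_name,
                                 StatementId=rule_name + '-invoke',
                                 Action='lambda:InvokeFunction',
                                 Principal='events.amazonaws.com',
                                 SourceArn=rule['RuleArn'])
    # Attach the function as the rule's only target.
    events.put_targets(Rule=rule_name,
                       Targets=[{'Id': '1', 'Arn': function_arn}])
    return rule['RuleArn']
```

Since the script compares only the hour, it does not matter at which minute within the hour the rule fires, as long as it fires once per hour.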

[Image: Lambda_Schedule_Event]

TODO:

Handle instance start/stop schedule in a better way than depending on that particular hour.
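
One way this could look, purely as a sketch and not something the script does today, is to derive the desired state from the scheduled window rather than from an exact hour match. The function name `desired_state` is made up, and the sketch assumes the start hour is earlier than the stop hour on the same day.

```python
def desired_state(schedule, day, hour):
    """Return 'running', 'stopped' or None (leave as is) for a parsed schedule."""
    entry = schedule.get(day)
    if not entry:
        return None                       # no entry for this day: do nothing
    start, stop = entry.get('start'), entry.get('stop')
    has_start = isinstance(start, int)    # an empty string means "not scheduled"
    has_stop = isinstance(stop, int)
    if has_stop and hour >= stop:
        return 'stopped'
    if has_start and hour >= start:
        return 'running'
    return None

sched = {'mon': {'start': 7, 'stop': 18}, 'tue': {'start': '', 'stop': 18}}
print(desired_state(sched, 'mon', 9))     # running
print(desired_state(sched, 'mon', 18))    # stopped
print(desired_state(sched, 'tue', 9))     # None
```

The trade-off is that a missed run is corrected by the next one, but a manual start after the stop hour, or a manual stop inside the window, would be undone at the next run.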

© Prakash P 2015 - 2019
