June 9, 2016

Scheduling automated, serverless EBS snapshots using AWS Lambda

Using EBS snapshots as a backup mechanism is a very common practice, and it is also recommended in the Backup and Recovery section of the EC2 best practices. If you have read my other article, Save AWS costs by scheduled start and stop of EC2 instances, a similar approach is used here to create snapshots.

Using JSON in EC2 tags lets us provide granular configuration details. Create an EC2 tag named backup whose value contains the hour at which to take the backup on each day and the retention period. The hour is given in 24-hour GMT, and retention is the number of snapshots to keep for each volume. Reading these details from tags lets us configure a different schedule and retention period for each EC2 instance. If no backup is needed on a particular day, remove that day from the value.

Sample tag value

{
	"time": {
		"mon": 23,
		"tue": 23,
		"wed": 23,
		"thu": 23,
		"fri": 23,
		"sat": 23,
		"sun": 23
	},
	"retention": 1
}
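As a minimal sketch of how such a tag value can be evaluated, the snippet below parses the JSON and checks whether a backup is due for a given GMT day and hour. It is written for Python 3; the names SAMPLE_TAG and backup_due are hypothetical and only serve the illustration. The sample deliberately omits Saturday to show what happens when a day is removed.

```python
import json
import time

# Same shape as the sample tag value, with "sat" removed for illustration.
SAMPLE_TAG = ('{"time": {"mon": 23, "tue": 23, "wed": 23, "thu": 23, '
              '"fri": 23, "sun": 23}, "retention": 1}')


def backup_due(tag_value, now):
    """Return True if the tag asks for a backup at the day and hour of `now`.

    `now` is a time.struct_time, for example the result of time.gmtime().
    """
    schedule = json.loads(tag_value)
    day = time.strftime("%a", now).lower()  # "mon", "tue", ...
    # A day that was removed from the tag yields None, which never equals an hour.
    return schedule["time"].get(day) == now.tm_hour


# 9 June 2016 was a Thursday; 11 June 2016 was a Saturday.
thursday_23h = time.strptime("2016-06-09 23:00", "%Y-%m-%d %H:%M")
saturday_23h = time.strptime("2016-06-11 23:00", "%Y-%m-%d %H:%M")
print(backup_due(SAMPLE_TAG, thursday_23h))  # True
print(backup_due(SAMPLE_TAG, saturday_23h))  # False
```

Looking the day up with .get() means a missing day simply results in no backup instead of an error.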

Lambda function

import boto3
import botocore
import json
import time
import sys

config = {
    'tag': 'backup',
    'exclude': ['i-a163b90e'],
    'default': '{"time": {"mon": 23, "tue": 23, "wed": 23, "thu": 23, "fri": 23, "sat": 23, "sun": 23},'
               '"retention": 1}',
    'exclude_name': ['TestScript'],
    'auto-create-tag': 'true',
    'sns_topic': 'arn:aws:sns:us-west-2:112345678901:pprakash'
}


def lambda_handler(event, context):
    print "=== Start parsing EBS backup script. ==="
    ec2 = boto3.client('ec2')
    response = ec2.describe_instances()
    namesuffix = time.strftime('-%Y-%m-%d-%H-%M')
    data = None

    # Get current day + hour (using GMT)
    hh = int(time.strftime("%H", time.gmtime()))
    day = time.strftime("%a", time.gmtime()).lower()

    exclude_list = config['exclude']

    # Loop over instances.
    try:
        for r in response['Reservations']:
            for ins in r['Instances']:
                # Exclude instances whose Name tag matches an entry in exclude_name.
                for t in ins['Tags']:
                    if t['Key'] == 'Name':
                        for namestr in config['exclude_name']:
                            if namestr in t['Value']:
                                print 'Excluding Instance with ID ' + ins['InstanceId']
                                exclude_list.append(ins['InstanceId'])
                # Evaluate each instance once, after all of its tags have been checked.
                if (ins['InstanceId'] not in exclude_list) and (not any('ignore' in t['Key'] for t in ins['Tags'])):
                    for tag in ins['Tags']:
                        if tag['Key'] == config['tag']:
                            data = tag['Value']

                    if data is None:
                        if config['auto-create-tag'] != 'true':
                            # No backup tag and auto create is disabled: skip this instance.
                            print "Instance %s is successfully ignored." % ins['InstanceId']
                            continue
                        print "Instance %s doesn't contain the tag and auto create is enabled." % ins['InstanceId']
                        create_backup_tag(ins, ec2)
                        data = config['default']
                    schedule = json.loads(data)
                    data = None

                    # .get() returns None for a day removed from the tag, so no backup runs on that day.
                    if hh == schedule['time'].get(day) and ins['State']['Name'] != 'terminated':
                        print "Getting the list of EBS volumes attached to \"%s\" ..." % ins['InstanceId']
                        volumes = ins['BlockDeviceMappings']
                        for vol in volumes:
                            vid = vol['Ebs']['VolumeId']
                            print "Creating snapshot of volume \"%s\" ..." % (vid)
                            snap_res = ec2.create_snapshot(VolumeId=vid, Description=vid + namesuffix)
                            if snap_res['State'] == 'error':
                                # Notify only when an SNS topic has been configured.
                                if config['sns_topic']:
                                    notify_topic('Failed to create snapshot for volume with ID %s.\n'
                                                 'Check Cloudwatch logs for more details.' % vid)
                                print 'Failed to create snapshot for volume with ID %s.' % vid
                                sys.exit(1)
                            elif maintain_retention(ec2, vid, schedule['retention']) != 0:
                                print "Failed to maintain the retention period appropriately."
                else:
                    print "Instance %s is successfully ignored." % ins['InstanceId']
    except botocore.exceptions.ClientError as e:
        print 'Received Boto client error %s' % e
    except KeyError as k:
        if config['auto-create-tag'] == 'true':
            print "Inside KeyError %s" % k
            create_backup_tag(ins, ec2)
    except ValueError:
        # invalid json
        print 'Invalid value for tag \"backup\" on instance \"%s\", please check!' % (ins['InstanceId'])
    print "=== Finished parsing EBS backup script. ==="


def create_backup_tag(instance, ec2):
    if instance['InstanceId'] not in config['exclude']:
        try:
            tag_name = config['tag']
            tag_value = config['default']
            print "About to create tag on instance %s with value: %s" % (instance['InstanceId'], tag_value)
            ec2.create_tags(Resources=[instance['InstanceId']], Tags=[{'Key': tag_name, 'Value': tag_value}])
        except Exception as e:
            print e
    else:
        print "Instance %s is successfully ignored." % instance['InstanceId']


def maintain_retention(ec2, vid, retention_days):
    try:
        snapls = ec2.describe_snapshots(Filters=[{'Name': 'volume-id', 'Values': [vid]}])
        snapdes = []
        for snap in snapls['Snapshots']:
            snapdes.append((snap['Description'], snap['SnapshotId']))
        # Sort once, oldest first (each description ends with a timestamp),
        # then delete snapshots until only retention_days of them remain.
        snaps = sorted(snapdes)
        while len(snaps) > retention_days:
            print snaps
            snapval = snaps[0][1]
            print 'Deleting snapshot with ID %s' % snapval
            ec2.delete_snapshot(SnapshotId=snapval)
            snaps.pop(0)
        return 0
    except botocore.exceptions.ClientError as e:
        print 'Received Boto client error %s' % e
        return 1
    except Exception:
        print 'Unknown exception in maintain_retention'
        return 1


def notify_topic(msg):
    sns = boto3.client('sns')
    try:
        res = sns.publish(TopicArn=config['sns_topic'], Message=msg, Subject='EBS Volume backup notification.')
        if 'MessageId' not in res:
            print 'Failed to send notification to SNS topic %s' % config['sns_topic']
            return 1
        print 'Sent notification successfully.'
        return 0
    except:
        return 1


if __name__ == '__main__':
    lambda_handler('event', 'handler')

This function contains a config section that defines how the script behaves. If auto-create-tag is set to true, the script checks every EC2 instance for a tag named backup. If the tag exists, its value is used; if it doesn't exist, the script creates the tag with the value defined in the default section of config. The script can be configured to ignore certain EC2 instances by listing their instance IDs in exclude or part of their names in exclude_name. If an EC2 instance has a tag whose key contains ignore, the script ignores that instance too.

If auto-create-tag is set to false, the script checks every EC2 instance for a tag named backup. If the tag exists, the instance is backed up according to its schedule; otherwise the instance is ignored.
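The rules from the two paragraphs above can be condensed into one small decision function. This is only a sketch written for Python 3, not the code of the Lambda function itself; the name backup_decision, its return values and example_config are hypothetical.

```python
def backup_decision(instance, config):
    """Return 'skip', 'use-tag' or 'create-tag' for one describe_instances entry."""
    tags = instance.get("Tags", [])  # instances without tags have no "Tags" key
    name = next((t["Value"] for t in tags if t["Key"] == "Name"), "")

    if instance["InstanceId"] in config["exclude"]:
        return "skip"
    if any(part in name for part in config["exclude_name"]):
        return "skip"
    if any("ignore" in t["Key"] for t in tags):
        return "skip"
    if any(t["Key"] == config["tag"] for t in tags):
        return "use-tag"
    # No backup tag: create it only when auto-create-tag is enabled.
    return "create-tag" if config["auto-create-tag"] == "true" else "skip"


example_config = {
    "tag": "backup",
    "exclude": ["i-a163b90e"],
    "exclude_name": ["TestScript"],
    "auto-create-tag": "true",
}

untagged = {"InstanceId": "i-11111111", "Tags": [{"Key": "Name", "Value": "web"}]}
print(backup_decision(untagged, example_config))  # create-tag
```

Keeping the decision separate from the AWS calls makes it possible to check the rules with plain dictionaries before running anything against a real account.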

Once the script identifies that an instance needs to be backed up, it gets the list of EBS volumes attached to that instance and creates a snapshot of each one. After snapshot creation has been triggered, it counts the snapshots created from that volume and deletes the oldest ones to maintain the retention configured in the tag.
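The retention step boils down to sorting the snapshots of a volume and picking the oldest ones beyond the configured count. A minimal sketch for Python 3 follows; the helper name snapshots_to_delete is hypothetical, and it sorts by the StartTime field that describe_snapshots returns, which is a variation on sorting by the timestamped description.

```python
from datetime import datetime


def snapshots_to_delete(snapshots, retention):
    """Return the IDs of the oldest snapshots that exceed `retention`.

    `snapshots` is a list of dicts with "SnapshotId" and "StartTime" keys,
    the same shape as the entries in a describe_snapshots response.
    """
    ordered = sorted(snapshots, key=lambda s: s["StartTime"])  # oldest first
    excess = max(len(ordered) - retention, 0)
    return [s["SnapshotId"] for s in ordered[:excess]]


snaps = [
    {"SnapshotId": "snap-3", "StartTime": datetime(2016, 6, 9, 23)},
    {"SnapshotId": "snap-1", "StartTime": datetime(2016, 6, 7, 23)},
    {"SnapshotId": "snap-2", "StartTime": datetime(2016, 6, 8, 23)},
]
print(snapshots_to_delete(snaps, 1))  # ['snap-1', 'snap-2']
```

With a retention of 1, only the newest snapshot survives; each returned ID would then be passed to delete_snapshot.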

If sns_topic is set to the ARN of an SNS topic, the script publishes a notification to that topic (delivered by email to email subscribers) whenever it fails to create a snapshot.

The IAM role associated with the Lambda function needs the following privileges.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:*"
            ],
            "Resource": "arn:aws:logs:*:*:*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeInstances",
                "ec2:CreateSnapshot",
                "ec2:CreateTags",
                "ec2:DescribeSnapshots",
                "ec2:DeleteSnapshot"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "sns:Publish"
            ],
            "Resource": [
                "arn:aws:sns:us-west-2:112345678901:pprakash"
            ]
        }
    ]
}

The Lambda function should be scheduled to run every hour, because it compares the current hour with the hour configured in each tag.

Lambda configuration:

Add a scheduled event as the event source and configure it to run every hour.
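The same schedule can also be created from code instead of the console. The sketch below uses the boto3 CloudWatch Events and Lambda clients; the rule name ebs-snapshot-hourly, the statement ID and both function names are hypothetical, and it assumes the Lambda function already exists and that the caller is allowed to manage rules and function permissions.

```python
def hourly_rule_params(rule_name):
    """Build the put_rule arguments for a rule that fires once per hour."""
    return {
        "Name": rule_name,
        "ScheduleExpression": "rate(1 hour)",
        "State": "ENABLED",
    }


def schedule_hourly(function_arn, rule_name="ebs-snapshot-hourly"):
    """Create the rule, allow it to invoke the function, and attach the target."""
    import boto3  # imported here so hourly_rule_params works without boto3

    events = boto3.client("events")
    rule_arn = events.put_rule(**hourly_rule_params(rule_name))["RuleArn"]

    # The rule needs permission to invoke the function.
    boto3.client("lambda").add_permission(
        FunctionName=function_arn,
        StatementId=rule_name + "-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )
    events.put_targets(Rule=rule_name, Targets=[{"Id": "1", "Arn": function_arn}])
    return rule_arn


print(hourly_rule_params("ebs-snapshot-hourly")["ScheduleExpression"])  # rate(1 hour)
```

Calling schedule_hourly with the ARN of the function would set up the hourly trigger; a cron expression could be used instead of rate(1 hour) if the function should run at a fixed minute of each hour.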


© Prakash P 2015 - 2023
