Using EBS snapshots as a backup mechanism is a very common practice, and it is listed in the Backup and Recovery section of the EC2 best practices. If you have read my other article, Save AWS costs by scheduled start and stop of EC2 instances, a similar approach is used here to create snapshots.
Using JSON in EC2 tags lets us provide granular configuration details. Create an EC2 tag named backup whose value contains the hour (in GMT) at which to take the backup on each day, and the retention, which is the number of snapshots to keep per volume. Reading these details from tags lets us configure a different schedule and retention for each EC2 instance. If no backup is needed on a particular day, remove that day from the value.
Sample tag value
{
    "time": {
        "mon": 23,
        "tue": 23,
        "wed": 23,
        "thu": 23,
        "fri": 23,
        "sat": 23,
        "sun": 23
    },
    "retention": 1
}
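To make the meaning of the tag value concrete, here is a minimal sketch of how such a value can be parsed and compared with the current GMT day and hour. The function name backup_due and the shortened sample value are illustrative assumptions, not part of the Lambda function below.

```python
import json
import time

# Day keys in the order used by time.struct_time.tm_wday (0 = Monday).
DAYS = ("mon", "tue", "wed", "thu", "fri", "sat", "sun")

# Hypothetical tag value: backups on Monday, Tuesday and Saturday at 23:00 GMT.
SAMPLE_TAG_VALUE = '{"time": {"mon": 23, "tue": 23, "sat": 23}, "retention": 1}'


def backup_due(tag_value, now=None):
    """Return True when the tag schedules a backup for the given GMT day and hour."""
    schedule = json.loads(tag_value)
    now = time.gmtime() if now is None else now
    day = DAYS[now.tm_wday]
    # A day that was removed from the tag yields None, which never equals an hour.
    return schedule["time"].get(day) == now.tm_hour
```

Calling backup_due(SAMPLE_TAG_VALUE) without a second argument evaluates the schedule against the current time, which is what an hourly run would do.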
Lambda function
import boto3
import botocore
import json
import time
import sys

config = {
    'tag': 'backup',
    'exclude': ['i-a163b90e'],
    'default': '{"time": {"mon": 23, "tue": 23, "wed": 23, "thu": 23, "fri": 23, "sat": 23, "sun": 23},'
               '"retention": 1}',
    'exclude_name': ['TestScript'],
    'auto-create-tag': 'true',
    'sns_topic': 'arn:aws:sns:us-west-2:112345678901:pprakash'
}

def lambda_handler(event, context):
    print("=== Start parsing EBS backup script. ===")
    ec2 = boto3.client('ec2')
    response = ec2.describe_instances()
    namesuffix = time.strftime('-%Y-%m-%d-%H-%M')
    # Get current day + hour (using GMT)
    hh = int(time.strftime("%H", time.gmtime()))
    day = time.strftime("%a", time.gmtime()).lower()
    exclude_list = config['exclude']
    # Loop over instances and their attached volumes.
    try:
        for r in response['Reservations']:
            for ins in r['Instances']:
                # Untagged instances have no 'Tags' key at all.
                tags = ins.get('Tags', [])
                for t in tags:
                    if t['Key'] == 'Name':
                        for namestr in config['exclude_name']:
                            if namestr in t['Value']:
                                print('Excluding Instance with ID ' + ins['InstanceId'])
                                exclude_list.append(ins['InstanceId'])
                if (ins['InstanceId'] not in exclude_list) and (not any('ignore' in t['Key'] for t in tags)):
                    data = None
                    for tag in tags:
                        if tag['Key'] == config['tag']:
                            data = tag['Value']
                    if data is None and config['auto-create-tag'] == 'true':
                        print("Instance %s doesn't contain the tag and auto create is enabled." % ins['InstanceId'])
                        create_backup_tag(ins, ec2)
                        data = config['default']
                    if data is None:
                        # No tag and auto create is disabled: nothing to back up.
                        continue
                    schedule = json.loads(data)
                    # .get() so that a day removed from the tag simply means "no backup today".
                    if hh == schedule['time'].get(day) and ins['State']['Name'] != 'terminated':
                        print('Getting the list of EBS volumes attached to "%s" ...' % ins['InstanceId'])
                        volumes = ins['BlockDeviceMappings']
                        for vol in volumes:
                            vid = vol['Ebs']['VolumeId']
                            print('Creating snapshot of volume "%s" ...' % vid)
                            snap_res = ec2.create_snapshot(VolumeId=vid, Description=vid + namesuffix)
                            if snap_res['State'] == 'error':
                                if config['sns_topic']:
                                    notify_topic('Failed to create snapshot for volume with ID %s.\n'
                                                 'Check Cloudwatch logs for more details.' % vid)
                                print('Failed to create snapshot for volume with ID %s.' % vid)
                                sys.exit(1)
                            elif maintain_retention(ec2, vid, schedule['retention']) != 0:
                                print("Failed to maintain the retention period appropriately.")
                else:
                    print("Instance %s is successfully ignored." % ins['InstanceId'])
    except botocore.exceptions.ClientError as e:
        print('Received Boto client error %s' % e)
    except KeyError as k:
        if config['auto-create-tag'] == 'true':
            print("Inside KeyError %s" % k)
            create_backup_tag(ins, ec2)
    except ValueError:
        # invalid json
        print('Invalid value for tag "backup" on instance "%s", please check!' % (ins['InstanceId']))
    print("=== Finished parsing EBS backup script. ===")

def create_backup_tag(instance, ec2):
    # Write the default schedule to the instance unless it is explicitly excluded.
    if instance['InstanceId'] not in config['exclude']:
        try:
            tag_name = config['tag']
            tag_value = config['default']
            print("About to create tag on instance %s with value: %s" % (instance['InstanceId'], tag_value))
            ec2.create_tags(Resources=[instance['InstanceId']], Tags=[{'Key': tag_name, 'Value': tag_value}])
        except Exception as e:
            print(e)
    else:
        print("Instance %s is successfully ignored." % instance['InstanceId'])

def maintain_retention(ec2, vid, retention_days):
    try:
        snapls = ec2.describe_snapshots(Filters=[{'Name': 'volume-id', 'Values': [vid]}])
        # Descriptions end in a timestamp, so sorting by description puts the oldest first.
        snaps = sorted((snap['Description'], snap['SnapshotId']) for snap in snapls['Snapshots'])
        # Delete the oldest snapshots until only the configured number remains.
        while len(snaps) > retention_days:
            print(snaps)
            snapval = snaps[0][1]
            print('Deleting snapshot with ID %s' % snapval)
            ec2.delete_snapshot(SnapshotId=snapval)
            snaps.pop(0)
        return 0
    except botocore.exceptions.ClientError as e:
        print('Received Boto client error %s' % e)
        return 1
    except Exception:
        print('Unknown exception in maintain_retention')
        return 1

def notify_topic(msg):
    sns = boto3.client('sns')
    try:
        res = sns.publish(TopicArn=config['sns_topic'], Message=msg, Subject='EBS Volume backup notification.')
        if 'MessageId' not in res:
            print('Failed to send notification to SNS topic %s' % config['sns_topic'])
            return 1
        print('Sent notification successfully.')
        return 0
    except Exception:
        return 1


if __name__ == '__main__':
    lambda_handler('event', 'handler')
This function contains a config section that defines how the script behaves. If auto-create-tag is set to true, the script checks every EC2 instance for a tag named backup. If the tag exists, the script uses its value; if it doesn't exist, the script creates the tag with the value defined in the default section of config. The script can be configured to ignore certain EC2 instances by listing their instance IDs in exclude, or part of their names in exclude_name. If an EC2 instance has a tag whose name contains ignore, the script ignores that instance too.
If auto-create-tag is set to false, the script checks every EC2 instance for a tag named backup. If the tag exists, the script triggers the backup; otherwise it ignores that instance.
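The selection rules described above can be condensed into one small function. This is only a sketch under the assumption that an instance is a dictionary shaped like an entry of the describe_instances response; resolve_schedule is a hypothetical helper name, and unlike the Lambda function it never writes the tag back to the instance.

```python
import json


def resolve_schedule(instance, config):
    """Return the parsed schedule for an instance, or None when it must be skipped."""
    tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}

    # Rule 1: explicit instance IDs in 'exclude'.
    if instance["InstanceId"] in config["exclude"]:
        return None
    # Rule 2: any 'exclude_name' entry contained in the Name tag.
    name = tags.get("Name", "")
    if any(part in name for part in config["exclude_name"]):
        return None
    # Rule 3: any tag whose key contains 'ignore'.
    if any("ignore" in key for key in tags):
        return None

    value = tags.get(config["tag"])
    if value is None:
        if config["auto-create-tag"] != "true":
            return None  # no tag and auto create disabled: skip the instance
        value = config["default"]  # the real script would also create the tag here
    return json.loads(value)
```

Keeping the decision in a function without AWS calls makes it easy to check each rule in isolation before wiring it into the handler.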
Once the script determines that an instance needs to be backed up, it gets the list of EBS volumes attached to that instance and creates a snapshot of each one. Once snapshot creation has been triggered, it checks how many snapshots exist for that volume and deletes the oldest ones to maintain the retention configured in the tag.
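The retention step boils down to sorting the snapshots of a volume from oldest to newest and deleting everything beyond the configured count. A minimal sketch, assuming each snapshot is a dictionary with the Description and SnapshotId keys returned by describe_snapshots; the helper name snapshots_to_delete is an assumption made for illustration.

```python
def snapshots_to_delete(snapshots, keep):
    """Return the IDs of the oldest snapshots so that only `keep` remain."""
    # Descriptions look like "<volume-id>-YYYY-MM-DD-HH-MM", so for a single
    # volume a plain string sort is also a chronological sort.
    ordered = sorted(snapshots, key=lambda snap: snap["Description"])
    excess = len(ordered) - keep
    if excess <= 0:
        return []
    return [snap["SnapshotId"] for snap in ordered[:excess]]
```

Sorting by the StartTime field of each snapshot would be an alternative that does not depend on the description format.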
If sns_topic is set to the ARN of an SNS topic, the script sends a notification to the topic's subscribers (by email, for example) whenever it fails to create a snapshot.
The IAM role associated with the Lambda function needs the following privileges.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:*"
            ],
            "Resource": "arn:aws:logs:*:*:*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeInstances",
                "ec2:CreateSnapshot",
                "ec2:CreateTags",
                "ec2:DescribeSnapshots",
                "ec2:DeleteSnapshot"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "sns:Publish"
            ],
            "Resource": [
                "arn:aws:sns:us-west-2:112345678901:pprakash"
            ]
        }
    ]
}
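The Lambda function must be scheduled to run every hour, because the script only compares the current hour with the hour configured in the tag.
Lambda configuration:
Add a scheduled event as the event source and configure it to run every hour.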
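The hourly schedule can also be created from code instead of the console. The sketch below assumes a CloudWatch Events (EventBridge) rule is used as the scheduler; the rule name, statement ID and target ID are placeholders, and the two clients are passed in so the function itself has no dependency on a particular session.

```python
def schedule_hourly(events, lambda_client, function_name, function_arn,
                    rule_name="ebs-backup-hourly"):
    """Create an hourly rule and point it at the backup Lambda function.

    `events` and `lambda_client` are expected to be boto3 clients, e.g.
    boto3.client("events") and boto3.client("lambda").
    """
    # Create (or update) a rule that fires once per hour.
    rule = events.put_rule(
        Name=rule_name,
        ScheduleExpression="rate(1 hour)",
        State="ENABLED",
    )
    # Allow the rule to invoke the function.
    lambda_client.add_permission(
        FunctionName=function_name,
        StatementId=rule_name + "-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule["RuleArn"],
    )
    # Register the function as the rule's target.
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "1", "Arn": function_arn}],
    )
    return rule["RuleArn"]
```

A rate expression fires relative to when the rule was created; if the run should happen at the top of each hour, a cron expression such as cron(0 * * * ? *) could be used in its place.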