AWS Backup and Disaster Recovery

Business continuity is crucial for modern organizations, and implementing a robust backup and disaster recovery strategy on AWS can mean the difference between minor disruption and catastrophic data loss. AWS provides a comprehensive suite of services and architectural patterns that enable organizations to build resilient systems with multiple layers of protection, automated recovery processes, and cost-effective data retention policies.

Understanding AWS Backup Architecture

AWS Backup serves as a centralized service that automates and manages backups across multiple AWS services. It provides a unified backup solution that eliminates the need to create custom scripts and manual processes for each service. The service supports cross-region backup, cross-account backup, and provides comprehensive monitoring and compliance reporting.

The service integrates natively with Amazon EC2, Amazon EBS, Amazon RDS, Amazon DynamoDB, Amazon EFS, Amazon FSx, AWS Storage Gateway, and Amazon S3. This integration allows for consistent backup policies across your entire infrastructure, reducing complexity and ensuring comprehensive protection.
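The same centralized view is available from code. As a hedged sketch (not part of the template below), the helper assumes only a client object exposing the standard boto3 `backup` paginator for `list_protected_resources`; the commented usage lines assume configured AWS credentials.

```python
def list_protected_arns(backup_client):
    """Collect the ARNs of every resource AWS Backup currently protects."""
    arns = []
    # ListProtectedResources is paginated; walk every page
    paginator = backup_client.get_paginator("list_protected_resources")
    for page in paginator.paginate():
        arns.extend(r["ResourceArn"] for r in page["Results"])
    return arns

# Usage (assumes AWS credentials and region are configured):
# import boto3
# client = boto3.client("backup", region_name="us-east-1")
# for arn in list_protected_arns(client):
#     print(arn)
```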

Disaster Recovery Fundamentals

AWS disaster recovery strategies are built around four key patterns, each offering different levels of protection and cost structures. The Backup and Restore pattern provides the most cost-effective approach for less critical workloads, storing backups in Amazon S3 and using AWS services for restoration when needed.

Pilot Light maintains a minimal version of your environment running in AWS, with critical data continuously replicated. During a disaster, you scale up the pilot light environment to handle production loads. Warm Standby runs a scaled-down version of your production environment, providing faster recovery times but at higher costs.

Multi-Site Active-Active represents the most robust approach, running your workload simultaneously in multiple locations with full capacity. This approach provides near-zero downtime but requires significant investment in infrastructure and complexity management.
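The trade-off across these four patterns is ultimately a function of your recovery time objective (RTO) and recovery point objective (RPO). The sketch below is purely illustrative, with hypothetical threshold values; real selection also weighs cost and operational complexity.

```python
def choose_dr_pattern(rto_minutes: float, rpo_minutes: float) -> str:
    """Pick the cheapest DR pattern that plausibly meets the given objectives.

    Thresholds here are illustrative examples, not AWS guidance.
    """
    if rto_minutes < 1 and rpo_minutes < 1:
        return "multi-site active-active"   # near-zero downtime, full capacity
    if rto_minutes <= 60:
        return "warm standby"               # scaled-down copy already running
    if rto_minutes <= 240:
        return "pilot light"                # core pieces replicated, scale on demand
    return "backup and restore"             # cheapest, slowest to recover

print(choose_dr_pattern(30, 5))      # warm standby
print(choose_dr_pattern(1440, 60))   # backup and restore
```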

Comprehensive Implementation: Multi-Tier Application Recovery

Let’s build a complete disaster recovery solution for a three-tier web application, demonstrating how to implement automated backups, cross-region replication, and orchestrated recovery processes.

Infrastructure Setup with CloudFormation

Here’s a comprehensive CloudFormation template that establishes the backup and disaster recovery infrastructure:

AWSTemplateFormatVersion: '2010-09-09'
Description: 'Comprehensive AWS Backup and Disaster Recovery Infrastructure'

Parameters:
  PrimaryRegion:
    Type: String
    Default: us-east-1
    Description: Primary region for the application
  
  SecondaryRegion:
    Type: String
    Default: us-west-2
    Description: Secondary region for disaster recovery
  
  ApplicationName:
    Type: String
    Default: webapp
    Description: Name of the application
  
  DatabasePassword:
    Type: String
    NoEcho: true
    Description: Master password for RDS database
    MinLength: 8
    MaxLength: 41
    AllowedPattern: '[a-zA-Z0-9]*'

Resources:
  # AWS Backup Vault
  BackupVault:
    Type: AWS::Backup::BackupVault
    Properties:
      BackupVaultName: !Sub '${ApplicationName}-backup-vault'
      EncryptionKeyArn: !GetAtt BackupKMSKey.Arn
      Notifications:
        BackupVaultEvents: 
          - BACKUP_JOB_STARTED
          - BACKUP_JOB_COMPLETED
          - BACKUP_JOB_FAILED
          - RESTORE_JOB_STARTED
          - RESTORE_JOB_COMPLETED
          - RESTORE_JOB_FAILED
        SNSTopicArn: !Ref BackupNotificationTopic

  # KMS Key for backup encryption
  BackupKMSKey:
    Type: AWS::KMS::Key
    Properties:
      Description: KMS Key for AWS Backup encryption
      KeyPolicy:
        Statement:
          - Sid: Enable IAM User Permissions
            Effect: Allow
            Principal:
              AWS: !Sub 'arn:aws:iam::${AWS::AccountId}:root'
            Action: 'kms:*'
            Resource: '*'
          - Sid: Allow AWS Backup
            Effect: Allow
            Principal:
              Service: backup.amazonaws.com
            Action:
              - kms:Encrypt
              - kms:Decrypt
              - kms:ReEncrypt*
              - kms:GenerateDataKey*
              - kms:DescribeKey
            Resource: '*'

  BackupKMSKeyAlias:
    Type: AWS::KMS::Alias
    Properties:
      AliasName: !Sub 'alias/${ApplicationName}-backup-key'
      TargetKeyId: !Ref BackupKMSKey

  # SNS Topic for backup notifications
  BackupNotificationTopic:
    Type: AWS::SNS::Topic
    Properties:
      TopicName: !Sub '${ApplicationName}-backup-notifications'
      DisplayName: Backup and Recovery Notifications

  # Backup Plan
  ComprehensiveBackupPlan:
    Type: AWS::Backup::BackupPlan
    Properties:
      BackupPlan:
        BackupPlanName: !Sub '${ApplicationName}-comprehensive-backup-plan'
        BackupPlanRule:
          - RuleName: DailyBackups
            TargetBackupVault: !Ref BackupVault
            ScheduleExpression: 'cron(0 2 * * ? *)'  # Daily at 2 AM
            StartWindowMinutes: 60
            CompletionWindowMinutes: 120
            Lifecycle:
              MoveToColdStorageAfterDays: 30
              DeleteAfterDays: 365
            RecoveryPointTags:
              Environment: Production
              BackupType: Daily
            CopyActions:
              - DestinationBackupVaultArn: !Sub 
                  - 'arn:aws:backup:${SecondaryRegion}:${AWS::AccountId}:backup-vault:${ApplicationName}-dr-vault'
                  - SecondaryRegion: !Ref SecondaryRegion
                Lifecycle:
                  MoveToColdStorageAfterDays: 30
                  DeleteAfterDays: 365
          
          - RuleName: WeeklyBackups
            TargetBackupVault: !Ref BackupVault
            ScheduleExpression: 'cron(0 3 ? * SUN *)'  # Weekly on Sunday at 3 AM
            StartWindowMinutes: 60
            CompletionWindowMinutes: 180
            Lifecycle:
              MoveToColdStorageAfterDays: 7
              DeleteAfterDays: 2555  # 7 years
            RecoveryPointTags:
              Environment: Production
              BackupType: Weekly
            CopyActions:
              - DestinationBackupVaultArn: !Sub 
                  - 'arn:aws:backup:${SecondaryRegion}:${AWS::AccountId}:backup-vault:${ApplicationName}-dr-vault'
                  - SecondaryRegion: !Ref SecondaryRegion
                Lifecycle:
                  MoveToColdStorageAfterDays: 7
                  DeleteAfterDays: 2555

  # IAM Role for AWS Backup
  BackupServiceRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: backup.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSBackupServiceRolePolicyForBackup
        - arn:aws:iam::aws:policy/service-role/AWSBackupServiceRolePolicyForRestores

  # Backup Selection
  BackupSelection:
    Type: AWS::Backup::BackupSelection
    Properties:
      BackupPlanId: !Ref ComprehensiveBackupPlan
      BackupSelection:
        SelectionName: !Sub '${ApplicationName}-resources'
        IamRoleArn: !GetAtt BackupServiceRole.Arn
        Resources:
          - !Sub 'arn:aws:ec2:*:${AWS::AccountId}:instance/*'
          - !Sub 'arn:aws:ec2:*:${AWS::AccountId}:volume/*'
          - !Sub 'arn:aws:rds:*:${AWS::AccountId}:db:*'
          - !Sub 'arn:aws:dynamodb:*:${AWS::AccountId}:table/*'
          - !Sub 'arn:aws:efs:*:${AWS::AccountId}:file-system/*'
        Conditions:
          StringEquals:
            - ConditionKey: 'aws:ResourceTag/BackupEnabled'
              ConditionValue: 'true'

  # RDS Primary Database
  DatabaseSubnetGroup:
    Type: AWS::RDS::DBSubnetGroup
    Properties:
      DBSubnetGroupName: !Sub '${ApplicationName}-db-subnet-group'
      DBSubnetGroupDescription: Subnet group for RDS database
      SubnetIds:
        - !Ref PrivateSubnet1
        - !Ref PrivateSubnet2
      Tags:
        - Key: Name
          Value: !Sub '${ApplicationName}-db-subnet-group'

  PrimaryDatabase:
    Type: AWS::RDS::DBInstance
    Properties:
      DBInstanceIdentifier: !Sub '${ApplicationName}-primary-db'
      DBInstanceClass: db.t3.medium
      Engine: mysql
      EngineVersion: 8.0.35
      MasterUsername: admin
      MasterUserPassword: !Ref DatabasePassword
      AllocatedStorage: 20
      StorageType: gp2
      StorageEncrypted: true
      KmsKeyId: !Ref BackupKMSKey
      DBSubnetGroupName: !Ref DatabaseSubnetGroup
      VPCSecurityGroups:
        - !Ref DatabaseSecurityGroup
      BackupRetentionPeriod: 7
      DeleteAutomatedBackups: false
      DeletionProtection: true
      EnablePerformanceInsights: true
      MonitoringInterval: 60
      MonitoringRoleArn: !GetAtt RDSMonitoringRole.Arn
      Tags:
        - Key: BackupEnabled
          Value: 'true'
        - Key: Environment
          Value: Production

  # Read replica for disaster recovery. Note: a stack creates resources in the
  # region it is deployed to, so a true cross-region replica belongs in a
  # companion stack deployed in the secondary region, referencing the primary by ARN.
  SecondaryReadReplica:
    Type: AWS::RDS::DBInstance
    Properties:
      DBInstanceIdentifier: !Sub '${ApplicationName}-secondary-replica'
      SourceDBInstanceIdentifier: !GetAtt PrimaryDatabase.DBInstanceArn
      DBInstanceClass: db.t3.medium
      PubliclyAccessible: false
      Tags:
        - Key: Role
          Value: DisasterRecovery
        - Key: Environment
          Value: Production

  # DynamoDB Table with Point-in-Time Recovery
  ApplicationTable:
    Type: AWS::DynamoDB::Table
    Properties:
      TableName: !Sub '${ApplicationName}-data'
      AttributeDefinitions:
        - AttributeName: id
          AttributeType: S
        - AttributeName: timestamp
          AttributeType: N
      KeySchema:
        - AttributeName: id
          KeyType: HASH
        - AttributeName: timestamp
          KeyType: RANGE
      BillingMode: PAY_PER_REQUEST
      PointInTimeRecoverySpecification:
        PointInTimeRecoveryEnabled: true
      SSESpecification:
        SSEEnabled: true
        KMSMasterKeyId: !Ref BackupKMSKey
      StreamSpecification:
        StreamViewType: NEW_AND_OLD_IMAGES
      Tags:
        - Key: BackupEnabled
          Value: 'true'
        - Key: Environment
          Value: Production

  # Lambda Function for Cross-Region DynamoDB Replication
  DynamoDBReplicationFunction:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: !Sub '${ApplicationName}-dynamodb-replication'
      Runtime: python3.9
      Handler: index.lambda_handler
      Role: !GetAtt DynamoDBReplicationRole.Arn
      Environment:
        Variables:
          SECONDARY_REGION: !Ref SecondaryRegion
          TABLE_NAME: !Ref ApplicationTable
      Code:
        ZipFile: |
          import json
          import boto3
          import os
          
          def lambda_handler(event, context):
              secondary_region = os.environ['SECONDARY_REGION']
              primary_table = os.environ['TABLE_NAME']
              
              # Initialize DynamoDB clients for both regions
              primary_dynamodb = boto3.resource('dynamodb')
              secondary_dynamodb = boto3.resource('dynamodb', region_name=secondary_region)
              
              # Deserialize DynamoDB's wire format ({'S': 'x'}, {'N': '1'}, ...)
              # instead of naively unwrapping it, which would lose type fidelity
              from boto3.dynamodb.types import TypeDeserializer
              deserializer = TypeDeserializer()
              
              for record in event['Records']:
                  if record['eventName'] in ['INSERT', 'MODIFY']:
                      # Replicate the new image to the secondary region
                      try:
                          secondary_table = secondary_dynamodb.Table(f"{primary_table}-replica")
                          image = record['dynamodb']['NewImage']
                          item = {k: deserializer.deserialize(v) for k, v in image.items()}
                          secondary_table.put_item(Item=item)
                      except Exception as e:
                          print(f"Error replicating record: {str(e)}")
                          
              return {'statusCode': 200}

  # Event Source Mapping for DynamoDB Streams
  DynamoDBStreamEventSource:
    Type: AWS::Lambda::EventSourceMapping
    Properties:
      EventSourceArn: !GetAtt ApplicationTable.StreamArn
      FunctionName: !GetAtt DynamoDBReplicationFunction.Arn
      StartingPosition: LATEST
      BatchSize: 10
      MaximumBatchingWindowInSeconds: 5

  # S3 Bucket for application data with cross-region replication
  ApplicationBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Sub '${ApplicationName}-data-${AWS::AccountId}'
      VersioningConfiguration:
        Status: Enabled
      BucketEncryption:
        ServerSideEncryptionConfiguration:
          - ServerSideEncryptionByDefault:
              SSEAlgorithm: aws:kms
              KMSMasterKeyID: !Ref BackupKMSKey
      ReplicationConfiguration:
        Role: !GetAtt S3ReplicationRole.Arn
        Rules:
          - Id: ReplicateToSecondaryRegion
            Status: Enabled
            Prefix: ''
            Destination:
              Bucket: !Sub 
                - 'arn:aws:s3:::${ApplicationName}-replica-${AWS::AccountId}-${SecondaryRegion}'
                - SecondaryRegion: !Ref SecondaryRegion
              StorageClass: STANDARD_IA
              EncryptionConfiguration:
                ReplicaKmsKeyID: !Sub 
                  - 'arn:aws:kms:${SecondaryRegion}:${AWS::AccountId}:alias/${ApplicationName}-backup-key'
                  - SecondaryRegion: !Ref SecondaryRegion
      NotificationConfiguration:
        LambdaConfigurations:
          - Event: s3:ObjectCreated:*
            Function: !GetAtt BackupValidationFunction.Arn
      Tags:
        - Key: BackupEnabled
          Value: 'true'
        - Key: Environment
          Value: Production

  # Permission for S3 to invoke the validation function (required by the
  # NotificationConfiguration above; the literal bucket ARN avoids a circular
  # dependency, and the bucket should ideally declare DependsOn this permission
  # so the notification configuration validates on first deployment)
  BackupValidationInvokePermission:
    Type: AWS::Lambda::Permission
    Properties:
      FunctionName: !Ref BackupValidationFunction
      Action: lambda:InvokeFunction
      Principal: s3.amazonaws.com
      SourceAccount: !Ref AWS::AccountId
      SourceArn: !Sub 'arn:aws:s3:::${ApplicationName}-data-${AWS::AccountId}'

  # Lambda Function for Backup Validation
  BackupValidationFunction:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: !Sub '${ApplicationName}-backup-validation'
      Runtime: python3.9
      Handler: index.lambda_handler
      Role: !GetAtt BackupValidationRole.Arn
      Environment:
        Variables:
          SNS_TOPIC_ARN: !Ref BackupNotificationTopic
      Code:
        ZipFile: |
          import json
          import os
          import boto3
          from datetime import datetime, timedelta
          
          def lambda_handler(event, context):
              backup_client = boto3.client('backup')
              sns_client = boto3.client('sns')
              
              # Check backup job status
              try:
                  # Get recent backup jobs
                  end_time = datetime.now()
                  start_time = end_time - timedelta(hours=24)
                  
                  response = backup_client.list_backup_jobs(
                      ByCreatedAfter=start_time,
                      ByCreatedBefore=end_time
                  )
                  
                  failed_jobs = []
                  successful_jobs = []
                  
                  for job in response['BackupJobs']:
                      if job['State'] == 'FAILED':
                          failed_jobs.append({
                              'JobId': job['BackupJobId'],
                              'ResourceArn': job['ResourceArn'],
                              'StatusMessage': job.get('StatusMessage', 'Unknown error')
                          })
                      elif job['State'] == 'COMPLETED':
                          successful_jobs.append({
                              'JobId': job['BackupJobId'],
                              'ResourceArn': job['ResourceArn'],
                              'CompletionDate': job['CompletionDate'].isoformat()
                          })
                  
                  # Send notification if there are failed jobs
                  if failed_jobs:
                      message = f"ALERT: {len(failed_jobs)} backup jobs failed in the last 24 hours:\n\n"
                      for job in failed_jobs:
                          message += f"Job ID: {job['JobId']}\n"
                          message += f"Resource: {job['ResourceArn']}\n"
                          message += f"Error: {job['StatusMessage']}\n\n"
                      
                      sns_client.publish(
                          TopicArn=os.environ['SNS_TOPIC_ARN'],
                          Subject='AWS Backup Job Failures Detected',
                          Message=message
                      )
                  
                  return {
                      'statusCode': 200,
                      'body': json.dumps({
                          'successful_jobs': len(successful_jobs),
                          'failed_jobs': len(failed_jobs)
                      })
                  }
                  
              except Exception as e:
                  print(f"Error validating backups: {str(e)}")
                  return {
                      'statusCode': 500,
                      'body': json.dumps({'error': str(e)})
                  }

  # Disaster Recovery Orchestration Function
  DisasterRecoveryFunction:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: !Sub '${ApplicationName}-disaster-recovery'
      Runtime: python3.9
      Handler: index.lambda_handler
      Role: !GetAtt DisasterRecoveryRole.Arn
      Timeout: 900
      Environment:
        Variables:
          SECONDARY_REGION: !Ref SecondaryRegion
          APPLICATION_NAME: !Ref ApplicationName
      Code:
        ZipFile: |
          import json
          import boto3
          import time
          import os
          
          def lambda_handler(event, context):
              secondary_region = os.environ['SECONDARY_REGION']
              app_name = os.environ['APPLICATION_NAME']
              
              # Initialize AWS clients
              ec2 = boto3.client('ec2', region_name=secondary_region)
              rds = boto3.client('rds', region_name=secondary_region)
              route53 = boto3.client('route53')
              
              recovery_plan = event.get('recovery_plan', 'pilot_light')
              
              try:
                  if recovery_plan == 'pilot_light':
                      return execute_pilot_light_recovery(ec2, rds, route53, app_name)
                  elif recovery_plan == 'warm_standby':
                      return execute_warm_standby_recovery(ec2, rds, route53, app_name)
                  else:
                      return {'statusCode': 400, 'error': 'Invalid recovery plan'}
                      
              except Exception as e:
                  return {'statusCode': 500, 'error': str(e)}
          
          def execute_pilot_light_recovery(ec2, rds, route53, app_name):
              # Promote read replica to standalone database
              replica_id = f"{app_name}-secondary-replica"
              
              try:
                  rds.promote_read_replica(DBInstanceIdentifier=replica_id)
                  
                  # Wait for promotion to complete
                  waiter = rds.get_waiter('db_instance_available')
                  waiter.wait(DBInstanceIdentifier=replica_id)
                  
                  # Launch EC2 instances from AMIs
                  # This would contain your specific AMI IDs and configuration
                  
                  # Update Route 53 to point to DR environment
                  # Implementation depends on your DNS configuration
                  
                  return {
                      'statusCode': 200,
                      'message': 'Pilot light recovery initiated successfully'
                  }
                  
              except Exception as e:
                  return {'statusCode': 500, 'error': f"Recovery failed: {str(e)}"}
          
          def execute_warm_standby_recovery(ec2, rds, route53, app_name):
              # Scale up existing warm standby environment
              # Implementation would include auto scaling adjustments
              # and traffic routing changes
              
              return {
                  'statusCode': 200,
                  'message': 'Warm standby recovery initiated successfully'
              }

  # Required IAM Roles
  DynamoDBReplicationRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
      Policies:
        - PolicyName: DynamoDBReplicationPolicy
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - dynamodb:DescribeStream
                  - dynamodb:GetRecords
                  - dynamodb:GetShardIterator
                  - dynamodb:ListStreams
                  - dynamodb:PutItem
                  - dynamodb:UpdateItem
                Resource: '*'

  S3ReplicationRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: s3.amazonaws.com
            Action: sts:AssumeRole
      Policies:
        - PolicyName: S3ReplicationPolicy
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - s3:GetObjectVersionForReplication
                  - s3:GetObjectVersionAcl
                  - s3:GetObjectVersionTagging
                # Literal ARN: the bucket references this role in its
                # ReplicationConfiguration, so referencing the bucket back here
                # would create a circular dependency
                Resource: !Sub 'arn:aws:s3:::${ApplicationName}-data-${AWS::AccountId}/*'
              - Effect: Allow
                Action:
                  - s3:ListBucket
                  - s3:GetReplicationConfiguration
                Resource: !Sub 'arn:aws:s3:::${ApplicationName}-data-${AWS::AccountId}'
              - Effect: Allow
                Action:
                  - s3:ReplicateObject
                  - s3:ReplicateDelete
                Resource: !Sub 
                  - 'arn:aws:s3:::${ApplicationName}-replica-${AWS::AccountId}-${SecondaryRegion}/*'
                  - SecondaryRegion: !Ref SecondaryRegion

  BackupValidationRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
      Policies:
        - PolicyName: BackupValidationPolicy
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - backup:ListBackupJobs
                  - backup:DescribeBackupJob
                  - sns:Publish
                Resource: '*'

  DisasterRecoveryRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
      Policies:
        - PolicyName: DisasterRecoveryPolicy
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - ec2:*
                  - rds:*
                  - route53:*
                  - autoscaling:*
                Resource: '*'

  RDSMonitoringRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: monitoring.rds.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AmazonRDSEnhancedMonitoringRole

  # VPC and Networking (simplified)
  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
      EnableDnsHostnames: true
      EnableDnsSupport: true
      Tags:
        - Key: Name
          Value: !Sub '${ApplicationName}-vpc'

  PrivateSubnet1:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: 10.0.1.0/24
      AvailabilityZone: !Select [0, !GetAZs '']
      Tags:
        - Key: Name
          Value: !Sub '${ApplicationName}-private-subnet-1'

  PrivateSubnet2:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: 10.0.2.0/24
      AvailabilityZone: !Select [1, !GetAZs '']
      Tags:
        - Key: Name
          Value: !Sub '${ApplicationName}-private-subnet-2'

  DatabaseSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Security group for RDS database
      VpcId: !Ref VPC
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 3306
          ToPort: 3306
          SourceSecurityGroupId: !Ref ApplicationSecurityGroup
      Tags:
        - Key: Name
          Value: !Sub '${ApplicationName}-db-sg'

  ApplicationSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Security group for application servers
      VpcId: !Ref VPC
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 80
          ToPort: 80
          CidrIp: 0.0.0.0/0
        - IpProtocol: tcp
          FromPort: 443
          ToPort: 443
          CidrIp: 0.0.0.0/0
      Tags:
        - Key: Name
          Value: !Sub '${ApplicationName}-app-sg'


Outputs:
  BackupVaultArn:
    Description: ARN of the backup vault
    Value: !GetAtt BackupVault.BackupVaultArn
    Export:
      Name: !Sub '${ApplicationName}-backup-vault-arn'
  
  BackupPlanId:
    Description: ID of the backup plan
    Value: !Ref ComprehensiveBackupPlan
    Export:
      Name: !Sub '${ApplicationName}-backup-plan-id'
  
  DisasterRecoveryFunctionArn:
    Description: ARN of the disaster recovery Lambda function
    Value: !GetAtt DisasterRecoveryFunction.Arn
    Export:
      Name: !Sub '${ApplicationName}-dr-function-arn'

  PrimaryDatabaseEndpoint:
    Description: Primary database endpoint
    Value: !GetAtt PrimaryDatabase.Endpoint.Address
    Export:
      Name: !Sub '${ApplicationName}-primary-db-endpoint'
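
Once saved, the template can be deployed from code as well as the console. This is a hedged sketch: the file name `backup-dr.yaml` and stack name are assumptions, and the live calls (commented out) need AWS credentials plus the `CAPABILITY_IAM` acknowledgment because the template creates IAM roles.

```python
def cfn_parameters(**values):
    """Convert plain keyword arguments into CloudFormation's Parameters shape."""
    return [{"ParameterKey": k, "ParameterValue": str(v)} for k, v in values.items()]

# Usage (assumes boto3 credentials and the template saved as backup-dr.yaml):
# import boto3
# cfn = boto3.client("cloudformation", region_name="us-east-1")
# body = open("backup-dr.yaml").read()
# cfn.validate_template(TemplateBody=body)
# cfn.create_stack(
#     StackName="webapp-backup-dr",
#     TemplateBody=body,
#     Parameters=cfn_parameters(ApplicationName="webapp",
#                               DatabasePassword="change-me-please"),
#     Capabilities=["CAPABILITY_IAM"],   # template creates IAM roles
# )
# cfn.get_waiter("stack_create_complete").wait(StackName="webapp-backup-dr")

print(cfn_parameters(ApplicationName="webapp"))
```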

Automated Recovery Testing

Testing your disaster recovery procedures is crucial for ensuring they work when needed. Here’s a Python script that automates DR testing:

import boto3
import json
import time
from datetime import datetime, timedelta
import logging

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class DisasterRecoveryTester:
    def __init__(self, primary_region='us-east-1', secondary_region='us-west-2'):
        self.primary_region = primary_region
        self.secondary_region = secondary_region
        self.backup_client = boto3.client('backup', region_name=primary_region)
        self.rds_client = boto3.client('rds', region_name=secondary_region)
        self.ec2_client = boto3.client('ec2', region_name=secondary_region)
        
    def test_backup_integrity(self, vault_name):
        """Test backup integrity by verifying recent backups"""
        try:
            # List recent recovery points
            end_time = datetime.now()
            start_time = end_time - timedelta(days=7)
            
            response = self.backup_client.list_recovery_points_by_backup_vault(
                BackupVaultName=vault_name,
                ByCreatedAfter=start_time,
                ByCreatedBefore=end_time
            )
            
            completed = [rp for rp in response['RecoveryPoints']
                         if rp['Status'] == 'COMPLETED']
            logger.info(f"Found {len(completed)} completed recovery points "
                        f"in the last 7 days for vault {vault_name}")
            return len(completed) > 0
            
        except Exception as e:
            logger.error(f"Backup integrity check failed: {str(e)}")
            return False
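
A useful companion check the tester could call is whether the newest recovery point still satisfies the recovery point objective. This helper is a hypothetical addition, not part of the original script; the 24-hour default matches the daily backup rule in the plan above.

```python
from datetime import datetime, timedelta, timezone

def meets_rpo(creation_dates, now=None, rpo=timedelta(hours=24)):
    """True if the newest recovery point falls within the RPO window."""
    if not creation_dates:
        return False  # no recovery points at all is an automatic failure
    now = now or datetime.now(timezone.utc)
    return (now - max(creation_dates)) <= rpo

fixed_now = datetime(2024, 1, 2, tzinfo=timezone.utc)
print(meets_rpo([datetime(2024, 1, 1, 12, tzinfo=timezone.utc)], now=fixed_now))  # True
print(meets_rpo([datetime(2023, 12, 30, tzinfo=timezone.utc)], now=fixed_now))    # False
```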

Regards
Osama
