AWS-Disaster Recovery

Disaster recovery (DR) is an organization’s ability to restore access and functionality to IT infrastructure after a disaster event, whether natural or caused by human action (or error). Today, disaster recovery planning is crucial for any business, especially those operating either partially or entirely in the cloud.

Any event that has a negative impact on a company’s business continuity or finances is a disaster. DR is about preparing for and recovering from such an event.

What kinds of disaster recovery are there?

  • On-premise => On-premise: traditional DR, and very expensive
  • On-premise => AWS Cloud: hybrid recovery
  • AWS Cloud Region A => AWS Cloud Region B

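Two terms need to be defined:

  • RPO: Recovery Point Objective. The maximum acceptable data loss, measured as the time between the last recovery point (backup) and the disaster.
  • RTO: Recovery Time Objective. The maximum acceptable downtime between the disaster and the restoration of service.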
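To make the two terms concrete, here is a minimal sketch that measures the data-loss window and the downtime of a single incident and checks them against target objectives. The function names, timestamps, and targets are all hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta


def measure_recovery(last_backup, disaster, service_restored):
    """Return (data_loss_window, downtime) for one incident."""
    data_loss_window = disaster - last_backup   # compared against the RPO
    downtime = service_restored - disaster      # compared against the RTO
    return data_loss_window, downtime


def meets_objectives(data_loss_window, downtime, rpo, rto):
    """True only if both the RPO and the RTO were respected."""
    return data_loss_window <= rpo and downtime <= rto


# Hypothetical incident: nightly backup at 02:00, outage at 09:30,
# service back at 13:30 on the same day.
loss, downtime = measure_recovery(
    last_backup=datetime(2024, 1, 1, 2, 0),
    disaster=datetime(2024, 1, 1, 9, 30),
    service_restored=datetime(2024, 1, 1, 13, 30),
)
print("data lost:", loss, "| downtime:", downtime)
print("objectives met:", meets_objectives(
    loss, downtime, rpo=timedelta(hours=24), rto=timedelta(hours=2)))
```

In this made-up incident the RPO of 24 hours is respected, but the RTO of 2 hours is not, which would be a signal to move to a faster strategy than the one in use.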

Disaster Recovery Strategies

  • Backup and Restore
  • Pilot Light
  • Warm Standby
  • Hot Site / Multi Site Approach

1. Backup and Restore

Restoring from backups takes a long time, so this strategy has a high RTO, and its RPO is only as good as the backup frequency. In exchange:

  • It is quite cheap.
  • There is no infrastructure to manage in the meantime.
  • The infrastructure is recreated only when it is needed.
  • The only ongoing cost is storing the backups.

2. Pilot Light

  • A small version of the app is always running in the cloud.
  • Useful for the critical core of the system (the pilot light).
  • Very similar to Backup and Restore, but faster because the critical systems are already up.

3. Warm Standby

  • Full system is up and running but at minimum size.
  • Upon disaster, we can scale to production load.
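As an illustration of the scale-up step, the sketch below raises a warm-standby Auto Scaling group to production capacity. It assumes the standby fleet runs in an Auto Scaling group, which the text above does not state. The function name, group name, and sizes are hypothetical. The client is passed in so the function can be exercised without AWS access; with boto3 it would be `boto3.client("autoscaling", region_name=...)`.

```python
def scale_to_production(autoscaling_client, group_name, min_size, desired, max_size):
    """Resize a warm-standby Auto Scaling group to production capacity.

    `autoscaling_client` is expected to behave like a boto3 "autoscaling"
    client, i.e. to expose update_auto_scaling_group(**kwargs).
    """
    if not (0 <= min_size <= desired <= max_size):
        raise ValueError("expected 0 <= min_size <= desired <= max_size")

    autoscaling_client.update_auto_scaling_group(
        AutoScalingGroupName=group_name,
        MinSize=min_size,
        MaxSize=max_size,
        DesiredCapacity=desired,
    )


# Example call (hypothetical names and sizes, requires AWS credentials):
#   import boto3
#   client = boto3.client("autoscaling", region_name="eu-west-1")
#   scale_to_production(client, "web-dr", min_size=4, desired=6, max_size=12)
```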

4. Hot Site / Multi Site Approach

  • Full production scale is running (AWS and On-premise) or (AWS and AWS).
  • Very low RTO (minutes or seconds) but very expensive.

Disaster Recovery Tips

Backup

  • EBS Snapshots, RDS automated backups/snapshots.
  • Regular pushes to S3.
  • From On-premise, transfer backups with Snowball or Storage Gateway.
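An EBS snapshot protects against a Region-wide outage only once a copy of it exists in another Region. The sketch below shows one possible way to take a snapshot and copy it to a DR Region. The function name and the IDs in the example call are hypothetical. Both clients are passed in and are expected to behave like boto3 EC2 clients, one per Region.

```python
def backup_volume_to_dr_region(source_ec2, dr_ec2, volume_id, source_region):
    """Snapshot an EBS volume, then copy the snapshot to the DR Region.

    `source_ec2` and `dr_ec2` are expected to behave like boto3 "ec2"
    clients created for the source Region and the DR Region respectively.
    Returns the ID of the snapshot copy in the DR Region.
    """
    snapshot = source_ec2.create_snapshot(
        VolumeId=volume_id,
        Description=f"DR backup of {volume_id}",
    )
    snapshot_id = snapshot["SnapshotId"]

    # A snapshot can only be copied once it has reached the completed state.
    source_ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])

    # The copy is requested from the destination Region's client.
    copy = dr_ec2.copy_snapshot(
        SourceRegion=source_region,
        SourceSnapshotId=snapshot_id,
        Description=f"DR copy of {snapshot_id} from {source_region}",
    )
    return copy["SnapshotId"]


# Example call (hypothetical IDs and Regions, requires AWS credentials):
#   import boto3
#   src = boto3.client("ec2", region_name="us-east-1")
#   dst = boto3.client("ec2", region_name="us-west-2")
#   backup_volume_to_dr_region(src, dst, "vol-0123456789abcdef0", "us-east-1")
```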

High Availability

  • Use Route 53 to fail DNS over from one Region to another.
  • RDS Multi-AZ, ElastiCache Multi-AZ, EFS, S3.
  • Site-to-Site VPN as a fallback in case Direct Connect fails.
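One common way to move DNS between Regions with Route 53 is a pair of failover records: a PRIMARY record tied to a health check and a SECONDARY record that answers when the primary is unhealthy. The sketch below only builds the change batch. The helper names, domain, IP addresses, and health check ID are hypothetical. The resulting dictionary would be passed to the boto3 Route 53 call `change_resource_record_sets(HostedZoneId=..., ChangeBatch=batch)`.

```python
def failover_record(name, ip_address, role, set_identifier,
                    health_check_id=None, ttl=60):
    """Build one UPSERT change for a Route 53 failover A record."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be 'PRIMARY' or 'SECONDARY'")

    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": set_identifier,   # distinguishes records sharing a name
        "Failover": role,
        "TTL": ttl,                        # a low TTL speeds up the switch
        "ResourceRecords": [{"Value": ip_address}],
    }
    if health_check_id is not None:
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}


def failover_change_batch(name, primary_ip, secondary_ip, primary_health_check_id):
    """Build a ChangeBatch with a health-checked primary and a secondary."""
    return {
        "Comment": "DR failover between two Regions",
        "Changes": [
            failover_record(name, primary_ip, "PRIMARY", "region-a",
                            health_check_id=primary_health_check_id),
            failover_record(name, secondary_ip, "SECONDARY", "region-b"),
        ],
    }


# Hypothetical values, for illustration only.
batch = failover_change_batch(
    "app.example.com.", "192.0.2.10", "198.51.100.20", "hc-primary-example")
```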

Replication

  • RDS Replication (Cross Region), Amazon Aurora Global Database.
  • Database replication from on-premise to RDS.
  • Hybrid Cloud Storage by AWS Storage Gateway.

Automation

  • CloudFormation / Elastic Beanstalk to re-create a whole new environment.
  • Recover/reboot EC2 instances automatically when a CloudWatch alarm fires (for example, on a failed status check).
  • AWS Lambda functions for customized automations.
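The CloudWatch recover/reboot item can be sketched as an alarm on an EC2 status check whose action is one of the built-in EC2 automate actions. The function below only builds the parameters. Its name, the alarm naming scheme, and the period and evaluation settings are assumptions made for illustration. The result would be passed to the boto3 CloudWatch call `put_metric_alarm(**params)`.

```python
def recovery_alarm_params(instance_id, region, action="recover"):
    """Build put_metric_alarm parameters that recover or reboot an instance.

    "recover" watches the system status check (underlying host problems);
    "reboot" watches the instance status check (problems inside the instance).
    """
    if action not in ("recover", "reboot"):
        raise ValueError("action must be 'recover' or 'reboot'")

    metric = ("StatusCheckFailed_System" if action == "recover"
              else "StatusCheckFailed_Instance")
    return {
        "AlarmName": f"auto-{action}-{instance_id}",   # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": metric,
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,                # seconds; illustrative value
        "EvaluationPeriods": 2,      # illustrative value
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:{action}"],
    }


# Hypothetical instance ID and Region, for illustration only.
params = recovery_alarm_params("i-0123456789abcdef0", "us-east-1")
```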

Chaos

  • Randomly terminate EC2 instances to test that the infrastructure survives failures.
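A minimal sketch of such a chaos test is shown below. The function names, the protected list, and the dry-run default are assumptions added for safety and are not part of any AWS tool. The client is expected to behave like a boto3 EC2 client.

```python
import random


def pick_victim(instance_ids, protected=(), rng=None):
    """Pick one random instance ID that is not on the protected list."""
    if rng is None:
        rng = random.Random()
    candidates = sorted(set(instance_ids) - set(protected))
    if not candidates:
        return None
    return rng.choice(candidates)


def terminate_random_instance(ec2_client, instance_ids, protected=(),
                              rng=None, dry_run=True):
    """Terminate one random, unprotected instance and return its ID.

    With dry_run=True (the default) nothing is terminated; the chosen ID is
    only returned so the selection can be reviewed first.
    """
    victim = pick_victim(instance_ids, protected, rng)
    if victim is not None and not dry_run:
        ec2_client.terminate_instances(InstanceIds=[victim])
    return victim


# Example call (hypothetical IDs, requires AWS credentials):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-east-1")
#   terminate_random_instance(ec2, ["i-aaa", "i-bbb"], protected=["i-aaa"],
#                             dry_run=False)
```

Keeping the dry run as the default means an accidental call only reports which instance would have been terminated.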
