About Failback with Arpio

Failback is an Arpio Enterprise feature that lets you quickly copy the most recent data from your recovery environment back to your primary environment after a failover.

Failback is currently available only for the EC2, RDS, Aurora, and DynamoDB resource types. For step-by-step instructions, see our Performing a Failback documentation.

Where failback fits in your Disaster Recovery plan

A disaster recovery plan is not just about responding immediately to malicious attacks or unexpected events, although Arpio can help you with that too. A complete disaster recovery plan also lets you return your business to normal as quickly and safely as possible. Failback supports that return by restoring the applications in your primary environment to an operational state.

An example failback process

Situation: In response to an incident, you have performed a failover. Your application is now running in your recovery environment.

When you are ready to return to your primary environment:

  1. Ensure that your primary environment is ready to take production traffic. This may involve manual steps, such as validating that malicious actions have been undone or verifying that AWS services and resources are operating as designed.
  2. Perform a test failback by following our guide to Performing a Failback.
    1. Arpio will ask you to set up some extra access to the primary environment via the installation of a new CloudFormation stack.
    2. Once Arpio has completed the failback process, you should test that your application works correctly, and that the primary environment has been restored with up-to-date data from the recovery environment.
  3. Schedule a time to perform a real failback. To minimize the chance of data loss, we suggest scheduling a maintenance window during which your applications serve no production traffic.
  4. Perform the failback by:
    1. Following the steps to perform a failback, just as in step 2.
    2. Flipping DNS, or making any other changes required to serve production traffic from your primary environment.
  5. Test that your application is working correctly in the primary environment.
  6. Begin the “Conclude Recovery” process to reset the recovery environment back to its original “pilot light” state.
  7. (optional) To improve the security posture of your Arpio usage, you can delete the CloudFormation stack that was added as part of step 2.
  8. At this point, your application should be running just as it was before the incident, and recovery points should be getting created again.
  9. (optional) Perform a “Test Recovery” to gain extra confidence that you’re protected, should some other incident occur.
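
The "flip DNS" part of step 4 depends entirely on how your DNS is hosted, and Arpio does not prescribe a method. As a minimal sketch, assuming your record lives in Amazon Route 53 and is a plain CNAME, the change could look something like the Python below. The function names, record name, target hostname, and hosted zone ID are all hypothetical placeholders; only the `change_resource_record_sets` call and the shape of its `ChangeBatch` argument come from boto3.

```python
"""Sketch: point a Route 53 CNAME back at the primary environment.

Assumptions (not from the Arpio documentation): DNS is hosted in Route 53,
the record is a CNAME, and all names and IDs below are placeholders.
"""


def build_dns_flip(record_name: str, primary_target: str, ttl: int = 60) -> dict:
    """Return a Route 53 ChangeBatch that UPSERTs a CNAME to the primary target."""
    return {
        "Comment": "Failback: serve production traffic from the primary environment",
        "Changes": [
            {
                # UPSERT creates the record if it is missing, or overwrites it.
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": primary_target}],
                },
            }
        ],
    }


def apply_dns_flip(hosted_zone_id: str, change_batch: dict) -> str:
    """Submit the change batch to Route 53 and return the change ID."""
    # Imported lazily so the builder above can be used without AWS access.
    import boto3

    route53 = boto3.client("route53")
    response = route53.change_resource_record_sets(
        HostedZoneId=hosted_zone_id,
        ChangeBatch=change_batch,
    )
    return response["ChangeInfo"]["Id"]
```

A call might look like `apply_dns_flip("Z0000000EXAMPLE", build_dns_flip("app.example.com.", "primary-lb.example.com"))`, with your own zone ID and hostnames substituted. Keeping the batch builder separate from the API call lets you review or log the exact change before applying it during the maintenance window. If you use alias records, weighted routing, or a different DNS provider, the change will look different.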