Introduction
A Retention Policy is an account-wide set of rules that describe how many recovery points Arpio should retain on a rolling basis. Any of the retained recovery points may then be selected when failing over or testing failover. These rules can be viewed or modified by signing into the Arpio console and navigating to Settings > Retention Policy.
Concepts
- Metadata Snapshot: a snapshot of a resource’s metadata (title, description, tags, etc.) at a given point in time. These are stored in the Arpio database.
- Data Snapshot: a snapshot of a resource’s data (RDS tables, EFS files, etc.) at a given point in time. These are stored in the recovery environment.
- Intermediate Data Snapshot: in some cases, a resource’s data cannot be copied directly from the primary account/region to the recovery account/region. In these cases, an intermediate copy of the snapshot is created in the recovery region within the primary account to facilitate the copy.
- Recovery Point: the set of metadata and data snapshots for all resources in an application at a given point in time.
- Retention Policy: the set of rules that describe how many recovery points should be retained.
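To make the relationships between these terms concrete, here is a minimal Python sketch of the data model. The class, field, and method names (`MetadataSnapshot`, `DataSnapshot`, `RecoveryPoint`, `resource_ids`) are hypothetical illustrations, not Arpio's internal schema.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class MetadataSnapshot:
    # A resource's metadata (title, description, tags, ...) at one point in time.
    # Stored in the Arpio database.
    resource_id: str
    taken_at: datetime
    metadata: dict[str, str] = field(default_factory=dict)


@dataclass
class DataSnapshot:
    # A resource's data (RDS tables, EFS files, ...) at one point in time.
    # Stored in the recovery environment.
    snapshot_id: str
    resource_id: str
    taken_at: datetime


@dataclass
class RecoveryPoint:
    # All metadata and data snapshots for one application at one point in time.
    # The same DataSnapshot object may be shared by several RecoveryPoints.
    application: str
    taken_at: datetime
    metadata_snapshots: list[MetadataSnapshot] = field(default_factory=list)
    data_snapshots: list[DataSnapshot] = field(default_factory=list)

    def resource_ids(self) -> set[str]:
        """Return every resource captured by this recovery point."""
        ids = {s.resource_id for s in self.metadata_snapshots}
        ids.update(s.resource_id for s in self.data_snapshots)
        return ids
```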
Choosing the Right Retention Policy
A good retention policy balances the need for data recovery, regulatory compliance, and cost management. Arpio’s support for tiered retention rules allows for these factors to be balanced according to your business needs.
As an example, let’s say your Retention Policy is configured as follows:
- Always retain the 12 most recent recovery points.
- Also retain 1 recovery point per day for the 5 most recent days.
- Also retain 1 recovery point per week for the 2 most recent weeks.
- Also retain 1 recovery point per month for the 2 most recent months.
- Also retain 1 recovery point per quarter for the 2 most recent quarters.
In this case, Arpio will retain up to 23 recovery points (12 + 5 + 2 + 2 + 2), and may temporarily retain two additional recovery points while cleanup is pending or in progress. This keeps the number of retained recovery points fairly small (to save costs) while still allowing recovery to points in time going back roughly six months (for compliance needs).
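The tiered rules can be sketched as a selection over recovery-point timestamps. This is an illustrative Python sketch, not Arpio's implementation: it assumes calendar-aligned periods (ISO weeks, calendar months and quarters), counts only periods that actually contain a recovery point, and keeps the newest point in each period. The function names `period_key` and `select_retained` are hypothetical.

```python
from collections.abc import Iterable
from datetime import datetime


def period_key(ts: datetime, unit: str) -> tuple:
    """Identify the calendar period (assumed calendar-aligned) containing ts."""
    if unit == "day":
        return (ts.year, ts.month, ts.day)
    if unit == "week":
        iso_year, iso_week, _ = ts.isocalendar()
        return (iso_year, iso_week)
    if unit == "month":
        return (ts.year, ts.month)
    if unit == "quarter":
        return (ts.year, (ts.month - 1) // 3 + 1)
    raise ValueError(f"unknown unit: {unit}")


def select_retained(
    points: Iterable[datetime],
    keep_latest: int,
    tiers: list[tuple[str, int]],
) -> set[datetime]:
    """Return the recovery-point timestamps kept by a tiered policy.

    keep_latest: N in "always retain the N most recent recovery points".
    tiers: (unit, count) pairs, e.g. ("day", 5) for "1 per day for 5 days".
    """
    newest_first = sorted(points, reverse=True)
    retained = set(newest_first[:keep_latest])
    for unit, count in tiers:
        seen: set[tuple] = set()
        for ts in newest_first:
            key = period_key(ts, unit)
            if key in seen:
                continue  # already kept the newest point of this period
            if len(seen) == count:
                break  # older than the `count` most recent periods
            seen.add(key)
            retained.add(ts)
    return retained
```

Because one recovery point can satisfy several rules at once (the newest point is also the newest of its day, week, month, and quarter), the retained set is often smaller than the 23-point ceiling.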
Note that the behavior of the first rule (“Always retain the N most recent recovery points”) depends on the RPO (recovery point objective) set for each application. Using our example value of 12, if the RPO is set to 15 minutes, every recovery point created over the last 3 hours will be retained. By contrast, if the RPO is set to 1 hour, every recovery point created over the last 12 hours will be retained.
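Put as a formula, and assuming one recovery point is created per RPO interval, the window covered by the first rule is roughly N times the RPO:

```latex
W \approx N \times \mathrm{RPO}, \qquad
12 \times 15\,\text{min} = 180\,\text{min} = 3\,\text{h}, \qquad
12 \times 1\,\text{h} = 12\,\text{h}
```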
Cleaning Up Expired Recovery Points and Snapshots
After each backup attempt (and additionally once per day), Arpio invokes a cleanup process to delete Recovery Points that no longer need to be retained according to the configured Retention Policy.
Because cleanup runs after backup, one additional Recovery Point (and its associated Data Snapshots) may exist temporarily after each backup completes, until cleanup finishes. In rare cases, backup may run twice before cleanup, in which case two additional Recovery Points will exist temporarily until cleanup completes.
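The effect of running cleanup after backup can be illustrated with a small simulation. This Python sketch is a simplification: the hypothetical `simulate` function treats cleanup as trimming to the `keep` newest recovery points and ignores the tiered rules, which is enough to show the temporary overshoot.

```python
def simulate(keep: int, events: str) -> list[int]:
    """Count recovery points after each event.

    events: "B" means a backup completed, "C" means cleanup ran.
    Simplification: cleanup just trims to the `keep` newest points.
    """
    count = 0
    history = []
    for event in events:
        if event == "B":
            count += 1  # the new recovery point exists immediately
        elif event == "C":
            count = min(count, keep)  # expired points are removed
        else:
            raise ValueError(f"unknown event: {event!r}")
        history.append(count)
    return history
```

For example, `simulate(12, "B" * 12 + "BBC")` peaks at 14 recovery points before cleanup brings the count back to 12.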
At the most basic level, the cleanup process will delete all Data Snapshots that are associated with expired Recovery Points, but there are several reasons why Data Snapshots may be retained even after a Recovery Point is deleted.
1. The Data Snapshot is the most recent in its sequence
For some types of resources, AWS supports creating incremental snapshots, where each new snapshot only captures the differences from the previous one. This helps to save on time, storage, and data transfer costs.
This requires that the most recent Data Snapshot be retained; otherwise, AWS will revert to creating a full snapshot on the next backup attempt.
For the most part, this is already covered by retaining the latest N Recovery Points, but Arpio goes beyond this to ensure that the most recent Data Snapshot in a sequence is always retained. This allows a resource to be removed from an application and added back later without losing the benefits of incremental backups.
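A minimal sketch of this rule in Python, assuming each Data Snapshot is tagged with the sequence it belongs to (for example, one sequence per resource): cleanup drops any deletion candidate that is the newest snapshot of its sequence. The functions `newest_per_sequence` and `deletable` and the tuple layout are hypothetical.

```python
from collections.abc import Iterable
from datetime import datetime


def newest_per_sequence(
    snapshots: Iterable[tuple[str, str, datetime]],
) -> dict[str, str]:
    """Map each sequence id to the id of its most recent snapshot.

    snapshots: (snapshot_id, sequence_id, created_at) tuples.
    """
    newest: dict[str, tuple[str, datetime]] = {}
    for snap_id, seq_id, created_at in snapshots:
        if seq_id not in newest or created_at > newest[seq_id][1]:
            newest[seq_id] = (snap_id, created_at)
    return {seq_id: snap_id for seq_id, (snap_id, _) in newest.items()}


def deletable(
    candidates: Iterable[str],
    snapshots: Iterable[tuple[str, str, datetime]],
) -> list[str]:
    """Filter out candidates that are the newest snapshot in their sequence."""
    protected = set(newest_per_sequence(snapshots).values())
    return [snap_id for snap_id in candidates if snap_id not in protected]
```

If a resource is removed from an application, its last snapshot stays protected, so adding the resource back later can continue from an incremental base.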
2. The Data Snapshot is referenced by another Recovery Point
A Data Snapshot may be referenced by more than one Recovery Point if the data for a resource didn’t change between backups, or if multiple applications reference the same resource. In this case, the Data Snapshot will only be deleted once all Recovery Points that reference the snapshot have expired.
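The shared-reference rule amounts to a reference check. In this hypothetical Python sketch, a Data Snapshot from an expired Recovery Point is deletable only if no surviving Recovery Point still references it; the function name and dictionary layout are assumptions for illustration.

```python
def deletable_snapshots(
    recovery_points: dict[str, set[str]],
    expired: set[str],
) -> set[str]:
    """Return Data Snapshot ids that are safe to delete.

    recovery_points: recovery point id -> ids of the Data Snapshots it references
    expired: ids of recovery points that the retention policy no longer keeps
    """
    # Everything still referenced by a surviving recovery point must stay.
    still_referenced: set[str] = set()
    for rp_id, snap_ids in recovery_points.items():
        if rp_id not in expired:
            still_referenced |= snap_ids
    # Candidates are the snapshots referenced by expired recovery points.
    candidates: set[str] = set()
    for rp_id in expired:
        candidates |= recovery_points.get(rp_id, set())
    return candidates - still_referenced
```

For example, if `rp1` references `{"snap-a", "snap-b"}` and a surviving `rp2` references `{"snap-b", "snap-c"}`, expiring `rp1` makes only `snap-a` deletable.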