Configure the location of customer-managed Redis snapshots that Arpio should use to seed a recovered ElastiCache cluster
arpio-config:snapshot-location = <recovery-environment-bucket>/<prefix>/
<recovery-environment-bucket>
The name of an S3 bucket in the recovery environment that contains .rdb snapshot files that can be used to seed the tagged cache cluster when it is recovered. The bucket must be in your application's recovery account and region.
<prefix>
The prefix inside the bucket where Arpio should look for timestamp-formatted folders containing .rdb files to use during recovery. The prefix is the folder one level above the folders with timestamp names. You can use prefixes to keep snapshots for different cache clusters separate in the same bucket.
The prefix is optional. If one is not present, Arpio will look for timestamp-formatted folders in the root of the bucket.
The trailing slash in a non-empty prefix is optional but recommended. If it's not present, Arpio will add one when it searches for timestamp-formatted folders under your prefix.
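As an illustration of the rules above, the following sketch splits a tag value into its bucket and prefix and normalizes the trailing slash. The function name `parse_snapshot_location` is hypothetical, and this is not Arpio's actual implementation; it only mirrors the behavior described here.

```python
from typing import Tuple


def parse_snapshot_location(tag_value: str) -> Tuple[str, str]:
    """Split a '<bucket>/<prefix>/' tag value into (bucket, prefix)."""
    value = tag_value.strip()
    # Everything before the first slash is the bucket; the rest is the prefix.
    bucket, _, prefix = value.partition("/")
    if not bucket:
        raise ValueError("snapshot location must start with a bucket name")
    # An empty prefix means "search the root of the bucket".
    # A non-empty prefix gets a trailing slash if it lacks one.
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    return bucket, prefix


print(parse_snapshot_location("mycompany-cache-backups/prod-cluster"))
print(parse_snapshot_location("mycompany-cache-backups"))
```

Under these assumptions, a value with or without the trailing slash resolves to the same search prefix, and a bare bucket name resolves to an empty prefix.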
Supported Resources
- ElastiCache Replication Group (Redis OSS)
- ElastiCache Cache Cluster (Redis OSS)
Description
You can use the arpio-config:snapshot-location tag to direct Arpio to seed recovered ElastiCache cache clusters and replication groups with data from .rdb files you maintain in your recovery environment. You should store the .rdb snapshot files inside a folder named for the date and time that the snapshots were created, using a supported timestamp format. When restoring cache resources with this config tag, Arpio selects the appropriate set of files from the most appropriate timestamp-formatted folder at this location.
If you don't use this config tag, Arpio recovers cache clusters and replication groups without initial data.
Arpio doesn't create .rdb files in your primary environment or replicate them to your recovery environment. To use this config tag effectively, you should automate the process of exporting .rdb files to S3 in the primary environment and copying them to the configured recovery environment S3 bucket using a supported timestamp-formatted folder structure.
Refer to the AWS ElastiCache documentation section Exporting a backup for help on exporting .rdb files. You can use Arpio to replicate the bucket containing exported .rdb files into your recovery environment, or use another method to replicate them.
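If you automate the copy yourself, one piece of that automation is choosing the destination key for each exported file. The sketch below builds a key under a timestamp-named folder using an RFC 3339 UTC timestamp with second precision. The function name `recovery_key` and the choice of format are assumptions for illustration; any supported timestamp format would work.

```python
from datetime import datetime, timezone


def recovery_key(prefix: str, created_at: datetime, filename: str) -> str:
    """Build the S3 key for an .rdb file under a timestamp-named folder."""
    if created_at.tzinfo is None:
        raise ValueError("created_at must be timezone-aware")
    # RFC 3339 in UTC with second precision, e.g. 2024-07-11T12:00:16Z
    folder = created_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    return f"{prefix}{folder}/{filename}"


created = datetime(2024, 7, 11, 12, 0, 16, tzinfo=timezone.utc)
print(recovery_key("prod-cluster/", created, "prod-cluster-0001.rdb"))
```

All .rdb files from the same backup should share one `created_at` value so that they land in the same folder.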
Recovery Bucket Permissions
Both Arpio and the ElastiCache service need permission to read .rdb snapshots from the bucket you specify in this config tag. Arpio's IAM permissions normally grant it the permissions it needs to access this bucket, but you'll need to manually grant permissions to the ElastiCache service using a bucket policy (recommended) or ACLs.
Follow the instructions in Step 4: Grant ElastiCache read access to the .rdb file to grant the appropriate permissions to the ElastiCache service. Although the ElastiCache documentation suggests using ACLs for non-opt-in regions, bucket policies work in these cases as well, and are generally preferred to ACLs.
The following example bucket policy grants permissions to the elasticache-snapshot service in us-east-2. If you use this policy, customize the region and resource names (buckets) for your environment:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowElastiCacheSnapshotRead",
      "Effect": "Allow",
      "Principal": {
        "Service": "us-east-2.elasticache-snapshot.amazonaws.com"
      },
      "Action": [
        "s3:GetObject",
        "s3:ListBucket",
        "s3:GetBucketAcl"
      ],
      "Resource": [
        "arn:aws:s3:::mycompany-cache-backups",
        "arn:aws:s3:::mycompany-cache-backups/*"
      ]
    }
  ]
}
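If you manage several recovery regions or buckets, you may prefer to generate this policy rather than edit it by hand. The following sketch produces the same document for a given bucket and region. The helper name `elasticache_read_policy` is hypothetical, and you should review the output before attaching it to a bucket.

```python
import json


def elasticache_read_policy(bucket: str, region: str) -> dict:
    """Return a bucket policy granting the regional ElastiCache snapshot service read access."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowElastiCacheSnapshotRead",
                "Effect": "Allow",
                # The service principal is region-specific.
                "Principal": {"Service": f"{region}.elasticache-snapshot.amazonaws.com"},
                "Action": ["s3:GetObject", "s3:ListBucket", "s3:GetBucketAcl"],
                # Both the bucket itself and the objects inside it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            }
        ],
    }


policy = elasticache_read_policy("mycompany-cache-backups", "us-east-2")
print(json.dumps(policy, indent=2))
```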
How Arpio Selects Snapshots to Restore
When the arpio-config:snapshot-location tag is present on a supported resource during recovery, Arpio looks inside the configured location (bucket and prefix) for folders that are named like timestamps. It parses all of these timestamps and selects the one whose value is closest to, but not after, the timestamp of the recovery point being restored. Then it looks inside that folder for .rdb files, and if the folder contains the expected number of files (one for each node group in a replication group, or a single file for a standalone cache cluster), it instructs ElastiCache to use those .rdb files to seed the cache.
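The "closest to, but not after" rule can be sketched as follows. This is an illustration rather than Arpio's code: the function name `select_folder` is hypothetical, the folder names are assumed to have been parsed into timezone-aware datetimes already, and returning `None` when every folder is newer than the recovery point is an assumption.

```python
from datetime import datetime, timezone
from typing import Mapping, Optional


def select_folder(parsed_folders: Mapping[str, datetime],
                  recovery_point: datetime) -> Optional[str]:
    """Pick the folder whose timestamp is the latest one not after the recovery point."""
    eligible = {name: ts for name, ts in parsed_folders.items() if ts <= recovery_point}
    if not eligible:
        return None  # assumption: no usable folder means no seed data
    return max(eligible, key=eligible.get)


utc = timezone.utc
folders = {
    "prod-cluster/2024-07-10T12:00:48Z/": datetime(2024, 7, 10, 12, 0, 48, tzinfo=utc),
    "prod-cluster/2024-07-11T12:00:16Z/": datetime(2024, 7, 11, 12, 0, 16, tzinfo=utc),
    "prod-cluster/2024-07-12T12:00:22Z/": datetime(2024, 7, 12, 12, 0, 22, tzinfo=utc),
}
chosen = select_folder(folders, datetime(2024, 7, 11, 15, 0, 0, tzinfo=utc))
print(chosen)
```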
How Arpio Handles Missing Buckets or Snapshots
If the tag value cannot be used to locate .rdb files because the bucket is no longer available, because the prefix contains no timestamp-formatted folders, or because of permissions problems, Arpio restores the cluster without seeded data and notifies you with an issue.
Creating the cache without data allows resources that depend on the cluster to be created as well. In these error cases, you can manually add your data from another backup location after the recovery finishes, or you can adjust your bucket contents or permissions and follow the instructions in the issue to restart the recovery so that it uses the bucket's data.
Timestamp Formats
The folders inside your configured location (bucket and prefix) should have names that express the date and time their snapshots were created in a supported format. This convention lets Arpio pick the best folder to use to seed cache cluster data automatically.
Arpio uses the dateutil library to parse timestamps from folder names. You can use any format supported by the library, but we recommend you pick one like RFC 3339 or ISO 8601, with a precision that's appropriate for your backup frequency. These formats are recommended because they represent years, months, and days unambiguously, sort naturally, can contain time zone information, are well-documented, and are supported by most software tools.
Arpio ignores folders with names that can't be parsed as timestamps. This means you can store additional files beside your backups in these buckets without interfering with the restore process.
Time Zones in Timestamps
Arpio parses time zone information from timestamp-formatted folders. If no time zone information is present, the timestamp is assumed to be in UTC.
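The sketch below illustrates the two rules above: names that do not parse are skipped, and names without time zone information are treated as UTC. To stay dependency-free it uses `datetime.fromisoformat` from the Python standard library instead of dateutil, so it accepts only ISO 8601 style names, whereas dateutil accepts many more formats. The function name `parse_folder_timestamp` is hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_folder_timestamp(folder_name: str) -> Optional[datetime]:
    """Return an aware UTC datetime, or None if the name is not a timestamp."""
    text = folder_name.rstrip("/")
    # Python versions before 3.11 reject a trailing "Z" in fromisoformat.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    try:
        parsed = datetime.fromisoformat(text)
    except ValueError:
        return None  # not a timestamp, so the folder is ignored
    if parsed.tzinfo is None:
        # No time zone information present: assume UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


for name in ["2024-07-11T12:00:16Z/", "2024-07-11T08:00:16-04:00/", "2024-07-11T12:00:16/", "notes/"]:
    print(name, parse_folder_timestamp(name))
```

The first three names in the usage loop all denote the same instant, so they would compare as equal when Arpio looks for the folder closest to a recovery point.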
RDB Files
After Arpio selects the timestamp-formatted folder that matches the restored recovery point, it checks that the correct number of .rdb files is present. For a replication group, the number of .rdb files in the folder must match the number of node groups (shards); if it doesn't, Arpio reports an issue and recovers the cluster without seed data. For cache clusters without a replication group, Arpio checks that exactly one .rdb file is present.
After Arpio confirms the count of .rdb files is correct, it sorts the files by name alphanumerically, and passes the list of those keys to the AWS API as the SnapshotArns request parameter when it calls CreateReplicationGroup or CreateCacheCluster.
Arpio ignores files in timestamp-formatted folders that don't end in .rdb, so you can store other files beside your backups in these buckets.
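The checks in this section can be sketched as a single function. This is a hedged illustration, not Arpio's implementation: the name `build_snapshot_arns` is hypothetical, plain lexicographic sorting stands in for the alphanumeric sort, and the `arn:aws:s3:::bucket/key` form follows the S3 ARN format that the ElastiCache API documents for SnapshotArns.

```python
from typing import Iterable, List, Optional


def build_snapshot_arns(bucket: str, keys: Iterable[str],
                        expected_count: int) -> Optional[List[str]]:
    """Return sorted S3 ARNs for the .rdb keys, or None if the count is wrong."""
    # Files that don't end in .rdb are ignored.
    rdb_keys = sorted(key for key in keys if key.endswith(".rdb"))
    # One file per node group (or exactly one for a standalone cache cluster).
    if len(rdb_keys) != expected_count:
        return None
    return [f"arn:aws:s3:::{bucket}/{key}" for key in rdb_keys]


keys = [
    "prod-cluster/2024-07-11T12:00:16Z/prod-cluster-0002.rdb",
    "prod-cluster/2024-07-11T12:00:16Z/manifest.json",
    "prod-cluster/2024-07-11T12:00:16Z/prod-cluster-0001.rdb",
    "prod-cluster/2024-07-11T12:00:16Z/prod-cluster-0003.rdb",
]
arns = build_snapshot_arns("mycompany-cache-backups", keys, expected_count=3)
print(arns)
```

Because the files are passed in sorted order, zero-padded names such as prod-cluster-0001.rdb keep that order predictable.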
Example
arpio-config:snapshot-location = mycompany-cache-backups/prod-cluster/
In this example, the config tag directs Arpio to restore .rdb snapshots from the mycompany-cache-backups bucket in the recovery environment. Arpio will look inside the prod-cluster/ prefix for folders with timestamps as names to locate the appropriate set of .rdb files to restore.
Assume the bucket contains the following keys:
prod-cluster/2024-07-10T12:00:48Z/prod-cluster-0001.rdb
prod-cluster/2024-07-10T12:00:48Z/prod-cluster-0002.rdb
prod-cluster/2024-07-10T12:00:48Z/prod-cluster-0003.rdb
prod-cluster/2024-07-11T12:00:16Z/prod-cluster-0001.rdb
prod-cluster/2024-07-11T12:00:16Z/prod-cluster-0002.rdb
prod-cluster/2024-07-11T12:00:16Z/prod-cluster-0003.rdb
prod-cluster/2024-07-12T12:00:22Z/prod-cluster-0001.rdb
prod-cluster/2024-07-12T12:00:22Z/prod-cluster-0002.rdb
prod-cluster/2024-07-12T12:00:22Z/prod-cluster-0003.rdb
These keys represent backups made on 3 consecutive days for a replication group named prod-cluster, which has 3 node groups. There are three unique timestamp-formatted folders at the configured prefix (prod-cluster/), one for each day backups were made:
prod-cluster/2024-07-10T12:00:48Z/
prod-cluster/2024-07-11T12:00:16Z/
prod-cluster/2024-07-12T12:00:22Z/
When you restore the prod-cluster replication group with Arpio, you choose a recovery point to restore to. Each recovery point has a timestamp, and in our example, let's assume it's 2024-07-11 15:00:00 UTC. This time is after the first two folders' parsed timestamp values, but before the third. Arpio chooses the prod-cluster/2024-07-11T12:00:16Z/ folder because its timestamp value is closest to the recovery point timestamp, but not newer than it.
Next, Arpio looks for .rdb files inside that folder. It finds the following files:
prod-cluster/2024-07-11T12:00:16Z/prod-cluster-0001.rdb
prod-cluster/2024-07-11T12:00:16Z/prod-cluster-0002.rdb
prod-cluster/2024-07-11T12:00:16Z/prod-cluster-0003.rdb
Since there are 3 files, one for each node group in the replication group, Arpio uses these three files to seed the replication group with data when it restores it.