Arpio Lifecycle Event Notifications

An introduction to Arpio lifecycle events and examples for handling them in your recovery environments

Introduction

Arpio provides a notification system that you can use to respond to changes Arpio makes in your recovery environment. Whenever Arpio updates your recovery environment, whether or not you are actively testing or recovering, it sends notifications about its work to an EventBridge event bus in that environment. You can add rules to this event bus to initiate actions based on these notifications. You could, for example:

  • Send an email when a critical resource has been deployed to your recovery environment
  • Update DNS entries when your application has completed recovery
  • Run scripts to start servers on EC2 instances after they have been installed in the recovery environment

Overview

When you install the Arpio CloudFormation templates, Arpio will create an EventBridge event bus in the recovery environment with the name arpio-lifecycle-events-ARPIO_ACCOUNT_ID, where “ARPIO_ACCOUNT_ID” is the Arpio account ID assigned to you. While Arpio cycles through its process of restoring your AWS applications, it generates messages and sends them to this event bus. 

Arpio Event Message Details

Arpio events come in four major types:

  • An ArpioRecoveryStart event that is sent when the recovery environment begins any update.  This will happen when a failover or test is initiated.  It will also happen when a failover or test is concluded, when a new recovery point is published (if the application is not currently failed over or under test), or when a user clicks the "Try Again" button for any recovery environment issues in the Arpio UI.
  • An ArpioResourceUpsert event that is sent when a resource is newly created or updated in the recovery environment.
  • An ArpioResourceDelete event that is sent when a resource is deleted from the recovery environment because it is not included in the specific recovery point that is being applied to the environment.
  • An ArpioRecoveryEnd event that is sent when the recovery environment update has completed.
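
As a rough sketch of how a consumer might branch on these four event types, the Python below dispatches on the detail-type field. The function names, the return strings, and the idea of running this inside an AWS Lambda target are assumptions made for illustration. Only the four detail-type values and the detail field names come from this article.

```python
# Hypothetical dispatcher for Arpio lifecycle events, e.g. inside a Lambda target.
# The handler names and return strings are illustrative assumptions.

def on_recovery_start(event):
    return "recovery update started"

def on_resource_upsert(event):
    arn = event.get("detail", {}).get("recovery-resource-arn", "unknown")
    return f"resource created or updated: {arn}"

def on_resource_delete(event):
    # Delete events carry the primary resource ARN rather than a recovery ARN.
    arn = event.get("detail", {}).get("primary-resource-arn", "unknown")
    return f"resource deleted: {arn}"

def on_recovery_end(event):
    return "recovery update finished"

HANDLERS = {
    "ArpioRecoveryStart": on_recovery_start,
    "ArpioResourceUpsert": on_resource_upsert,
    "ArpioResourceDelete": on_resource_delete,
    "ArpioRecoveryEnd": on_recovery_end,
}

def handle(event, context=None):
    """Route an event to the handler for its detail-type, ignoring unknown types."""
    handler = HANDLERS.get(event.get("detail-type"))
    if handler is None:
        return "ignored"
    return handler(event)
```

Ignoring unknown detail-type values, rather than raising an error, is one way to keep a consumer working if new event types are introduced later.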

 

Table 1 below lists the fields you can expect in an event message.

Table 1 - Arpio Lifecycle Event Message Fields

----- AWS Standard EventBridge Message Fields -----

| Field Name | Purpose | Valid values | Example | Type | Availability |
| --- | --- | --- | --- | --- | --- |
| account | Account that generated the event. Arpio recovery software is generating these events, so this is the recovery account. | | 123456123456 | String | All events |
| detail-type | Freeform text describing event detail | "ArpioRecoveryStart", "ArpioResourceUpsert", "ArpioResourceDelete", "ArpioRecoveryEnd" | ArpioRecoveryStart | String | All events |
| id | AWS-generated message ID | | ABC123-EFG456 | String | All events |
| region | Region where the event originated. This will be the recovery region. | Any valid AWS region | eu-west-1 | String | All events |
| resources | ARN of the resource being replicated | A valid AWS resource ARN | arn:aws:lambda:us-east-1:123456123456:function:ApiAuthorizer | List of String | All events. May be an empty list for some event types |
| source | ID of message sender | io.arpio | io.arpio | String | All events |
| time | Timestamp for event | Timestamp in ISO 8601 format, UTC timezone | 2023-06-14T01:02:03Z | String | All events |
| version | AWS schema version | 0 | 0 | String | All events |

----- Arpio Detail Fields -----

| Field Name | Purpose | Valid values | Example | Type | Availability |
| --- | --- | --- | --- | --- | --- |
| application-restore-phase | A map with a list of the applications being replicated, and the replication phase for each app. | | { "hSN9jWjSCtEY7UdRJRslwo" : "standby" } | Map of string:string | ArpioRecoveryStart and ArpioRecoveryEnd events |
| arpio-schema-version | | 0 | 0 | String | All events |
| event-id | Recovery event id that can be used to group events together | | 123456123456/ap-southeast-2/622737438086/ap-southeast-1 | String | All events |
| primary-account | Account containing source resources being replicated | AWS account ID | 123456123456 | String | ArpioResourceUpsert events |
| primary-region | Source Region | Any valid AWS region | us-east-1 | String | All events |
| primary-resource-arn | ARN of the resource that is being replicated | | arn:aws:ec2:us-east-1:123456123456:dhcp-options/dopt-00b9d100e33325084 | String | ArpioResourceUpsert and ArpioResourceDelete events |
| primary-resource-external-type-variant | The stable unique external name of this subtype of the main source resource type. | | rdsDbCluster | String | Only available for a few resources, such as RDS DB Clusters |
| primary-resource-id | The id field of the primary resource | | dopt-00b9d100e33325084 | String | ArpioResourceUpsert events |
| primary-resource-name | The name field of the resource being replicated. | | MyLambdaFunction | String | ArpioResourceUpsert and ArpioResourceDelete events |
| recovery-account | Account used for recovery. | AWS account ID | 654321654321 | String | All events |
| recovery-resource-arn | ARN of the replicated resource | | arn:aws:ec2:us-east-2:654321654321:instance/i-0f362e20d6a37f31b | String | ArpioResourceUpsert events |
| recovery-resource-external-type-variant | The stable unique external name of this subtype of the main replicated resource type. | | rdsDbCluster | String | ArpioResourceUpsert and ArpioResourceDelete events |
| recovery-resource-id | ID field of the replicated resource | | i-0f362e20d6a37f31b | String | ArpioResourceUpsert events |
| recovery-resource-name | | | MyLambdaFunction | String | ArpioResourceUpsert events |
| recovery-region | Region where the recovery environment is | Any valid AWS region | us-east-2 | String | All events |
| recovery-resource-state | Current status of restoration attempt for resource | "done", "failed", "processing" | failed | String | ArpioResourceUpsert events |
| restore-phase | Replication phase of resource | "standby", "failover_test", "failover" | failover | String | ArpioResourceUpsert events |
| resource-type | Describes the kind of resource being replicated. | An AWS resource type defined in AWS Config Supported Resource Types | AWS::EC2::Instance | List of String | ArpioResourceUpsert and ArpioResourceDelete events |

 

Example Arpio Lifecycle Notification message

{
  "account": "123456123456",
  "detail": {
    "application-restore-phase": {
      "gRM8TmEKCtEY7Ud8J8ride": "standby"
    },
    "arpio-schema-version": "0",
    "event-id": "4307F5SsmLifecycle_1231_us-east-1_1231_us-east-2_WTW6jd7yz55lYc1hK",
    "primary-account": "123456123456",
    "primary-region": "us-east-1",
    "primary-resource-arn": "arn:aws:ec2:us-east-1:123456123456:instance/i-0b6084f517f2024fa",
    "primary-resource-external-type-variant": "i-0b6084f517f2024fa",
    "primary-resource-id": "i-0b6084f517f2024fa",
    "primary-resource-name": "ProdEc2",
    "recovery-account": "654321654321",
    "recovery-region": "us-east-2",
    "recovery-resource-arn": "arn:aws:ec2:us-east-2:654321654321:instance/i-0235a61b5b158a0b3",
    "recovery-resource-external-type-variant": "i-0b6084f517f2024fa",
    "recovery-resource-id": "i-8009bfbb5bee5ad33",
    "recovery-resource-name": "ProdEc2Recovery",
    "recovery-resource-state": "done",
    "resource-type": [
      "AWS::EC2::Instance"
    ],
    "restore-phase": "failover"
  },
  "detail-type": "ArpioResourceUpsert",
  "id": "7cecb50d-06e5-536a-127f-ccb365b2f47d",
  "region": "us-east-2",
  "resources": [
    "arn:aws:ec2:us-east-1:123456123456:instance/i-0b6084f517f2024fa"
  ],
  "source": "io.arpio",
  "time": "2023-08-31T16:57:35Z",
  "version": "0"
}
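
To show how the fields in a message like this one might be read in code, here is a minimal Python sketch. It uses a trimmed copy of the example message; the function name summarize_upsert and the shape of the summary it returns are assumptions for illustration only.

```python
import json

# Trimmed copy of the example ArpioResourceUpsert message above.
RAW_EVENT = """
{
  "detail-type": "ArpioResourceUpsert",
  "source": "io.arpio",
  "region": "us-east-2",
  "detail": {
    "arpio-schema-version": "0",
    "recovery-region": "us-east-2",
    "recovery-resource-id": "i-8009bfbb5bee5ad33",
    "recovery-resource-name": "ProdEc2Recovery",
    "recovery-resource-state": "done",
    "resource-type": ["AWS::EC2::Instance"],
    "restore-phase": "failover"
  }
}
"""

def summarize_upsert(event):
    """Pull out the fields a consumer is most likely to act on (hypothetical helper)."""
    detail = event.get("detail", {})
    return {
        "name": detail.get("recovery-resource-name"),
        "id": detail.get("recovery-resource-id"),
        "types": detail.get("resource-type", []),  # note: a list, not a string
        "state": detail.get("recovery-resource-state"),
        "phase": detail.get("restore-phase"),
        "region": detail.get("recovery-region"),
    }

summary = summarize_upsert(json.loads(RAW_EVENT))
print(summary)
```

Using .get() with defaults keeps the helper tolerant of fields that are only present on some event types, as listed in Table 1.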

Recovery resources may be deleted during an ArpioResourceUpsert Event

As part of updating a recovery environment, Arpio may remove resources. For example, when you conclude a failover, resources that are not maintained during standby will be removed from the recovery environment to save costs. Arpio will report changes, including a delete, to these resources as an ArpioResourceUpsert event.

Cleaning up the Arpio recovery CloudFormation stack

If you attach rules to the arpio-lifecycle-events event bus and later try to remove the Arpio CloudFormation stack from your recovery environment, CloudFormation will be unable to delete the event bus until you manually remove the rules attached to it.
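
If you have many rules, removing them by hand can be tedious. The sketch below shows one way this cleanup might be scripted in Python. It assumes a boto3 EventBridge client is passed in; the function name remove_all_rules is hypothetical. Be aware that it removes every rule on the bus you name, so review it carefully before running it against a real environment.

```python
def remove_all_rules(events_client, bus_name):
    """Delete every rule, and each rule's targets, on the given event bus.

    events_client is assumed to be a boto3 EventBridge client, for example
    boto3.client("events", region_name="us-east-2").
    Returns the names of the rules that were deleted.
    """
    # Collect all rule names first so that deletions do not disturb pagination.
    names = []
    kwargs = {"EventBusName": bus_name}
    while True:
        page = events_client.list_rules(**kwargs)
        names.extend(rule["Name"] for rule in page.get("Rules", []))
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token

    for name in names:
        # A rule cannot be deleted while it still has targets attached.
        targets = events_client.list_targets_by_rule(Rule=name, EventBusName=bus_name)
        ids = [target["Id"] for target in targets.get("Targets", [])]
        if ids:
            events_client.remove_targets(Rule=name, EventBusName=bus_name, Ids=ids)
        events_client.delete_rule(Name=name, EventBusName=bus_name)
    return names

# Example usage (requires boto3 and AWS credentials; substitute your own
# Arpio account ID and recovery region):
#   import boto3
#   client = boto3.client("events", region_name="us-east-2")
#   remove_all_rules(client, "arpio-lifecycle-events-ARPIO_ACCOUNT_ID")
```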

Message changes and Backward compatibility

Arpio may add new fields to the event message in any release. If existing fields are removed or the type of an existing field changes, Arpio will update the Arpio schema version in the message so that customers know to process the messages differently, and it will send both the old and new versions of the message.

Arpio will notify customers in advance of messaging schema changes. If customers aren’t ready to process the new schema, they can set up a rule that ignores messages with a future schema version.
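
One way to ignore future schema versions is to have your rule match only the versions your code was written for. The Python sketch below builds such an event pattern and applies the same check in code. The helper names and the KNOWN_SCHEMA_VERSIONS list are assumptions for illustration; the arpio-schema-version field and its current value "0" come from Table 1.

```python
import json

# Schema versions this consumer has been written and tested for (assumption).
KNOWN_SCHEMA_VERSIONS = ["0"]

def build_pattern(known_versions):
    """Build an EventBridge event pattern that matches only known schema versions."""
    return {
        "source": ["io.arpio"],
        "detail": {"arpio-schema-version": list(known_versions)},
    }

def is_supported(event, known_versions=KNOWN_SCHEMA_VERSIONS):
    """Defensive check inside a consumer, mirroring the pattern above."""
    return event.get("detail", {}).get("arpio-schema-version") in known_versions

# Print the pattern as JSON, ready to paste into a rule's custom pattern editor.
print(json.dumps(build_pattern(KNOWN_SCHEMA_VERSIONS), indent=2))
```

Because a rule with this pattern only receives messages whose schema version is in the list, messages with a newer version are simply not delivered to your targets until you add that version.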

Example scenario:  Run a script on a recovery EC2 instance


Let’s say that you need to install some special software and run a script to configure that software on an EC2 instance after Arpio has recovered the EC2 instance during failover. 

We’ll assume that all of your EC2 instances are Linux instances, and that you want to run the same command on each instance after it’s installed in the recovery environment.  We’ll also assume that the SSM agent is installed on your EC2 instances in the primary environment (and therefore will be present on the recovered server).

The general approach is to create an event bus rule that selects messages sent to your Arpio event bus, and then add a target to the rule that updates your system.

Step 1: Create Systems Manager Automation Runbook

First, create a Systems Manager Automation runbook document in your RECOVERY environment that runs a shell command on a specified EC2 instance.

  • Go to AWS Systems Manager → Shared Resources → Documents
  • Choose "Create document" and create an Automation document.

An example Automation runbook is below. This runbook accepts a Role for running the automation and an instance ID to run it on as arguments, and then appends a text string “Hello at time” with the current time to a file in the EC2 instance temporary directory. Wrapping the runCommand in an automation runbook allows you to specify the EC2 instance ID at runtime.

description: Run a shell script that appends the time to a file on a given instanceId
schemaVersion: '0.3'
# Role that Automation assumes, taken from the AutomationAssumeRole parameter
assumeRole: '{{ AutomationAssumeRole }}'
parameters:
  InstanceId:
    type: String
    description: (Required) ID of the EC2 instance to run the shell command on.
  AutomationAssumeRole:
    default: ''
    type: String
    description: (Recommended) The ARN of the role that allows Automation to perform the actions on your behalf.
mainSteps:
  - name: runShellCommand
    action: 'aws:runCommand'
    maxAttempts: 3
    timeoutSeconds: 60
    onFailure: Abort
    inputs:
      DocumentName: AWS-RunShellScript
      # Instance ID supplied at runtime through the InstanceId parameter
      InstanceIds:
        - '{{ InstanceId }}'
      Parameters:
        commands:
          - date -u +"Hello at time %Y-%m-%d %H:%M:%S" >> /tmp/hello.txt

Step 2: Create Role for SSM Automation Runbook

You’ll also need a role that allows the EventBridge service to execute the runbook.  The role should have the AmazonSSMAutomationRole permission policy, and it should have trust relationships that allow both Events and SSM services to assume the role, like this:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "events.amazonaws.com",
          "ssm.amazonaws.com"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Call this role SsmRunCommand.

Step 3: Create a Rule and Target to Process the Event

Next, create a Rule in the RECOVERY environment on the arpio-lifecycle-events event bus that matches the Arpio events sent when recovery EC2 instances are created during failover.

  • Go to Amazon EventBridge → Rules and select the arpio-lifecycle-events event bus.  
  • Choose “Create rule”
  • Enter a name and description for the Rule that make sense to you, and make sure Rule Type “Rule with an event pattern” is selected. Click Next.
  • For Event Source, choose “Other”
  • Leave the optional Sample event area alone.
  • For Create method, select “Custom pattern (JSON editor)” and enter the following pattern for the Event pattern:
{
  "source": [
    "io.arpio"
  ],
  "detail-type": [
    "ArpioResourceUpsert"
  ],
  "detail": {
    "resource-type": [
      "AWS::EC2::Instance"
    ],
    "restore-phase": [
      "failover"
    ]
  }
}

This pattern selects Arpio events for EC2 instance creation (upsert) during the failover phase.  
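
If you would like to sanity-check a pattern like this before creating the rule, the Python sketch below imitates the exact-value matching that the pattern relies on. It is a deliberately simplified stand-in written for this article, not the EventBridge matcher itself: it only handles nested objects and lists of literal values, and the function name matches is an assumption.

```python
PATTERN = {
    "source": ["io.arpio"],
    "detail-type": ["ArpioResourceUpsert"],
    "detail": {
        "resource-type": ["AWS::EC2::Instance"],
        "restore-phase": ["failover"],
    },
}

def matches(pattern, event):
    """Simplified matcher: every pattern key must be present in the event and
    hold (or, for lists, contain) one of the values listed in the pattern."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: descend into the matching nested object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        else:
            # Leaf: the event value, or any element of an event list, must be allowed.
            values = actual if isinstance(actual, list) else [actual]
            if not any(value in expected for value in values):
                return False
    return True

failover_event = {
    "source": "io.arpio",
    "detail-type": "ArpioResourceUpsert",
    "detail": {"resource-type": ["AWS::EC2::Instance"], "restore-phase": "failover"},
}
standby_event = {
    "source": "io.arpio",
    "detail-type": "ArpioResourceUpsert",
    "detail": {"resource-type": ["AWS::EC2::Instance"], "restore-phase": "standby"},
}

print(matches(PATTERN, failover_event))
print(matches(PATTERN, standby_event))
```

The first event should match and the second should not, because its restore-phase is "standby". For an authoritative check, EventBridge also provides a TestEventPattern API that evaluates a pattern against a sample event using the real matcher.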

  • Click “Next” and create a Target for the rule.  
  • Leave “AWS” as the target type, and select “Systems Manager Automation” as the target
  • Select the automation runbook you created in Step 1.
  • Switch “Configure automation parameters” to “Input Transformer”
  • Set the input path to extract the recovery EC2 instance ID from the Arpio message:
{
  "instance": "$.detail.recovery-resource-id"
}
  • Then set the input template to convert the extracted EC2 instance ID into arguments to the Automation runbook:
{"InstanceId": [<instance>]}
  • Finally, set the execution role option to “Use existing Role”, and select the “SsmRunCommand” role you created earlier.
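
To make the input path and input template above more concrete, the Python sketch below imitates what the transformation produces for this one case. It is an illustration under simplifying assumptions, not the EventBridge implementation: it supports only simple "$.a.b" paths, and the function names resolve and transform are hypothetical.

```python
import json

INPUT_PATHS = {"instance": "$.detail.recovery-resource-id"}
INPUT_TEMPLATE = '{"InstanceId": [<instance>]}'

def resolve(path, event):
    """Look up a simple "$.a.b" style path in the event."""
    if not path.startswith("$."):
        raise ValueError(f"unsupported path: {path}")
    value = event
    for key in path[2:].split("."):
        value = value[key]
    return value

def transform(input_paths, template, event):
    """Replace each <name> placeholder with the JSON form of the extracted value."""
    result = template
    for name, path in input_paths.items():
        result = result.replace(f"<{name}>", json.dumps(resolve(path, event)))
    return result

event = {"detail": {"recovery-resource-id": "i-8009bfbb5bee5ad33"}}
runbook_input = transform(INPUT_PATHS, INPUT_TEMPLATE, event)
print(runbook_input)
```

The result is a JSON object whose InstanceId key holds a one-element list containing the recovery instance ID, which is the shape the runbook's InstanceId parameter receives.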

Your system is now ready to run a script as soon as Arpio replicates EC2 instances in your recovery environment. If you log on to the EC2 instance shortly after Arpio creates it, you should see a file named /tmp/hello.txt. Inside the file, there should be a line with the text “Hello at time” and a recent timestamp.