DR Testing Strategies with Arpio

Techniques for accessing and sending traffic to your recovery environment

Testing Disaster Recovery with Arpio

Adequately validating that your disaster recovery solution works is essential to any BCDR program, and it can be surprisingly difficult to achieve.  Arpio can make recovering your workload easy, and can make it safe to recover your workload without negatively impacting your production environment, but Arpio doesn't know how to validate that your application is working as expected after recovery.  You will need to do that piece yourself.

This document provides guidance on the two DR testing challenges that customers most commonly face: accessing recovery environment resources, and sending test traffic to the recovery environment.

Accessing Recovery Environment Resources

Arpio creates a high-fidelity replica of your primary environment, including all of the security controls that govern access to the resources in that environment.  For many customers, those security controls make it challenging to get into the recovery environment to validate that it is working.

AWS Connectivity Options

Most AWS customers have an established path for their employees to connect into their primary environment for the purposes of managing and troubleshooting that environment.  There are many ways this can happen – here are some of the most common.

  • Bastion Host – a bastion host is a server that can be connected to from outside of the AWS environment.  This usually takes the form of an EC2 instance that is deployed in a public subnet and accepts inbound network traffic from outside of the AWS environment.  This server is often referred to as a "jump box" because it allows somebody to establish an SSH or RDP session to that box, and then use it to jump to other servers within the environment.
  • SSM Session Manager – AWS Systems Manager (SSM) provides a feature called Session Manager that allows AWS users to connect directly to EC2 instances in their cloud environment.  This feature relies on an agent (the "SSM agent") running on the EC2 instance that enables access if the correct IAM permissions are in place.  Once the agent is running, users can connect directly to the EC2 instance through the AWS console, or through the AWS command line client, even if the server is secured in a private network.
  • "Client VPN" – many AWS users deploy a virtual private network (VPN) solution within their AWS environments, and then configure end user workstations with a VPN client that allows them to connect to this environment.  This is referred to as a "client VPN" because of the need to run this client on the user's workstation.  AWS provides a client VPN solution as part of the VPC service, or you may choose to run an EC2-based VPN based on software like OpenVPN.
  • "Site-to-Site VPN" – In many cases, organizations set up a persistent VPN connection between their workplace or datacenter and their AWS environment.  These VPN solutions eliminate the need to run client software on the user's workstation.  They can connect directly to resources in AWS just as they connect to resources in their local environment because the AWS network is bridged with the local network.  AWS provides a site-to-site VPN capability as part of the VPC service, or you may choose to run a third party site-to-site solution on an EC2 instance.
  • Direct Connect – AWS provides a service called "Direct Connect" to connect on-prem and datacenter environments to AWS over private network links (not part of the public internet).  This service provides network transport between AWS and non-AWS environments.  It is common that a site-to-site VPN solution is also used over this network transport for additional security.
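As a rough illustration of the first two options, the sketch below builds the command lines a user might run for a bastion jump and for a Session Manager session.  It assumes OpenSSH's -J (ProxyJump) option and the AWS CLI's "aws ssm start-session" command, which also needs the Session Manager plugin installed on the workstation.  The function names, user name, host names, and instance ID are placeholders for illustration, not values from any real environment.

```python
import shlex


def bastion_ssh_command(bastion, target, user="ec2-user"):
    """Build an SSH command that hops through a bastion host to a private server."""
    # -J is OpenSSH's ProxyJump option: connect to the bastion first,
    # then continue on to the target from there.
    return ["ssh", "-J", f"{user}@{bastion}", f"{user}@{target}"]


def ssm_session_command(instance_id, region=None):
    """Build an AWS CLI command that opens a Session Manager session."""
    command = ["aws", "ssm", "start-session", "--target", instance_id]
    if region:
        command += ["--region", region]
    return command


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(shlex.join(bastion_ssh_command("bastion.example.com", "10.0.1.25")))
    print(shlex.join(ssm_session_command("i-0123456789abcdef0", region="us-east-1")))
```

The functions return argument lists rather than strings so that they could be passed to subprocess.run without shell quoting concerns.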

Connecting to Recovery Environments

In many scenarios, connectivity to your disaster recovery environment will mirror the connectivity you have established with your primary environment, but this is not always the case.  Some connectivity options only work if you've chosen to use different IP addresses in your recovery environment, which can complicate your DR strategy in new ways.  Other connectivity options require additional setup that may be inconvenient when you're initially evaluating or implementing Arpio.

Your connectivity to your recovery environment likely depends on your connectivity to the primary environment.

If Your Primary Environment Connectivity Is…

Your Recovery Environment Likely Should Be…

Bastion Host

Also a bastion host.  In fact, you can simply have Arpio replicate your primary environment bastion host to the recovery environment as part of your DR configuration.  During a test, Arpio will recover the bastion host within your recovery network, enabling the same connectivity on the recovery environment that you have in the primary environment.

Note that your recovered bastion host will have different public IP addresses.  Arpio will display those in the Arpio UI to make it easy to connect to this server.

SSM Session Manager

SSM Session Manager.  When Arpio recovers a server that is configured with the SSM agent, the agent is already resident on the recovered server.  It automatically connects to the SSM service with the correct IAM configuration to allow connectivity to that recovered machine.

Client VPN

Sometimes a client VPN, but often a bastion host to avoid the complexity of setting up and configuring a VPN.

If you are using AWS's Client VPN service in your primary environment, Arpio does not replicate the Client VPN into your recovery environment.  If desired, you can configure one yourself, and then leverage the newly configured Client VPN endpoint to connect into your environment.

If you are running a non-AWS VPN solution on an EC2 instance, you can have Arpio replicate that EC2 instance as part of your recovery process.  That server, with the VPN solution it hosts, will be restored during your DR tests.  Most VPN solutions store the public IP address that clients connect to in their configuration, and that IP address is different on the recovered server.  You will likely need to connect to the VPN solution's administrative interface and update that IP address before you can establish VPN connectivity.

Site-to-Site VPN

Probably a bastion host, SSM Session Manager, or AWS Client VPN initially.  As your DR environment matures, you may choose to implement persistent connectivity such as another site-to-site VPN.

You should keep in mind, though, that persistent connectivity will require that your recovery environment uses different IP address blocks than your production environment.  Without this in place, your network won't know which environment should receive traffic for a given address.  Arpio enables this scenario by allowing you to configure an alternate CIDR block for recovered VPCs, but your application may contain hard-coded IP addresses that will need to be updated during the recovery process.

Direct Connect

Same as site-to-site VPN.  Both of these are persistent connections, and they both are more difficult to set up initially and introduce the requirement of using alternate IP addresses in your recovery environment.
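To make the alternate-address requirement for persistent connections concrete, here is a minimal sketch that checks whether a recovery CIDR block collides with the primary one, using Python's standard ipaddress module.  The function name and the CIDR values are illustrative assumptions; substitute the blocks used by your own VPCs.

```python
import ipaddress


def cidrs_overlap(primary_cidr, recovery_cidr):
    """Return True if the two CIDR blocks share any addresses."""
    primary = ipaddress.ip_network(primary_cidr)
    recovery = ipaddress.ip_network(recovery_cidr)
    return primary.overlaps(recovery)


if __name__ == "__main__":
    # Identical blocks overlap, so a bridged network could not route between them.
    print(cidrs_overlap("10.0.0.0/16", "10.0.0.0/16"))
    # A distinct block for the recovery VPC avoids the ambiguity.
    print(cidrs_overlap("10.0.0.0/16", "10.1.0.0/16"))
```

A check like this only covers the VPC address ranges; it does not find IP addresses embedded in application configuration.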

Testing Internet-Facing Web Applications

Many applications expose an HTTP endpoint to the public internet that can be queried to validate application functionality during a failover test.  These applications may avoid the need to connect within an environment as discussed above if the application can be validated via the exposed HTTP endpoint.

The common challenge in testing these applications is sending authentic traffic to the endpoint that Arpio has created in the recovery environment.

Typically, you want to test recovery while your primary environment is still fielding production traffic.  This means you can't update your production DNS entries to point traffic at the recovery environment, or your production users would start hitting the wrong environment.  Instead, you need to trick the computer from which the test traffic originates into sending that traffic to the recovery environment without a DNS change.

/etc/hosts Aliases

All modern operating systems include a file, commonly referred to as "/etc/hosts," where you can associate a hostname with a specific IP address.  On Linux and Mac systems, the file is located at /etc/hosts on your local disk.  On a Windows system, the file is usually located at c:\Windows\System32\Drivers\etc\hosts.

To "trick" your computer into sending traffic to the recovery environment using the production environment hostname, without actually changing the production DNS entry, you will need to add an entry to /etc/hosts.  When your system needs to resolve that hostname to an IP address, it will first check the /etc/hosts file and use the IP address you've provided there.

The format of /etc/hosts is very simple.  Each line in the file represents a single entry consisting of two fields separated by spaces or tabs.  The first field is the IP address, and the second field is the hostname you are overriding.

The /etc/hosts file can only map a hostname to an IP address.  In many cases, Arpio has recovered your web application behind a load balancer or an API gateway for which Amazon has provided a hostname, and not an IP address.  You'll need to convert that hostname to an IP address before updating /etc/hosts.
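The entry format and the IP-only restriction described above can be sketched in a few lines of Python.  The helper name hosts_entry is hypothetical; it simply formats one line and rejects anything that is not a literal IP address, such as a load balancer hostname.

```python
import ipaddress


def hosts_entry(ip, hostname, comment=None):
    """Return one hosts-file line: IP address, whitespace, hostname."""
    # ip_address() raises ValueError for anything that is not a literal IP,
    # which catches the mistake of pasting in an AWS-provided hostname.
    address = ipaddress.ip_address(ip)
    line = f"{address}\t{hostname}"
    if comment:
        line += f"  # {comment}"
    return line
```

For example, hosts_entry("1.2.3.4", "api.mydomain.com") produces the IP address, a tab, and the hostname, while passing a hostname as the first argument raises ValueError.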

Identifying the Hostname For Your Recovered Environment

If your application resides behind a load balancer, Amazon provides a hostname for that load balancer.  This is the hostname that you need to convert to an IP address for /etc/hosts.  Arpio will show this hostname within the Arpio console.

If your application resides behind an API Gateway, you must have a custom domain name for your API Gateway.  Arpio will have replicated that custom domain name for you.  To retrieve the hostname that you need to convert to an IP address, log into the AWS Console, navigate to the API Gateway service, and select the Custom Domain Names section in the navigation bar.  Select the custom domain name for the application and locate the Endpoint configuration section of the UI.  The "API Gateway domain name" will tell you the hostname to convert to an IP address.

Converting the Hostname to an IP Address

There are many different tools that can be used to convert this hostname into an IP address. 

On Mac and Linux operating systems, the simplest tool is the "host" tool on the command line.

$ host d-3ttnter07d.execute-api.us-east-1.amazonaws.com
d-3ttnter07d.execute-api.us-east-1.amazonaws.com has address 1.2.3.4

On Windows the commonly used tool is nslookup.

c:\Windows> nslookup d-3ttnter07d.execute-api.us-east-1.amazonaws.com
Server:  your.dns.server.com
Address:  4.3.2.1

Non-authoritative answer:
Name:    d-3ttnter07d.execute-api.us-east-1.amazonaws.com
Address:  1.2.3.4
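If you would rather script the lookup, the following is a minimal sketch using Python's standard socket module.  The function name resolve_ipv4 and its lookup parameter are assumptions for illustration; the parameter exists only so the resolver can be swapped out.  Load balancer hostnames often resolve to more than one address, and AWS can change those addresses over time, so treat the result as valid for the duration of a test rather than permanently.

```python
import socket


def resolve_ipv4(hostname, lookup=socket.getaddrinfo):
    """Return the sorted, de-duplicated IPv4 addresses for a hostname."""
    results = lookup(hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    # Each result is (family, type, proto, canonname, sockaddr),
    # and for IPv4 the sockaddr is an (ip, port) pair.
    return sorted({sockaddr[0] for _, _, _, _, sockaddr in results})
```

Calling resolve_ipv4 with the AWS-provided hostname returns a list of candidate addresses; any one of them can be used for the /etc/hosts entry.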

Editing /etc/hosts

Once you've retrieved your IP address, you can proceed to update the /etc/hosts file on your system.  Editing this file typically requires administrator (root) privileges.  Open it with your favorite text editor, and add a new line with the IP address you just retrieved followed by the hostname you're trying to override.  Anything following a # character on a line is treated as a comment.

##
# Host Database
#
# localhost is used to configure the loopback interface
# when the system is booting.  Do not change this entry.
##
127.0.0.1       localhost

# This next line is my override for DR testing
1.2.3.4     api.mydomain.com

After saving the file, web requests from this computer to api.mydomain.com will be sent to the recovery environment.  Remove the line when your test is complete so that the hostname resolves through DNS again.
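One way to confirm that the override has taken effect is to ask the operating system's resolver what it returns for the hostname.  The sketch below does this with Python's socket.gethostbyname, which goes through the system resolver and therefore normally honors the hosts file.  Tools such as host and nslookup query DNS servers directly and bypass the hosts file, so they are not a reliable check here.  The function name override_is_active is hypothetical.

```python
import socket


def override_is_active(hostname, expected_ip, lookup=socket.gethostbyname):
    """Return True if the system resolver maps hostname to expected_ip."""
    try:
        # gethostbyname uses the operating system's resolver, which
        # normally consults the hosts file before falling back to DNS.
        return lookup(hostname) == expected_ip
    except socket.gaierror:
        # The name did not resolve at all.
        return False
```

With the example entry in place, override_is_active("api.mydomain.com", "1.2.3.4") would be expected to return True on the machine whose hosts file was edited.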