Configuring Disaster Recovery for Managed AD Environments

This document describes how to configure a multi-region directory and use it together with Arpio to implement cross-region and cross-account recovery for AWS workloads.

If you’re using AWS Directory Service to host a managed Active Directory in your environment, a traditional backup/restore DR strategy is not an option.  Instead, you’ll need to use Directory Service’s multi-region replication feature to provide a recovery solution in the event of an AWS outage.  If your DR strategy must also enable recovery from a cyber event such as an AWS account compromise, you should also consider hosting your directory in a dedicated AWS account and managing access to it independently of your other AWS accounts.

Note: Multi-region replication in AWS Directory Service requires that your managed Microsoft AD directory use the Enterprise Edition.

Overall Approach

In this procedure, we will use Arpio to replicate Windows systems running in EC2, as well as all of their surrounding infrastructure, into an alternate region.  The recovery VPC that Arpio creates in this alternate region will use a different CIDR block than the original VPC, so that it can be connected to the same directory service as the primary VPC.  Once the VPC is created, we will configure a multi-region directory to connect the recovery VPC to a replicated directory server.
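Before choosing a CIDR block for the recovery VPC, it can help to check it against the primary VPC. The following is a minimal sketch using only the Python standard library. The function names and the example CIDR values are hypothetical, and it treats any overlap as a conflict, which is assumed here to be the safe reading of the unique-CIDR requirement.

```python
from __future__ import annotations

import ipaddress


def cidrs_overlap(cidr_a: str, cidr_b: str) -> bool:
    """Return True if the two CIDR blocks share at least one address."""
    net_a = ipaddress.ip_network(cidr_a, strict=True)
    net_b = ipaddress.ip_network(cidr_b, strict=True)
    return net_a.overlaps(net_b)


def find_conflicts(vpc_cidrs: dict[str, str]) -> list[tuple[str, str]]:
    """List every pair of VPC names whose CIDR blocks overlap."""
    names = sorted(vpc_cidrs)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if cidrs_overlap(vpc_cidrs[a], vpc_cidrs[b])
    ]


if __name__ == "__main__":
    # Hypothetical example values, not taken from any real environment.
    vpcs = {"primary": "10.0.0.0/16", "recovery": "10.1.0.0/16"}
    print(find_conflicts(vpcs))
```

An empty result means no two VPCs in the mapping overlap. Any pair that is returned would need a different CIDR block before both VPCs are connected to the directory.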

Steps:

  1. Add an arpio-config:renumber tag to the primary environment VPC, instructing Arpio to use a different CIDR block when it creates the recovery environment.  Managed AD requires every connected VPC to use a unique CIDR block.
  2. Configure a new application in Arpio to replicate your environment to an alternate location.  
    1. Ensure you are replicating at least 2 of the subnets in which you have managed Active Directory endpoints.  AWS will require a minimum of 2 subnets in a later step.
  3. Wait for a full backup cycle to run.  The initial cycle can take a couple of hours for large workloads.  After it’s complete, the Arpio console will show you the identities of the VPC and subnets that Arpio has created in the recovery environment.
  4. In the AWS console, navigate to the Directory Service and locate the managed Active Directory that your workload depends upon.  Navigate to the Multi-region replication setup for that directory and choose to add a region.  If you do not see this feature, you may need to upgrade your directory to the Enterprise Edition.
  5. In the Add Region screen:
    1. Select the region where Arpio has replicated your environment.  
    2. For the VPC, choose the VPC that Arpio has created for you in the replicated region.  Its ID is shown in the Recovery Resource column of the Arpio console, and it has the same Name tag as the primary environment VPC.
    3. For the subnets, select at least 2 of the subnets that Arpio has replicated for you in this region.
    4. Click the “Add” button.
  6. Wait patiently while AWS creates a new directory in the recovery region.  In our experience, this may take hours.
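Steps 4 through 6 can also be scripted instead of performed in the console. The following is a minimal sketch, assuming boto3 is installed and credentials for the directory's account are configured. It uses the Directory Service AddRegion and DescribeRegions operations. The function names and any IDs passed in are hypothetical placeholders, and the subnet check simply mirrors the "at least 2 subnets" requirement stated above.

```python
from __future__ import annotations


def build_add_region_request(directory_id, region, vpc_id, subnet_ids):
    """Build keyword arguments for the Directory Service AddRegion call."""
    # De-duplicate while preserving order, so a repeated subnet is not counted twice.
    unique_subnets = list(dict.fromkeys(subnet_ids))
    if len(unique_subnets) < 2:
        raise ValueError("at least 2 distinct subnets are required")
    return {
        "DirectoryId": directory_id,
        "RegionName": region,
        "VPCSettings": {"VpcId": vpc_id, "SubnetIds": unique_subnets},
    }


def add_region(primary_region, request):
    """Submit the request, then return the current status of each directory region."""
    # Imported lazily so the request builder above can be used without boto3.
    import boto3

    ds = boto3.client("ds", region_name=primary_region)
    ds.add_region(**request)
    described = ds.describe_regions(DirectoryId=request["DirectoryId"])
    return {r["RegionName"]: r["Status"] for r in described["RegionsDescription"]}
```

Because creating the replicated directory can take hours, a script would typically call describe_regions again later rather than block while waiting for the new region to become active.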

At this point, you have a pilot-light recovery environment that includes a managed Active Directory replicating in real time from your primary environment.

Handling IP Addresses for Domain Controllers

Windows hosts query DNS to find their domain controllers (DNS is often hosted on the domain controllers themselves).  Your Windows hosts will need to know or discover the correct IP addresses for the DNS servers in the recovery environment in order to locate the domain controllers and authenticate logins.

There are 3 options for configuring DNS knowledge/discovery to make this work:

  1. Fixed DNS IP addresses on each host – when launching a new Windows host in your network, if you choose the option to connect the host to a managed Active Directory, the host’s DNS will be configured with the specific IP addresses of the directory service.  When you fail over to the recovery environment, those IP addresses will no longer be correct.  You’ll need to connect to each server with a local admin account and change its DNS server addresses to the DNS endpoints of the recovery region’s directory service.  This can mean a lot of manual reconfiguration during a recovery.
  2. Dynamic IP address configuration via DHCP – instead of fixed DNS entries on each Windows host, you can configure DNS via DHCP.  In each host’s DNS setup, select the option to “Obtain DNS server address automatically” so that the host will use the DNS endpoints in your DHCP Options Set.  Then use the arpio-config:domain-name-servers tag to override the DNS settings in the recovery environment, so that recovered servers use the recovery region’s Active Directory servers for DNS.
  3. AmazonProvidedDNS and Route 53 Resolver – every VPC contains an implicit DNS server backed by Route 53, reachable at the base address of the VPC’s CIDR block plus two (e.g. 10.0.0.2 for a 10.0.0.0/16 VPC).  You can configure the DHCP Options Set in your primary VPC to advertise this AmazonProvidedDNS as the DNS server.  You then configure Route 53 Resolver to forward DNS requests for your AD domain to the DNS IP addresses of your directory.  When servers launch, they’ll discover AmazonProvidedDNS via DHCP, and then discover the domain controllers by querying that DNS endpoint.  You’ll need to configure Route 53 Resolver in both the primary environment and the recovery environment, pointing it to the corresponding directory DNS addresses in each environment.
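Option 3 can be sketched in code as follows. This is an illustration under stated assumptions, not a definitive implementation: it assumes boto3 is installed, that a Route 53 Resolver outbound endpoint already exists in the VPC's region (forwarding rules require one), and that you already know the directory's DNS IP addresses for that region. The function names, the rule naming scheme, and every ID or domain passed in are hypothetical.

```python
from __future__ import annotations

import ipaddress


def amazon_provided_dns(vpc_cidr: str) -> str:
    """Return the VPC resolver address: the CIDR's network address plus two."""
    network = ipaddress.ip_network(vpc_cidr, strict=True)
    return str(network.network_address + 2)


def build_forward_rule(ad_domain, directory_dns_ips, outbound_endpoint_id, request_id):
    """Build keyword arguments for a Route 53 Resolver FORWARD rule."""
    if not directory_dns_ips:
        raise ValueError("at least one directory DNS address is required")
    return {
        "CreatorRequestId": request_id,  # caller-chosen idempotency token
        "Name": "forward-" + ad_domain.replace(".", "-"),  # hypothetical naming scheme
        "RuleType": "FORWARD",
        "DomainName": ad_domain,
        "TargetIps": [{"Ip": ip, "Port": 53} for ip in directory_dns_ips],
        "ResolverEndpointId": outbound_endpoint_id,
    }


def create_forward_rule(region, vpc_id, rule_request):
    """Create the rule and associate it with the VPC; returns the rule ID."""
    # Imported lazily so the helpers above can be used without boto3.
    import boto3

    resolver = boto3.client("route53resolver", region_name=region)
    rule = resolver.create_resolver_rule(**rule_request)["ResolverRule"]
    resolver.associate_resolver_rule(ResolverRuleId=rule["Id"], VPCId=vpc_id)
    return rule["Id"]
```

The same functions would be run once per environment: once against the primary region with the primary directory's DNS addresses, and once against the recovery region with the replicated directory's DNS addresses.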

Running a Test

Testing this environment is identical to running any test in Arpio.  When you’re ready, click the Test button in the Arpio console and select the recovery point you want to test.  Arpio has already created the network infrastructure, security settings, and other infrastructure that can be created in advance.  Arpio will now spin up EC2 instances, RDS databases, load balancers, and other infrastructure that costs money.

Once your Windows servers are running, you can connect to them as you would connect to the corresponding servers in your primary environment.  The same user credentials that you use for the primary environment servers will apply to the recovered servers.  Note, however, that the servers will have different IP addresses, because we had to apply a different CIDR block to the recovery environment VPC.

Mitigating Cyber-Threats

AWS Directory Service does not support cross-account replicas, which means you can’t rely on the security boundaries between AWS accounts to protect your directory from an account compromise or an attack on AD.

To mitigate this threat, the best practice is to operate the Directory Service in a dedicated AWS account, apart from your other workloads.  This dedicated account can implement the multi-region replication described above to ensure the directory is resilient to a major AWS outage.  And because this account is dedicated, you can restrict access to it far more tightly than access to your primary AWS account.

To configure this, create a new account and a multi-region directory within that new account.  Then share the directory in your primary region with your primary account.  Similarly, share the directory in your recovery region with your recovery account.
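The sharing step can be sketched with the Directory Service ShareDirectory operation. This is a minimal illustration, assuming boto3 is installed and the code runs with credentials for the dedicated directory account. The function names and all IDs are hypothetical. It uses the HANDSHAKE share method, which requires the receiving account to accept the share; accounts in the same AWS Organization could use the ORGANIZATIONS method instead.

```python
from __future__ import annotations

import re


def build_share_request(directory_id, target_account_id, notes=""):
    """Build keyword arguments for the Directory Service ShareDirectory call."""
    if not re.fullmatch(r"\d{12}", target_account_id):
        raise ValueError("AWS account IDs are 12 digits")
    request = {
        "DirectoryId": directory_id,
        "ShareTarget": {"Id": target_account_id, "Type": "ACCOUNT"},
        "ShareMethod": "HANDSHAKE",  # the target account must accept the share
    }
    if notes:
        request["ShareNotes"] = notes
    return request


def share_directory(region, request):
    """Share the directory from the given region; returns the shared directory ID."""
    # Imported lazily so the request builder above can be used without boto3.
    import boto3

    ds = boto3.client("ds", region_name=region)
    return ds.share_directory(**request)["SharedDirectoryId"]
```

Following the paragraph above, share_directory would be called once in the primary region with the primary account's ID, and once in the recovery region with the recovery account's ID.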