To prevent unintended interaction with production workloads during a disaster recovery test, Arpio can block the DR environment from communicating externally.
Jump to:
- Overview
- How to enable the network sandbox
- Manage outbound access by Network Firewall service (Advanced Sandbox)
- Manage outbound access by Security Group rules (Original Sandbox)
- IP Whitelist FAQs
Overview
When it's time to test your disaster recovery setup, you will need to consider the implications of launching a full replica of your environment.
Launching a test using Arpio's test capability will bring up your recovery resources in isolation in your recovery environment, while your production environment continues to serve traffic. However, you need to ensure that the workloads under test do not interact in any conflicting manner with the production workloads.
Unless you have configured network connectivity between your primary environment and your recovery environment, your recovery environment systems cannot interact directly with your primary environment. However, if your workload actively connects outbound to resources on the internet, your recovery environment could interact with those same resources in ways that could impact your production services.
To eliminate this risk, Arpio can isolate your recovery environment and block outbound access to the internet. To use this, enable the Network Sandbox capability before running your failover test. Inbound access is still permitted, so you can still test your systems by connecting through load balancers, bastion hosts, or other publicly exposed resources.
How to enable the network sandbox
To enable the Network Sandbox feature, launch a test for your Arpio application by clicking the "Test" button in the Arpio console. Then select the "Enable Network Sandbox" checkbox in the Test Recovery dialog.
Manage outbound access by Network Firewall service (Advanced Sandbox)
When you enable this option, Arpio will create an AWS Network Firewall in the recovery environment to sit between your workloads and your Internet Gateways. All default routes (those pointing to 0.0.0.0/0) are then updated to route through the firewall instead of directly to the internet gateway.
If a VPC does not have its own Internet Gateway but has a default route pointing to a Transit Gateway, Arpio does one of two things. If the shared VPC that egresses to the internet and its Internet Gateway are part of the failover test, Arpio creates a Network Firewall in that shared VPC. Otherwise, Arpio rewrites the default route in the original VPC to only allow traffic to internal/private networks.
When you conclude your test, these changes will be reverted.
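To get a sense of which routes this affects, the following is a minimal sketch that scans route table data for IPv4 default routes and reports their current targets. It assumes the input is shaped like the "RouteTables" list returned by the EC2 DescribeRouteTables API; the function name and the sample IDs are hypothetical, and this is not Arpio's implementation.

```python
DEFAULT_ROUTE = "0.0.0.0/0"


def find_default_routes(route_tables):
    """Return (route_table_id, target) pairs for every IPv4 default route."""
    found = []
    for table in route_tables:
        for route in table.get("Routes", []):
            if route.get("DestinationCidrBlock") != DEFAULT_ROUTE:
                continue
            # Report whichever target key is present on the route.
            target = (
                route.get("GatewayId")
                or route.get("TransitGatewayId")
                or route.get("NatGatewayId")
                or "unknown"
            )
            found.append((table["RouteTableId"], target))
    return found


# Hypothetical sample data: one VPC with an internet gateway and one
# spoke VPC whose default route points to a transit gateway.
sample_route_tables = [
    {
        "RouteTableId": "rtb-public",
        "Routes": [
            {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
            {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0123"},
        ],
    },
    {
        "RouteTableId": "rtb-spoke",
        "Routes": [
            {"DestinationCidrBlock": "0.0.0.0/0", "TransitGatewayId": "tgw-0456"},
        ],
    },
]

for table_id, target in find_default_routes(sample_route_tables):
    print(f"{table_id}: default route currently targets {target}")
```

Running a check like this against your recovery environment before and after a test is one way to confirm that the default routes were updated and then reverted.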
Allowing Some Outbound Access via Network Firewall
If your application relies on sending traffic outside your internal network to function, Arpio allows you to add any combination of domains and public CIDR blocks to an allowlist. These are then translated into Network Firewall rules to allow the specified traffic to flow out to the internet.
To enable some outbound access, first enable the network sandbox. Once enabled, you will have the option to fill out the allowed domains and/or CIDR blocks.
- Arpio will remember your sandbox settings, including allowlists, so the next time you want to test your recovery, the values will be pre-filled for you.
- To allow all domains with a common suffix, prepend a '.' to the beginning of the domain. For example, to allow traffic to all AWS services, use ".amazonaws.com" as the allowed domain.
- Only IPv4 CIDR blocks are currently supported.
- To specify a single IP address, append the /32 suffix (for example, 203.0.113.10/32).
- If you have an application in RECOVERY TEST and you would like to initiate a Test for a second application that shares the same target environment, the existing sandbox settings for the previously restored application cannot be changed with the second recovery test. This is to prevent conflicting or surprising results in your recovery environment. To change the settings, you will need to conclude the test for the first application.
- If you want to test multiple applications with the same recovery environment, we recommend initiating both tests at the same time, with the same network sandbox settings.
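The allowlist rules above can be sketched in a few lines of Python. This is an illustration of the semantics described in this section, not Arpio's or AWS Network Firewall's implementation; the function names are hypothetical, and treating a leading-dot entry as also matching the bare domain is an assumption.

```python
import ipaddress


def normalize_cidr(entry):
    """Return an IPv4 CIDR string; a bare IP address becomes a /32."""
    network = ipaddress.ip_network(entry.strip(), strict=True)
    if network.version != 4:
        raise ValueError(f"only IPv4 CIDR blocks are supported: {entry}")
    return str(network)


def domain_allowed(hostname, allowed_domains):
    """Check a hostname against exact and leading-dot (suffix) entries."""
    host = hostname.lower().rstrip(".")
    for entry in allowed_domains:
        entry = entry.lower()
        if entry.startswith("."):
            # Suffix entry: matches any subdomain. Matching the bare
            # domain as well is an assumption made for this sketch.
            if host.endswith(entry) or host == entry[1:]:
                return True
        elif host == entry:
            return True
    return False


print(normalize_cidr("203.0.113.10"))
print(domain_allowed("s3.us-east-1.amazonaws.com", [".amazonaws.com"]))
```

Note that the suffix check compares against the entry including its leading dot, so a look-alike such as "evilamazonaws.com" does not match ".amazonaws.com".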
Monitoring Blocked Traffic
The AWS Network Firewall(s) that Arpio creates for filtering traffic are configured to send block events to CloudWatch to allow for monitoring of blocked traffic. This can help you identify additional domains or CIDR blocks that may need to be added to the allowlist in order for your workloads to function properly during a failover test.
You can view these events directly in the Arpio console by clicking the "See most recent events" link.
For more advanced analysis of the block events, you can also view the events directly within CloudWatch by searching the "NetworkSandboxFirewallLogs" log group.
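If you export the block events from CloudWatch, a short script can summarize which destinations were blocked most often. The sketch below assumes each log line is a JSON record in the AWS Network Firewall alert log format, with the details nested under an "event" key ("alert.action", "tls.sni", "http.hostname", "dest_ip"); verify those field names against your own log data. The function name and sample records are hypothetical.

```python
import json
from collections import Counter


def summarize_blocked(log_lines):
    """Count blocked events per destination (hostname if known, else IP)."""
    counts = Counter()
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(record, dict):
            continue
        event = record.get("event", {})
        if event.get("alert", {}).get("action") != "blocked":
            continue
        # Prefer a hostname, which maps onto an allowed-domain entry;
        # fall back to the destination IP for a CIDR entry.
        destination = (
            event.get("tls", {}).get("sni")
            or event.get("http", {}).get("hostname")
            or event.get("dest_ip", "unknown")
        )
        counts[destination] += 1
    return counts.most_common()


# Hypothetical sample records for illustration only.
sample_lines = [
    json.dumps({"event": {"dest_ip": "198.51.100.7",
                          "alert": {"action": "blocked"},
                          "tls": {"sni": "api.example.com"}}}),
    json.dumps({"event": {"dest_ip": "198.51.100.7",
                          "alert": {"action": "blocked"},
                          "tls": {"sni": "api.example.com"}}}),
    json.dumps({"event": {"dest_ip": "203.0.113.9",
                          "alert": {"action": "blocked"}}}),
    json.dumps({"event": {"dest_ip": "192.0.2.1",
                          "alert": {"action": "allowed"}}}),
    "not json",
]

for destination, count in summarize_blocked(sample_lines):
    print(f"{count:4d}  {destination}")
```

Destinations that show up as hostnames are candidates for the allowed domains list, while bare IP addresses would need a CIDR entry.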
Manage outbound access by Security Group rules (Original Sandbox)
When you enable this option, Arpio will apply a filter to all egress rules on all security groups that it replicates. This filter will reduce the scope of egress rules that reference IP addresses and prefix lists to only allow access to the internal network destinations of your Amazon VPCs. Egress destinations that overlap with your VPCs (for example, 0.0.0.0/0) will be scoped down to internal destinations; egress destinations that fall outside of your VPCs will be eliminated entirely; and egress destinations within your VPCs will be left intact.
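The three cases can be modeled with Python's standard ipaddress module. This is a simplified sketch of the behavior described above, not Arpio's actual filter: it handles IPv4 CIDR destinations only (no prefix lists), and the function name and example CIDR blocks are hypothetical.

```python
import ipaddress


def scope_egress(destination, vpc_cidrs):
    """Return the CIDR blocks an egress destination is reduced to."""
    dest = ipaddress.ip_network(destination)
    scoped = []
    for cidr in vpc_cidrs:
        vpc = ipaddress.ip_network(cidr)
        if dest.subnet_of(vpc):
            # Destination is already inside a VPC: left intact.
            return [str(dest)]
        if vpc.subnet_of(dest):
            # Destination is broader than the VPC: scoped down to it.
            scoped.append(str(vpc))
    # An empty list means the destination is outside every VPC and
    # the rule is eliminated.
    return scoped


vpcs = ["10.0.0.0/16", "10.1.0.0/16"]
print(scope_egress("0.0.0.0/0", vpcs))       # broad rule, scoped down
print(scope_egress("10.0.5.0/24", vpcs))     # internal rule, kept
print(scope_egress("203.0.113.0/24", vpcs))  # external rule, removed
```

In this model a single 0.0.0.0/0 rule becomes one rule per VPC CIDR block, which is why the rewritten security groups can contain more rules than the originals.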
Because security groups offer stateful traffic filtering, outbound responses to inbound traffic are not impacted. You can still initiate the same application requests that you would expect and receive legitimate application responses.
Arpio's network sandbox will also affect the ability of your application components to communicate with the AWS API. If your application needs to communicate with the AWS API during recovery tests, you can manually add VPC endpoints to the networks that Arpio has set up for you.
When you conclude your test, these changes will be reverted.
Security Group Limits
The Network Sandbox feature rewrites the egress rules on your security groups. In some cases, a single rule in your primary environment can become multiple rules in the recovery environment. Because of this, it is possible to hit the AWS-imposed limit on the number of rules allowed per security group.
If this limit is hit, Arpio will halt the recovery and raise an appropriate issue in the Arpio console. You can then navigate to the "Service Quotas" section of the AWS console, find the quotas for the Amazon VPC service, and request an increase of the "Inbound or outbound rules per security group" quota.
Allowing Some Outbound Access via Security Group rules
If your application relies on sending traffic outside your internal network to function, Arpio allows you to whitelist specific IP addresses and CIDR blocks to grant outbound access when the network sandbox is enabled.
To enable some outbound access to specific IPs, first enable the network sandbox. Once enabled, you will have the option to fill out the IP addresses and ranges in the dialog.
You can also choose to allow traffic to Amazon S3 and/or Amazon DynamoDB IPs if your application requires access to one of these services in the recovery environment.
IP Whitelist FAQs
- Arpio will remember your sandbox settings, including outbound access CIDR blocks, so the next time you want to test your recovery, the values will be pre-filled for you.
- To specify a single IP address, append the /32 suffix (for example, 203.0.113.10/32).
- Arpio will apply the CIDR blocks you specify directly to your security groups, without modifying or de-duplicating them or applying any other complex logic.
- Like the network sandbox itself, the outbound access settings are compared with your existing security group rules. Arpio will only allow access to ports and IP addresses that are already allowed outbound in your primary environment.
- Only IPv4 CIDR blocks are currently supported.
- If outbound traffic is allowed for the S3 and/or DynamoDB IPs and an S3 or DynamoDB Gateway endpoint is attached to a VPC in the recovery environment, the traffic will flow over Amazon's private network. Otherwise, the traffic will traverse the public internet to reach S3 or DynamoDB.
- If you have an application in RECOVERY TEST and you would like to initiate a Test for a second application that shares the same target environment, the existing sandbox settings for the previously restored application cannot be changed with the second recovery test. This is to prevent conflicting or surprising results in your recovery environment. To change the settings, you will need to conclude the test for the first application.
- If you want to test multiple applications with the same recovery environment, we recommend initiating both tests at the same time, with the same network sandbox settings.