Amazon EKS Resource Replication with Arpio
Jump to:
- EKS Cluster
- EKS Add-ons
- EKS Fargate Profiles
- EKS Nodegroups
- Kubernetes Resources (v2)
- Supported Integrations
- Method of Operation
EKS Cluster
Arpio replicates EKS Clusters hosted inside AWS to your recovery environment. Outpost clusters are not supported.
The following EKS cluster attributes are translated when a cluster is replicated into the recovery environment:
Attribute | Translation |
EncryptionConfig, Logging, Name, NetworkConfig, OutpostConfig, Tags, VPC Config Private Access, VPC Config Public Access, VPC Config Public Access CIDRs, Version | These attributes are replicated to your recovery environment without translation. |
Role ARN | Translated to role in recovery environment |
VPC Config Subnet IDs | Translated to IDs from translating underlying VPC |
VPC Config Security Group IDs | Translated to IDs from translating referenced SG ID |
The following internal Kubernetes attributes are translated when a cluster is replicated into the recovery environment:
Internal Attribute | Translation |
IAM Roles in aws-auth ConfigMap [2] | Translated to equivalently created roles in recovery environment |
ECR Images owned by primary accounts | Translated to replicated ECR repositories in recovery environment |
ECR Images owned by other accounts | Replicated to recovery environment without translation |
DockerHub and other external Image references | Replicated to recovery environment without translation |
eks.amazonaws.com/role-arn annotations [1][2] | Pointed to mirrored IAM roles |
EFS (and AP IDs)/EBS Volume IDs in PersistentVolumes [1] | Translated to replicated EFS (and AP)/EBS volumes |
alb.ingress.kubernetes.io annotations [1][2] | Translated to relevant ARNs/IDs in target |
Secret objects [1][2] | Encrypted in source account to target account key. Identifiers found in source will be translated. |
AWS Resource hostnames | FQDNs that match other included resources are translated to target names |
AWS Resource Identifiers [1] | Unique resource IDs belonging to reproduced resources are translated. |
AWS Region ID references | IDs that are not part of a larger string are translated from source to target region |
AWS Account IDs | Stand-alone references to accounts included in the source side of restores will be translated to the target of that sync pair |
AWS Availability Zone IDs/Names | |
Bare AWS ARNs [2] | Translated if referenced resource is also recovered |
[1] Base shared IDs along with account generic references will be translated. AWS Resource Identifiers are known unique strings that are part of a resource’s state. Any of those that are unique qualify for translation almost anywhere in the cluster resource configuration.
[2] Bare ARNs are translated to match recovered resources in the cluster resource configuration.
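The translation rules for image references and stand-alone region IDs can be illustrated with a minimal sketch. This is not Arpio's implementation: the account and region mappings, the function names, and the assumption that a translated ECR reference also moves to the target region are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical sync pair (assumption): source account/region -> recovery account/region.
ACCOUNT_MAP = {"111111111111": "222222222222"}
REGION_MAP = {"us-east-1": "us-west-2"}

# Matches private ECR image references such as
# 111111111111.dkr.ecr.us-east-1.amazonaws.com/app:1.0
ECR_IMAGE = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/(?P<rest>.+)$"
)


def translate_image(image: str) -> str:
    """Rewrite an ECR image reference only when a mapped (primary) account owns it."""
    match = ECR_IMAGE.match(image)
    if match is None or match.group("account") not in ACCOUNT_MAP:
        # Docker Hub, other registries, or ECR owned by another account: unchanged.
        return image
    account = ACCOUNT_MAP[match.group("account")]
    region = REGION_MAP.get(match.group("region"), match.group("region"))
    return f"{account}.dkr.ecr.{region}.amazonaws.com/{match.group('rest')}"


def translate_standalone_region(value: str) -> str:
    """Translate a region ID only when it is the whole value, not part of a larger string."""
    return REGION_MAP.get(value, value)
```

In this sketch an image owned by an unmapped account or hosted on an external registry passes through untouched, which mirrors the "replicated without translation" rows in the table.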
EKS Add-ons
Arpio backs up and restores all resources within the EKS cluster, so add-ons do not need to be restored separately: the resources that each add-on created in the primary environment are captured along with the rest of the cluster resources.
This ensures that the versions of all resources running in the recovery environment match the versions that were running in the primary environment.
EKS Fargate Profiles
The following attributes are translated when a Fargate profile is replicated into the recovery environment:
Attribute | Translation |
Name, Selectors, Tags | These attributes are replicated to your recovery environment without translation. |
Pod Execution Role | Translated to role in recovery environment |
Subnets | Translated to IDs from translating underlying VPC |
EKS Nodegroups
The following attributes are translated when a nodegroup is replicated into the recovery environment:
Attribute | Translation |
AMI Type, Capacity Type, Disk Size, Instance Types, Labels, Name, Release Version, Scaling Config, Tags, Taints, Update Config, Version | These attributes are replicated to your recovery environment without translation. |
Launch Template | Translated to the corresponding launch template in the recovery environment |
Remote Access Security Groups | Translated to equivalent security groups in recovery environment |
Remote Access SSH Key | Not translated, but the restore will fail if a key with the same name does not exist in the recovery environment. |
Subnets | Translated to IDs from translating underlying VPC |
Kubernetes Resources (v2)
In the initial version of Arpio’s EKS support, the entire cluster was reproduced. Our second major iteration on EKS supports selecting a subset of the cluster’s resources by a combination of namespace selection and the ability to reference resources by virtual tags.
- k8s:kind allows selection of individual kinds
- k8s:namespace selects a namespace
- k8s:label selects resources carrying a specific label
- k8s:name selects resources matching the supplied name
Each of these tag types (and more) can be arranged in AND/OR tag rules to carve out very broad or narrow selections as are warranted.
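As a rough illustration of how AND/OR tag rules over these virtual tags could be evaluated, consider the sketch below. The rule structure (`all`/`any` dictionaries) and the `key=value` encoding of the k8s:label tag are assumptions made for this example; they are not Arpio's actual rule syntax.

```python
from typing import Any, Dict, Set, Tuple

Tags = Set[Tuple[str, str]]


def virtual_tags(resource: Dict[str, Any]) -> Tags:
    """Derive (tag, value) pairs from a Kubernetes manifest dictionary."""
    meta = resource.get("metadata", {})
    tags: Tags = {
        ("k8s:kind", resource.get("kind", "")),
        ("k8s:name", meta.get("name", "")),
    }
    if "namespace" in meta:
        tags.add(("k8s:namespace", meta["namespace"]))
    for key, value in meta.get("labels", {}).items():
        # Assumption: a label is encoded as "key=value" in the tag value.
        tags.add(("k8s:label", f"{key}={value}"))
    return tags


def matches(rule: Dict[str, Any], tags: Tags) -> bool:
    """Evaluate a nested rule: {"all": [...]} is AND, {"any": [...]} is OR."""
    if "all" in rule:
        return all(matches(sub, tags) for sub in rule["all"])
    if "any" in rule:
        return any(matches(sub, tags) for sub in rule["any"])
    return (rule["tag"], rule["value"]) in tags


# Select Deployments, or anything labeled tier=db, but only in the "shop" namespace.
rule = {
    "all": [
        {"tag": "k8s:namespace", "value": "shop"},
        {"any": [
            {"tag": "k8s:kind", "value": "Deployment"},
            {"tag": "k8s:label", "value": "tier=db"},
        ]},
    ]
}
```

Nesting `any` inside `all` is what lets a single rule be broad across kinds while staying narrow to one namespace.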
On the application settings page, cluster-internal resources are presently shown only down to the namespace level so that they don’t get in the way of navigating the other resources. The default application view, however, shows each Kubernetes resource in the latest recovery point.
This also allows Arpio to bring up or adopt resources more quickly in several scenarios. Instead of waiting for everything a cluster depends upon to be reproduced in the target environment before configuring the cluster, each resource waits only for the dependencies it actually requires. For example, a Deployment referring to an RDS database will not be restored until that RDS instance is up, but everything else supporting the Deployment will have been brought up as soon as its own dependencies became available. A problematic resource therefore halts only the pieces that depend upon it, not the entire cluster configuration.
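The dependency-gated behavior described here can be sketched as a simple fixed-point loop. The function name, the resource naming scheme, and the data shapes are hypothetical; the sketch only demonstrates the idea that an unavailable dependency blocks its dependents and nothing else.

```python
from typing import Dict, List, Set, Tuple


def restore_ready(
    dependencies: Dict[str, Set[str]], available: Set[str]
) -> Tuple[List[str], List[str]]:
    """Restore every resource whose dependencies are all up.

    dependencies maps a resource to the resources it needs first.
    available holds external resources that are already up (for example, an RDS instance).
    Returns (restored, blocked), where blocked resources are still waiting.
    """
    done = set(available)
    restored: List[str] = []
    progress = True
    while progress:
        progress = False
        for name in sorted(dependencies):
            if name not in done and dependencies[name] <= done:
                restored.append(name)
                done.add(name)
                progress = True
    blocked = sorted(set(dependencies) - done)
    return restored, blocked


deps = {
    "ConfigMap/app-config": set(),
    "ServiceAccount/app": set(),
    "Service/app": set(),
    "Deployment/app": {"ConfigMap/app-config", "ServiceAccount/app", "RDS/orders-db"},
}
```

Calling `restore_ready(deps, set())` leaves only `Deployment/app` blocked, while calling it with `{"RDS/orders-db"}` available restores everything.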
Internal dependencies are used in much the same way as other dependencies in Arpio. Unfortunately, some of these dependencies are implicit, or exist only in code or documentation, rather than being direct references like those most AWS resources allow. Service accounts on workloads are one example: if a service account name is set, Arpio looks for that specific service account; otherwise it uses the ‘default’ service account in that namespace.
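The service account case can be made concrete with a short sketch that resolves the implicit dependency from a workload manifest. The helper names are hypothetical; `serviceAccountName` is the standard Kubernetes pod spec field, and the fallback to `default` follows the behavior described above.

```python
from typing import Any, Dict, Tuple


def pod_spec(workload: Dict[str, Any]) -> Dict[str, Any]:
    """Locate the pod spec inside common workload kinds."""
    kind = workload["kind"]
    spec = workload["spec"]
    if kind == "Pod":
        return spec
    if kind == "CronJob":
        return spec["jobTemplate"]["spec"]["template"]["spec"]
    # Deployment, StatefulSet, DaemonSet, ReplicaSet, and Job share this shape.
    return spec["template"]["spec"]


def service_account_dependency(workload: Dict[str, Any]) -> Tuple[str, str]:
    """Return the (namespace, service account name) a workload implicitly depends on."""
    namespace = workload["metadata"].get("namespace", "default")
    name = pod_spec(workload).get("serviceAccountName") or "default"
    return namespace, name
```

A Deployment in namespace `shop` with no `serviceAccountName` resolves to `("shop", "default")` in this sketch.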
Supported Integrations
Arpio will mirror IAM OIDC Provider configurations and the appropriate role configurations within a cluster such that controllers using/controlling AWS services will work.
Method of Operation
Arpio's Kubernetes delegate uses the official Kubernetes Python client's dynamic API to discover and replicate cluster resources available at the API level. One delegate function is created per Arpio application to access that application's public access clusters. One additional delegate function is created and attached to Amazon VPC subnets and the cluster security group for each private-only cluster to ensure connectivity to the Kubernetes control plane.
By using this dynamic API, Arpio can discover and protect most cluster resources automatically, even in versions of Kubernetes released after the Arpio Kubernetes delegates were configured. Most integrations should be replicated faithfully, but some may require additional introspection or translation.
Please contact us at support@arpio.io if any translated resources aren’t handled correctly.
In the primary environment, Arpio requires permission to call DescribeCluster, the KMS permissions required to encrypt the sensitive data and sign the whole resource set, permission to read and write to a scratch S3 bucket, and permission to maintain the ENIs required for delegate Lambda functions attached to VPCs.
In the recovery environment, Arpio requires permission to create new EKS clusters, the KMS permissions to validate the signature on the configuration and decrypt it, permission to read from the S3 bucket in which Arpio stores your cluster configurations, and permission to read and write to a different scratch S3 bucket.
These primary and recovery environment permissions are configured for you automatically by the CloudFormation access stacks you create when you configure an Arpio application.
Private Cluster Networking
Arpio's Kubernetes delegate function needs to be able to contact the Amazon S3 service to get encrypted copies of configuration data in and out of your cluster. If the subnets your EKS cluster is in do not have Internet access, you can use PrivateLink endpoints to allow the delegate function to access Amazon S3 without opening your cluster to the Internet.
Configure the following endpoint types in VPC subnets containing private EKS clusters protected by Arpio:
- S3 VPC Interface Endpoint: private DNS entries are required, i.e. s3.<region>.amazonaws.com
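One way to check that private DNS is in effect is to resolve the regional S3 hostname from a host inside the cluster's subnets and confirm that every returned address is private. The sketch below uses only the Python standard library; the function names are hypothetical and the check is an informal sanity test, not something Arpio requires you to run.

```python
import ipaddress
import socket
from typing import Iterable, List


def s3_hostname(region: str) -> str:
    """Regional S3 hostname that the interface endpoint's private DNS should cover."""
    return f"s3.{region}.amazonaws.com"


def resolve(hostname: str) -> List[str]:
    """Resolve a hostname to its distinct IP addresses (requires DNS access)."""
    infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def all_private(addresses: Iterable[str]) -> bool:
    """True when at least one address was returned and all of them are private."""
    addresses = list(addresses)
    return bool(addresses) and all(
        ipaddress.ip_address(address).is_private for address in addresses
    )
```

Run from inside one of the subnets, `all_private(resolve(s3_hostname("us-east-1")))` would be expected to return `True` once the endpoint's private DNS entries are active; a `False` result suggests the hostname still resolves to public S3 addresses.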
Network Sandbox Considerations
Arpio’s network sandbox limits an EKS cluster’s ability to retrieve container images. Private ECR repositories, together with the appropriate ECR VPC endpoints in the hosting VPC, provide clean access to images from inside the sandbox. Some built-in and popular images are hosted in public.ecr.aws but backed by dynamic CloudFront hostnames, and allowing traffic to those dynamic locations can be difficult. Moving these images into ECR pull through caches and referring to them from there is recommended.
Target Region Considerations
As of 2024-06-26, these are the disallowed AZs for EKS:
AWS Region | Region name | Disallowed Availability Zone IDs |
us-east-1 | US East (N. Virginia) | use1-az3 |
us-west-1 | US West (N. California) | usw1-az2 |
ca-central-1 | Canada (Central) | cac1-az3 |
Arpio will attempt to place EKS clusters in other Availability Zones in these regions when possible, but other resource restrictions are also included in these calculations. For example, Auto Scaling groups that provide compute to your cluster will avoid AZs where the specified instance types aren’t available.
Ensuring your target region has capacity for your whole, connected, workload is recommended.
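When planning a target region, the disallowed AZ IDs listed above can be filtered out of a candidate list with a few lines of code. The dictionary reproduces the table as of 2024-06-26; the function name and the candidate list are hypothetical.

```python
from typing import Dict, List, Set

# Disallowed Availability Zone IDs for EKS, copied from the table above (as of 2024-06-26).
DISALLOWED_EKS_AZ_IDS: Dict[str, Set[str]] = {
    "us-east-1": {"use1-az3"},
    "us-west-1": {"usw1-az2"},
    "ca-central-1": {"cac1-az3"},
}


def allowed_az_ids(region: str, candidate_az_ids: List[str]) -> List[str]:
    """Drop AZ IDs that EKS disallows in the region, preserving the input order."""
    blocked = DISALLOWED_EKS_AZ_IDS.get(region, set())
    return [az_id for az_id in candidate_az_ids if az_id not in blocked]
```

The sketch works on AZ IDs rather than AZ names because names such as `us-east-1a` map to different physical zones in different accounts.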
Restoration Tiering
The previous iteration of EKS support used Helm chart Kind ordering and retries to reproduce the resources in a cluster. The second iteration (v2) splits the restoration into tiers as well.
Presently the tiers are as follows:
- Base networking/access control apps (kube-proxy, aws-node, eni-configs, eks-pod-identity-agent)
  CRDs, Namespaces, ClusterRoles
  aws-auth ConfigMap
- Karpenter & kube-dns
- istio, linkerd, aws load balancer controller
- Everything else
Compute resources are started after Tier 1 so that permissions and other configuration are in place before nodes attempt to join the cluster. Other service meshes or compute management systems are not yet supported, but support for them can be added.
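Tiered restoration amounts to sorting resources by tier before applying them. The sketch below assumes that the first tier is Tier 1 and that it covers the base networking/access control apps, CRDs, Namespaces, ClusterRoles, and the aws-auth ConfigMap. Matching resources by exact name or kind is a simplification made for illustration, and all identifiers are hypothetical.

```python
from typing import List, Set, Tuple

# Kinds assumed to belong to the first tier regardless of name.
FIRST_TIER_KINDS: Set[str] = {"CustomResourceDefinition", "Namespace", "ClusterRole", "ENIConfig"}

# Resource names per tier, in restoration order (assumption: exact-name matching).
TIER_NAMES: List[Set[str]] = [
    {"kube-proxy", "aws-node", "eks-pod-identity-agent", "aws-auth"},
    {"karpenter", "kube-dns"},
    {"istio", "linkerd", "aws-load-balancer-controller"},
]


def tier_of(kind: str, name: str) -> int:
    """Return the 1-based tier for a resource; unmatched resources land in the last tier."""
    if kind in FIRST_TIER_KINDS:
        return 1
    for index, names in enumerate(TIER_NAMES, start=1):
        if name in names:
            return index
    return len(TIER_NAMES) + 1


def restoration_order(resources: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Sort (kind, name) pairs by tier; sorted() is stable, so ties keep input order."""
    return sorted(resources, key=lambda item: tier_of(*item))
```

Because the sort is stable, resources within one tier keep their original relative order, which leaves room for a finer-grained ordering inside each tier.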
v2 Cluster Onboarding Process
If you’re protecting a cluster under v1, migrating your account to v2 keeps that cluster selection and also offers the cluster’s namespaces for selection. Because the entire cluster was selected in v1, everything that “supports” the cluster is selected too, which pulls in all of its namespaces, just as in v1. To limit the application to a subset, de-select the cluster and select the appropriate namespaces, or use tag rules to make your selections.
If Arpio does not yet have access to your cluster, it cannot discover the namespaces to offer for selection. In this case, select the cluster so that Arpio raises an issue about the missing permissions. That issue provides the commands needed to grant access.
Once that access entry has been created, Arpio will start discovering the resources in the cluster and can then allow namespace or tag rule selection.