Amazon Elastic Kubernetes Service (Amazon EKS)

Amazon EKS Resource Replication with Arpio

EKS Cluster

Arpio replicates EKS Clusters hosted inside AWS to your recovery environment.  Outpost clusters are not supported.

The following EKS cluster attributes are translated when a cluster is replicated into the recovery environment:

Attribute | Translation
EncryptionConfig, Logging, Name, NetworkConfig, OutpostConfig, Tags, VPC Config Private Access, VPC Config Public Access, VPC Config Public Access CIDRs, Version | These attributes are replicated to your recovery environment without translation.
Role ARN | Translated to role in recovery environment
VPC Config Subnet IDs | Translated to IDs from translating underlying VPC
VPC Config Security Group IDs | Translated to IDs from translating referenced SG ID

The following internal Kubernetes attributes are translated when a cluster is replicated into the recovery environment:

Internal Attribute | Translation
IAM Roles in aws-auth ConfigMap [2] | Translated to equivalently created roles in recovery environment
ECR Images owned by primary accounts | Translated to replicated ECR repositories in recovery environment
ECR Images owned by other accounts | Replicated to recovery environment without translation
DockerHub and other external Image references | Replicated to recovery environment without translation
eks.amazonaws.com/role-arn annotations [1, 2] | Pointed to mirrored IAM roles
EFS (and AP IDs)/EBS Volume IDs in PersistentVolumes [1] | Translated to replicated EFS (and AP)/EBS volumes
alb.ingress.kubernetes.io annotations [1, 2] | Translated to relevant ARNs/IDs in target
Secret objects [1, 2] | Encrypted in the source account to the target account’s key. Identifiers found in the source are translated.
AWS Resource hostnames | FQDNs that match other included resources are translated to target names
AWS Resource Identifiers [1] | Unique resource IDs belonging to reproduced resources are translated.
AWS Region ID references | IDs that are not part of a larger string are translated from source to target region
AWS Account IDs | Standalone references to accounts on the source side of a restore are translated to the target account of that sync pair
AWS Availability Zone IDs/Names |
Bare AWS ARNs [2] | Translated if referenced resource is also recovered

  1. Base shared IDs along with account generic references will be translated. AWS Resource Identifiers are known unique strings that are part of a resource’s state. Any of these that are unique qualify for translation almost anywhere in the cluster resource configuration.
  2. Bare ARNs are translated to match recovered resources in the cluster resource configuration.
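
The whole-value rules above (bare ARNs, and Region IDs that are not part of a larger string) can be illustrated with a short Python sketch. This is not Arpio's implementation: the function name translate_manifest, the replacement table, and every ARN, account ID, and Region in the sample are hypothetical, and the real translation covers more cases than this.

```python
from typing import Any, Dict


def translate_manifest(node: Any, replacements: Dict[str, str]) -> Any:
    """Return a copy of `node` in which whole-string matches are replaced.

    Only values that equal a key in `replacements` are changed, so an
    identifier embedded in a longer string is left untouched.
    """
    if isinstance(node, dict):
        return {key: translate_manifest(value, replacements) for key, value in node.items()}
    if isinstance(node, list):
        return [translate_manifest(item, replacements) for item in node]
    if isinstance(node, str):
        return replacements.get(node, node)
    return node


# Hypothetical source-to-target mapping for resources that were also recovered.
replacements = {
    "arn:aws:iam::111111111111:role/app": "arn:aws:iam::222222222222:role/app",
    "us-east-1": "us-west-2",
}

# Hypothetical fragment of cluster configuration.
manifest = {
    "metadata": {
        "annotations": {"eks.amazonaws.com/role-arn": "arn:aws:iam::111111111111:role/app"}
    },
    "data": {
        "AWS_REGION": "us-east-1",                 # standalone Region ID
        "BUCKET": "logs-us-east-1-archive",        # Region inside a larger string
        "OTHER_ROLE": "arn:aws:iam::333333333333:role/external",  # not in the mapping
    },
}

translated = translate_manifest(manifest, replacements)
```

In this sketch the role annotation and the standalone Region value change, while the bucket name and the unmapped ARN are left as they were.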

EKS Add-ons

Arpio backs up and restores all resources within the EKS cluster, so add-ons do not need to be restored separately: the results of installing the add-ons in the primary environment are captured within the cluster resources.

This ensures that the versions of all resources running in the recovery environment match the versions that were running in the primary environment.

EKS Fargate Profiles

The following attributes are translated when a Fargate profile is replicated into the recovery environment:

Attribute | Translation
Name, Selectors, Tags | These attributes are replicated to your recovery environment without translation.
Pod Execution Role | Translated to role in recovery environment
Subnets | Translated to IDs from translating underlying VPC

EKS Nodegroups

The following attributes are translated when a nodegroup is replicated into the recovery environment:

Attribute | Translation
AMI Type, Capacity Type, Disk Size, Instance Types, Labels, Name, Release Version, Scaling Config, Tags, Taints, Update Config, Version | These attributes are replicated to your recovery environment without translation.
Launch Template | Translated to the target instance of the launch template
Remote Access Security Groups | Translated to equivalent security groups in recovery environment
Remote Access SSH Key | Not translated; restoration fails if a key with the same name does not exist in the target.
Subnets | Translated to IDs from translating underlying VPC

Kubernetes Resources (v2)

In the initial version of Arpio’s EKS support, the entire cluster was reproduced. The second major iteration of EKS support lets you select a subset of the cluster’s resources through a combination of namespace selection and virtual tags that reference resources:

  • k8s:kind selects resources of an individual kind
  • k8s:namespace selects a namespace
  • k8s:label selects resources with a specific label
  • k8s:name selects resources matching the supplied name

Each of these tag types (and more) can be arranged in AND/OR tag rules to carve out selections as broad or as narrow as warranted.
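
To make the AND/OR idea concrete, here is a minimal Python sketch of how such a rule could be evaluated against a Kubernetes object. The rule shape ("all"/"any"/"tag"), the function names, and the "key=value" encoding for k8s:label are assumptions made for illustration; they are not Arpio's actual rule syntax.

```python
from typing import Any, Dict, Set, Tuple

Tag = Tuple[str, str]


def virtual_tags(resource: Dict[str, Any]) -> Set[Tag]:
    """Derive (tag, value) pairs from a Kubernetes object given as a dict."""
    meta = resource.get("metadata", {})
    tags: Set[Tag] = {
        ("k8s:kind", resource.get("kind", "")),
        ("k8s:name", meta.get("name", "")),
    }
    if "namespace" in meta:
        tags.add(("k8s:namespace", meta["namespace"]))
    for key, value in meta.get("labels", {}).items():
        # Assumed encoding: one k8s:label tag per label, written "key=value".
        tags.add(("k8s:label", f"{key}={value}"))
    return tags


def matches(rule: Dict[str, Any], tags: Set[Tag]) -> bool:
    """Evaluate a nested rule: {"all": [...]} is AND, {"any": [...]} is OR."""
    if "all" in rule:
        return all(matches(sub, tags) for sub in rule["all"])
    if "any" in rule:
        return any(matches(sub, tags) for sub in rule["any"])
    return (rule["tag"], rule["value"]) in tags


# Hypothetical rule: Deployments in "payments" OR anything labelled tier=critical.
rule = {
    "any": [
        {"all": [
            {"tag": "k8s:kind", "value": "Deployment"},
            {"tag": "k8s:namespace", "value": "payments"},
        ]},
        {"tag": "k8s:label", "value": "tier=critical"},
    ]
}

deployment = {"kind": "Deployment", "metadata": {"name": "api", "namespace": "payments"}}
config_map = {"kind": "ConfigMap", "metadata": {"name": "cfg", "namespace": "tools"}}
```

Here matches(rule, virtual_tags(deployment)) holds because of the AND branch, while the ConfigMap in another namespace with no matching label is not selected.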

The presentation of cluster-internal resources on the application settings page is presently limited to the namespace level so that it does not clutter navigation of the other resources. On the default application view, however, each Kubernetes resource in the latest recovery point is shown.

This also allows Arpio to bring up or adopt resources more quickly in several scenarios. Instead of waiting for everything a cluster depends upon to be reproduced in the target environment before configuring the cluster, each resource waits only on the dependencies it actually requires. For example, a Deployment that refers to an RDS database will not be restored until that RDS instance is up, but everything else supporting the Deployment will already have been brought up as its own dependencies became available. A problematic resource therefore halts only the pieces that depend on it, not the entire cluster configuration.
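
The behavior described above can be sketched as a dependency-gated restore loop. The following Python is an illustration only, not Arpio's scheduler: the function name restore_order and the resource names in the example are hypothetical, and a resource that is not yet available is modeled simply as "failing".

```python
from typing import Dict, List, Set, Tuple


def restore_order(
    dependencies: Dict[str, Set[str]], failing: Set[str]
) -> Tuple[List[str], List[str]]:
    """Restore resources in waves, each wave holding what is currently unblocked.

    `dependencies` maps a resource to the resources it requires.
    `failing` holds resources that cannot be brought up.
    Returns (restored in order, resources blocked by a failed dependency).
    """
    restored: List[str] = []
    done: Set[str] = set()
    pending = set(dependencies) - failing
    progressed = True
    while pending and progressed:
        # A resource is ready once every one of its dependencies is restored.
        ready = sorted(name for name in pending if dependencies[name] <= done)
        progressed = bool(ready)
        restored.extend(ready)
        done.update(ready)
        pending -= set(ready)
    return restored, sorted(pending)


# Hypothetical application: the database is unavailable.
dependencies = {
    "rds/orders-db": set(),
    "configmap/api-config": set(),
    "serviceaccount/api": set(),
    "service/api": set(),
    "deployment/api": {"rds/orders-db", "configmap/api-config", "serviceaccount/api"},
}
restored, blocked = restore_order(dependencies, failing={"rds/orders-db"})
```

In this example only deployment/api ends up blocked; the ConfigMap, ServiceAccount, and Service are restored even though the database is not available.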

Internal dependencies are used in much the same way as other dependencies in Arpio. Unfortunately, some of these dependencies are implicit, or exist only in code or documentation, rather than as direct references like those most AWS resources allow. Service accounts on workloads are one example: if a service account (SA) is set on the workload, Arpio looks for that specific SA name; otherwise it uses the ‘default’ SA in that namespace.
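
The service account lookup can be sketched in a few lines of Python. The serviceAccountName field and the fallback to the namespace's "default" service account are standard Kubernetes behavior; the function name and the handling of only a few workload kinds are simplifications for illustration, not Arpio's code.

```python
from typing import Any, Dict, Tuple


def service_account_dependency(workload: Dict[str, Any]) -> Tuple[str, str]:
    """Return the (namespace, service account name) a workload implicitly needs."""
    # Assumption: a manifest without a namespace targets the "default" namespace.
    namespace = workload.get("metadata", {}).get("namespace", "default")
    spec = workload.get("spec", {})
    kind = workload.get("kind")
    if kind == "Pod":
        pod_spec = spec
    elif kind == "CronJob":
        pod_spec = spec.get("jobTemplate", {}).get("spec", {}).get("template", {}).get("spec", {})
    else:
        # Deployment, StatefulSet, DaemonSet, Job: pod spec lives under spec.template.spec.
        pod_spec = spec.get("template", {}).get("spec", {})
    # Kubernetes falls back to the namespace's "default" service account when unset.
    name = pod_spec.get("serviceAccountName") or "default"
    return namespace, name


explicit = {
    "kind": "Deployment",
    "metadata": {"name": "api", "namespace": "payments"},
    "spec": {"template": {"spec": {"serviceAccountName": "api-sa"}}},
}
implicit = {
    "kind": "Deployment",
    "metadata": {"name": "worker", "namespace": "payments"},
    "spec": {"template": {"spec": {}}},
}
```

For the two hypothetical Deployments above, the first depends on the api-sa service account and the second on the default service account, both in the payments namespace.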

Supported Integrations

Arpio will mirror IAM OIDC Provider configurations and the appropriate role configurations within a cluster such that controllers using/controlling AWS services will work.

Method of Operation

Arpio's Kubernetes delegate uses the official Kubernetes Python client's dynamic API to discover and replicate cluster resources available at the API level.  One delegate function is created per Arpio application to access that application's public access clusters.  One additional delegate function is created and attached to Amazon VPC subnets and the cluster security group for each private-only cluster to ensure connectivity to the Kubernetes control plane.  
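
For readers unfamiliar with the dynamic API, the sketch below shows the general pattern with the Kubernetes Python client: look up a resource type by API version and kind at run time, then list its objects. It is a generic illustration, not the delegate's code; the helper names are hypothetical, and build_dynamic_client assumes the kubernetes package is installed and a local kubeconfig grants access to the cluster.

```python
from typing import Any, List, Optional


def build_dynamic_client() -> Any:
    """Create a DynamicClient from the local kubeconfig."""
    # Imported here so the listing helper below can be used without the package.
    from kubernetes import config, dynamic

    return dynamic.DynamicClient(config.new_client_from_config())


def list_object_names(
    client: Any, api_version: str, kind: str, namespace: Optional[str] = None
) -> List[str]:
    """List object names for a resource type resolved at run time."""
    # No generated class per kind is needed; the type is discovered from the API server.
    resource = client.resources.get(api_version=api_version, kind=kind)
    result = resource.get(namespace=namespace)
    return [item.metadata.name for item in result.items]


if __name__ == "__main__":
    dyn = build_dynamic_client()
    print(list_object_names(dyn, "v1", "ConfigMap", namespace="kube-system"))
```

Because the resource type is resolved by name at run time, the same helper works for custom resources and for kinds introduced in later Kubernetes versions.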

By using this dynamic API, Arpio can discover and protect most cluster resources automatically, even in versions of Kubernetes released after the Arpio Kubernetes delegates were configured. Most integrations should be replicated faithfully, but some may require additional introspection or translation.

Please contact us at support@arpio.io if any translated resources aren’t handled correctly.


In the primary environment, Arpio requires permission to call DescribeCluster, the KMS permissions needed to encrypt the sensitive data and sign the whole resource set, permission to read and write to a scratch S3 bucket, and permission to maintain the ENIs required for delegate Lambda functions attached to VPCs.

In the recovery environment, Arpio requires permission to create new EKS clusters, the KMS permissions needed to validate the signature on the configuration and decrypt it, permission to read from the S3 bucket in which Arpio stores your cluster configurations, and permission to read and write to a different scratch S3 bucket.

These primary and recovery environment permissions are configured for you automatically by the CloudFormation access stacks you create when you configure an Arpio application.

Private Cluster Networking

Arpio's Kubernetes delegate function needs to be able to contact the Amazon S3 service to get encrypted copies of configuration data in and out of your cluster.  If the subnets your EKS cluster is in do not have Internet access, you can use PrivateLink endpoints to allow the delegate function to access Amazon S3 without opening your cluster to the Internet.

Configure the following endpoint types in VPC subnets containing private EKS clusters protected by Arpio:

Network Sandbox Considerations

Arpio’s network sandbox limits an EKS cluster’s ability to retrieve container images. Private ECR repositories, with the appropriate ECR VPC endpoints in the hosting VPC, permit access cleanly. Some built-in and popular images are hosted in public.ecr.aws but backed by dynamic CloudFront hostnames, and allowing traffic to those dynamic locations can be difficult. We recommend moving these containers into ECR pull through caches and referring to them from there.
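
As a rough illustration of the last recommendation, the Python sketch below rewrites a public.ecr.aws image reference into the form used by a private ECR pull through cache. The helper name, account ID, Region, and image name are hypothetical, and the repository prefix depends on how your pull through cache rule is configured ("ecr-public" is assumed here).

```python
def to_pull_through_reference(
    image: str, account_id: str, region: str, prefix: str = "ecr-public"
) -> str:
    """Rewrite a public.ecr.aws image reference to a private pull through cache URI.

    References to other registries are returned unchanged.
    """
    public_host = "public.ecr.aws/"
    if not image.startswith(public_host):
        return image
    path = image[len(public_host):]
    # Private ECR registry hostname followed by the cache rule's repository prefix.
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{prefix}/{path}"


# Hypothetical image, account, and Region.
cached = to_pull_through_reference(
    "public.ecr.aws/example-alias/example-app:1.0",
    account_id="111122223333",
    region="us-west-2",
)
```

Workloads would then reference the rewritten URI, so image pulls stay on the private ECR endpoints reachable from inside the sandbox.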

Target Region Considerations

As of 2024-06-26, these are the disallowed AZs for EKS:

AWS Region | Region name | Disallowed Availability Zone IDs
us-east-1 | US East (N. Virginia) | use1-az3
us-west-1 | US West (N. California) | usw1-az2
ca-central-1 | Canada (Central) | cac1-az3


Arpio will attempt to place EKS clusters in other availability zones in these regions when possible, but other resource restrictions are also included in these calculations. For example, Auto Scaling groups that provide compute to your cluster will avoid AZs where the specified instance types aren’t available.
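
A simplified view of that calculation, as a Python sketch: start from the AZs in the region, drop the ones EKS disallows, and keep only those that also satisfy another constraint such as instance type availability. The disallowed IDs come from the table above; the function name and the sample availability data are hypothetical, and Arpio's real placement logic considers more restrictions than this.

```python
from typing import Dict, List, Set

# Disallowed AZ IDs for EKS, from the table above (as of 2024-06-26).
DISALLOWED_EKS_AZ_IDS: Dict[str, Set[str]] = {
    "us-east-1": {"use1-az3"},
    "us-west-1": {"usw1-az2"},
    "ca-central-1": {"cac1-az3"},
}


def candidate_az_ids(
    region: str, region_az_ids: List[str], instance_type_az_ids: Set[str]
) -> List[str]:
    """Keep AZs that EKS allows and where the required instance types exist."""
    disallowed = DISALLOWED_EKS_AZ_IDS.get(region, set())
    return [
        az_id
        for az_id in region_az_ids
        if az_id not in disallowed and az_id in instance_type_az_ids
    ]


# Hypothetical availability data for one instance type.
candidates = candidate_az_ids(
    "us-east-1",
    region_az_ids=["use1-az1", "use1-az2", "use1-az3", "use1-az4"],
    instance_type_az_ids={"use1-az1", "use1-az3", "use1-az4"},
)
```

With this sample data the result excludes use1-az3 because EKS disallows it, and use1-az2 because the instance type is not offered there.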

We recommend ensuring that your target region has capacity for your whole connected workload.

Restoration Tiering

The previous iteration of EKS support used Helm chart Kind ordering and retries to reproduce the resources in a cluster. The second iteration (v2) also splits the restoration into tiers.

Presently the tiers are as follows:

  1. Base networking/access control apps (kube-proxy, aws-node, eni-configs, eks-pod-identity-agent)
    CRDs, Namespaces, ClusterRoles
    aws-auth ConfigMap
  2. Karpenter and kube-dns
  3. Istio, Linkerd, AWS Load Balancer Controller
  4. Everything else

Compute resources are started after Tier 1 so that permissions and other configuration are in place before nodes attempt to join the cluster. Other service meshes and compute management systems are not yet supported, but support for them can be added.
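
The tiering above can be pictured as a classification step followed by a stable sort. The Python sketch below is only a guess at how such a classifier might look: matching on kind and on name substrings, the constant names, and the sample resources are all assumptions, and Arpio's actual rules are not documented here.

```python
from typing import List, Tuple

# Assumed matching rules derived from the tier list above.
TIER_1_NAMES = {"kube-proxy", "aws-node", "eks-pod-identity-agent", "aws-auth"}
TIER_1_KINDS = {"CustomResourceDefinition", "Namespace", "ClusterRole", "ENIConfig"}
TIER_2_HINTS = ("karpenter", "kube-dns")
TIER_3_HINTS = ("istio", "linkerd", "aws-load-balancer-controller")


def restore_tier(kind: str, name: str) -> int:
    """Return the restoration tier (1 to 4) for a resource."""
    lowered = name.lower()
    if kind in TIER_1_KINDS or lowered in TIER_1_NAMES:
        return 1
    if any(hint in lowered for hint in TIER_2_HINTS):
        return 2
    if any(hint in lowered for hint in TIER_3_HINTS):
        return 3
    return 4


def restore_plan(resources: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Order (kind, name) pairs by tier; sorted() keeps the input order within a tier."""
    return sorted(resources, key=lambda item: restore_tier(*item))


# Hypothetical cluster contents.
plan = restore_plan([
    ("Deployment", "orders-api"),
    ("Deployment", "istiod"),
    ("Deployment", "karpenter"),
    ("DaemonSet", "kube-proxy"),
])
```

In this example kube-proxy is planned first, followed by Karpenter, then the Istio control plane, and finally the application Deployment.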

v2 Cluster Onboarding Process

If you’re protecting a cluster under v1, migrating your account to v2 keeps that cluster selection and also offers the cluster’s namespaces for selection. Because the entire cluster was selected in v1, everything that “supports” the cluster remains selected, which pulls in all of the namespaces, as in v1. To limit the application to a subset, deselect the cluster and select the appropriate namespaces, or use tag rules to make your selections.

If Arpio does not yet have access to your cluster, it cannot discover the namespaces to offer for selection. In this case, select the cluster so that Arpio raises an issue about the missing permissions. That issue provides the commands needed to grant access.

Once that access entry has been created, Arpio will start discovering the resources in the cluster and can then allow namespace or tag rule selection.