Azure Kubernetes Service (AKS)

Arpio replicates Azure Kubernetes Service clusters and their node pools, including cluster configuration, networking, supported add-ons, and the Kubernetes resources running inside the cluster — enabling containerized workloads to be restored in recovery environments.

Managed Clusters

Arpio replicates AKS Managed Clusters (Microsoft.ContainerService/managedClusters) with their network profile, identity configuration, API server access profile, add-on profiles, workload auto-scaler configuration, security profile, ingress profile, and service mesh profile. Both publicly accessible and private clusters are supported, as is API server VNet integration.

Translated Attributes

The following attributes are translated during replication:

Attribute | Translation Method
API Server Subnet | Reference translated to recovery subnet
Disk Encryption Set | Reference translated to recovery disk encryption set
Load Balancer Outbound Public IPs | References translated to recovery public IP addresses
Load Balancer Outbound IP Prefixes | References translated to recovery public IP prefixes
Azure Key Vault KMS | Reference translated to recovery Key Vault (used for etcd encryption)
Microsoft Defender Workspace | Reference translated to recovery Log Analytics workspace
Istio Service Mesh CA Key Vault | Reference translated to recovery Key Vault
Web App Routing DNS Zones | References translated to recovery DNS zones or private DNS zones
Pod Identity User-Assigned Identities | References translated to recovery user-assigned identities (deprecated AKS feature)
OMS Agent Log Analytics Workspace | Reference translated to recovery Log Analytics workspace
AGIC Application Gateway | Reference translated to recovery Application Gateway
Windows Admin Password Secret | The Key Vault secret URL supplied via the arpio-config:admin-password-secret tag is translated to the recovery Key Vault secret

Automatic Dependency Selection

The following resources are automatically selected into recovery points when a Managed Cluster is selected:

  • The cluster's agent pools
  • The cluster's virtual network and subnets
  • Subnet referenced by the private cluster API server (when VNet integration is enabled)
  • Disk encryption set referenced for OS/data disk encryption
  • Public IP addresses and IP prefixes referenced as load balancer outbound IPs
  • Key Vault referenced as the etcd encryption KMS key
  • Log Analytics workspaces referenced by the omsagent add-on and by Microsoft Defender for Containers
  • Application Gateway referenced by the AGIC add-on (or the AKS-managed Application Gateway, if AGIC manages it itself)
  • DNS zones and private DNS zones referenced by the Web App Routing ingress profile
  • Key Vault referenced as the Istio service mesh certificate authority
  • User-assigned identities referenced by the (deprecated) pod identity profile
  • Key Vault secret referenced by the arpio-config:admin-password-secret tag, if set
  • Cluster-scoped Kubernetes resources and all resources in the kube-system namespace (see Kubernetes Resources)
  • Disks referenced by Persistent Volumes in the cluster's Kubernetes resources

Add-on Support

The following AKS add-ons are supported and replicated:

Add-on | Notes
azureKeyvaultSecretsProvider | CSI driver for mounting Key Vault secrets into pods
azurepolicy | Azure Policy / Gatekeeper. Cluster-resident Policy/Gatekeeper objects are excluded from replication (see Kubernetes Resources)
ingressApplicationGateway | AGIC. When the Application Gateway was created and managed by AKS, it is promoted to a BYO Application Gateway in recovery (see Managed Resources Promoted to BYO)
omsagent | Container Insights / Log Analytics integration

In addition, the KEDA workload auto-scaler (workloadAutoScalerProfile.keda) is supported.

If a cluster has any other add-on or workload auto-scaler enabled, Arpio surfaces a recovery-point issue identifying the unsupported feature. The cluster will still be replicated, but the unsupported feature may not function correctly after recovery and may need to be reconfigured manually.
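
The check described above can be sketched as a comparison of a cluster's enabled add-ons against the supported list. This is only an illustration, not Arpio's implementation: the function name is hypothetical, the input is assumed to follow the addonProfiles shape of the AKS cluster definition (a name mapped to an object with an enabled flag), and the case-insensitive name comparison is an assumption.

```python
# Add-ons this page lists as supported. Names are compared in lower case,
# which is an assumption made for this sketch.
SUPPORTED_ADDONS = {
    "azurekeyvaultsecretsprovider",
    "azurepolicy",
    "ingressapplicationgateway",
    "omsagent",
}


def unsupported_addons(addon_profiles):
    """Return the names of enabled add-ons that are not in the supported list."""
    return sorted(
        name
        for name, profile in (addon_profiles or {}).items()
        if profile.get("enabled") and name.lower() not in SUPPORTED_ADDONS
    )


# Hypothetical addonProfiles block from a cluster definition.
profiles = {
    "omsagent": {"enabled": True},
    "httpApplicationRouting": {"enabled": True},
    "azurepolicy": {"enabled": False},
}
print(unsupported_addons(profiles))  # ['httpApplicationRouting']
```

A non-empty result corresponds to the situation in which a recovery-point issue would be expected for the cluster.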

Windows Node Pool Administrator Password

Windows node pools require a local administrator password that Azure treats as write-only — it cannot be read back from the cluster after creation. To replicate the password into recovery, store it in Azure Key Vault and reference the secret URL on the cluster using the arpio-config:admin-password-secret tag. Arpio translates the secret URL to its recovery-environment Key Vault secret and provides the value to AKS when the recovery cluster is created.
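
As a rough illustration of the tag described above, the following sketch builds the tag as a dictionary after a basic sanity check of the secret URL. The function name and the vault and secret names are hypothetical. The check assumes the usual Key Vault secret identifier shape (https://<vault-host>/secrets/<name> with an optional version segment) and does not verify that the secret exists.

```python
from urllib.parse import urlparse

ADMIN_PASSWORD_TAG = "arpio-config:admin-password-secret"


def admin_password_tag(secret_url):
    """Return the cluster tag for a Key Vault secret URL, or raise ValueError."""
    parsed = urlparse(secret_url)
    parts = [p for p in parsed.path.split("/") if p]
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError("expected an https Key Vault URL")
    # Expected path: /secrets/<name> or /secrets/<name>/<version>
    if len(parts) not in (2, 3) or parts[0] != "secrets":
        raise ValueError("expected a path of the form /secrets/<name>[/<version>]")
    return {ADMIN_PASSWORD_TAG: secret_url}


# Hypothetical vault and secret names.
tags = admin_password_tag(
    "https://example-vault.vault.azure.net/secrets/win-admin-password"
)
print(tags)
```

The resulting dictionary could then be merged into the cluster's existing tags with whatever tooling is already used to manage them.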

Agent Pools

Arpio replicates AKS agent pools (Microsoft.ContainerService/agentPools) along with their parent cluster. Agent pools are also discoverable as standalone resources but are normally selected automatically when their cluster is selected.

Translated Attributes

The following attributes are translated during replication:

Attribute | Translation Method
VNet Subnet | Reference translated to recovery subnet (or to the cluster's promoted BYO subnet if the source cluster used an AKS-managed VNet)
Pod Subnet | Reference translated to recovery subnet
Application Security Groups | References translated to recovery application security groups
Proximity Placement Group | Reference translated to recovery proximity placement group
Capacity Reservation Group | Reference translated to recovery capacity reservation group
Dedicated Host Group | Reference translated to recovery host group
Node Public IP Prefix | Reference translated to recovery public IP prefix
Snapshot Source | Reference translated to recovery agent pool snapshot

Automatic Dependency Selection

The following resources are automatically selected into recovery points when an Agent Pool is selected:

  • The parent Managed Cluster
  • Subnets referenced as the node subnet (vnetSubnetID) and pod subnet (podSubnetID)
  • Application security groups attached to the pool's nodes
  • Proximity placement group, capacity reservation group, and dedicated host group referenced by the pool
  • Public IP prefix used for node public IPs
  • Agent pool snapshot referenced as the pool's creation source
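
The dependency list above amounts to collecting the resource IDs that an agent pool's properties point at. The sketch below is an illustration only. The function name is hypothetical, and apart from vnetSubnetID and podSubnetID, which this page names, the property names are assumptions based on the AKS agent pool ARM schema and should be checked against the API version in use.

```python
def agent_pool_references(properties):
    """Collect the Azure resource IDs referenced by an agent pool's properties."""
    single_valued = [
        "vnetSubnetID",                # node subnet
        "podSubnetID",                 # pod subnet
        "proximityPlacementGroupID",   # assumed property name
        "capacityReservationGroupID",  # assumed property name
        "hostGroupID",                 # assumed property name
        "nodePublicIPPrefixID",        # assumed property name
    ]
    refs = [properties[key] for key in single_valued if properties.get(key)]

    # Application security groups are assumed to be a list under networkProfile.
    network = properties.get("networkProfile") or {}
    refs.extend(network.get("applicationSecurityGroups") or [])

    # The snapshot used as creation source is assumed to live under creationData.
    source = (properties.get("creationData") or {}).get("sourceResourceId")
    if source:
        refs.append(source)
    return refs


# Hypothetical pool properties with a placeholder subscription ID.
pool = {
    "vnetSubnetID": "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/rg/providers/Microsoft.Network"
    "/virtualNetworks/vnet/subnets/nodes",
    "podSubnetID": None,
}
print(agent_pool_references(pool))
```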

Limitations

  • The pool's node count and node image version are managed by Azure and the cluster autoscaler; the recovered pool starts at its configured initial size and on the current AKS node image.

Kubernetes Resources

In addition to the AKS control-plane resources above, Arpio replicates the Kubernetes objects running inside the cluster, such as deployments, services, config maps, and secrets, so that they can be restored in the recovered cluster. Internal Kubernetes attributes (image references, Azure resource IDs embedded in manifests, secrets, and so on) are translated to point at their recovery-environment counterparts. References from in-cluster resources to Azure resources (for example, secrets pulled from Key Vault by the CSI Secrets Store driver) are detected, and the referenced Azure resources are automatically included in the recovery point.
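
To make the idea of translating embedded references concrete, here is a minimal sketch that walks a manifest and substitutes source identifiers with recovery ones. It is not Arpio's implementation. The function name, the mapping, and the vault names are hypothetical, and plain substring replacement stands in for whatever matching the product actually performs.

```python
def translate_references(node, id_map):
    """Return a copy of a manifest with source identifiers replaced.

    id_map maps a source string (for example a resource ID or vault name)
    to its recovery-environment counterpart.
    """
    if isinstance(node, dict):
        return {key: translate_references(value, id_map) for key, value in node.items()}
    if isinstance(node, list):
        return [translate_references(item, id_map) for item in node]
    if isinstance(node, str):
        for source, recovery in id_map.items():
            node = node.replace(source, recovery)
        return node
    return node  # numbers, booleans, and None are left untouched


# Hypothetical manifest fragment and mapping.
manifest = {
    "kind": "SecretProviderClass",
    "spec": {"parameters": {"keyvaultName": "prod-vault"}},
}
translated = translate_references(manifest, {"prod-vault": "recovery-vault"})
print(translated["spec"]["parameters"]["keyvaultName"])  # recovery-vault
```

The function builds a new structure rather than editing in place, so the source manifest stays available for comparison.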

Kubernetes Resource Selection

When a Managed Cluster is selected into a recovery point, Arpio replicates a curated subset of the in-cluster Kubernetes resources rather than the entire cluster.

  • By default: cluster-scoped resources and all resources in the kube-system namespace are included.
  • By namespace: additional namespaces can be selected in the Arpio console. All resources in a selected namespace are included in the recovery point.
  • By tag: tag rules can be built against virtual tags on Kubernetes resources to carve out broader or narrower selections. The supported tags are:
Tag | Selects
k8s:kind | Resources of a specific Kubernetes kind
k8s:namespace | Resources in a specific namespace
k8s:label | Resources carrying a specific Kubernetes label
k8s:name | Resources matching a specific name
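
One way to picture the virtual tags above is as values derived from each object's manifest. The sketch below is an assumption about how such matching could work, not a description of Arpio's rule engine. The function names and rule format are hypothetical, and k8s:label is left out because this page does not say how label keys and values are encoded in a rule.

```python
def virtual_tags(manifest):
    """Derive k8s:kind, k8s:name, and k8s:namespace from a Kubernetes manifest."""
    metadata = manifest.get("metadata", {})
    tags = {"k8s:kind": manifest.get("kind"), "k8s:name": metadata.get("name")}
    # Cluster-scoped objects have no namespace, so the tag is simply absent.
    if "namespace" in metadata:
        tags["k8s:namespace"] = metadata["namespace"]
    return tags


def matches(manifest, rule):
    """Return True when every tag in the rule equals the object's virtual tag."""
    tags = virtual_tags(manifest)
    return all(tags.get(tag) == value for tag, value in rule.items())


# Hypothetical objects.
deployment = {"kind": "Deployment", "metadata": {"name": "web", "namespace": "shop"}}
cluster_role = {"kind": "ClusterRole", "metadata": {"name": "reader"}}

print(matches(deployment, {"k8s:kind": "Deployment", "k8s:namespace": "shop"}))  # True
print(matches(cluster_role, {"k8s:namespace": "shop"}))  # False
```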

Dependencies between Kubernetes resources are taken into account when building the recovery point — selecting a resource automatically pulls in the related cluster objects required to run it (for example, a Deployment's referenced ConfigMaps, Secrets, and ServiceAccount).

Managed Resources Promoted to Bring-Your-Own (BYO)

When AKS creates infrastructure on the cluster's behalf — its virtual network, or its AGIC Application Gateway — those resources live in the cluster's "infra" resource group (typically MC_*) and are owned by AKS. Arpio recreates these as customer-owned bring-your-own (BYO) resources in the recovery environment so they can be wired into the recovery network and (where applicable) routed through a network sandbox firewall. The cluster is then created against the BYO resources.

Promoted Virtual Network

If the source cluster was created with an AKS-managed VNet (named aks-vnet-* in the MC_* resource group), Arpio replicates it as a BYO Virtual Network in the recovery environment, located in the cluster's own resource group. The cluster's agent pool vnetSubnetID references and any apiServerAccessProfile.subnetId for VNet-integrated clusters are rewritten to point at the new BYO subnet (named aks-subnet).

To allow the cluster's control-plane managed identity to join nodes to the BYO subnet and reconcile VMSS NICs when LoadBalancer services are created, Arpio also creates a Network Contributor role assignment for the cluster's control-plane identity on the BYO VNet's resource group during recovery.

Promoted Application Gateway (AGIC)

If the AGIC add-on is enabled and the Application Gateway was created and managed by AKS, Arpio replicates the gateway as a BYO Application Gateway in the recovery environment (in the cluster's own resource group), and the AGIC add-on configuration is updated to reference it explicitly via applicationGatewayId.

To allow the AGIC managed identity to manage the BYO gateway, join its subnet, and read its public IP, Arpio creates a Network Contributor role assignment for the AGIC identity on the BYO gateway's resource group during recovery.

Network Sandbox

When a recovery test is run with the Advanced Network Sandbox enabled, Arpio places an Azure Firewall between the recovery cluster and the internet so the test environment can be exercised without giving it production network reachability. Several AKS-specific adjustments are made to the cluster so it remains functional behind the firewall.

Promotion of AKS-Managed Infrastructure to BYO

The sandbox firewall needs to be wired into the cluster's network — which requires user-defined routes on the cluster's subnets and an Application Gateway whose subnet Arpio controls. AKS does not allow either on infrastructure that AKS itself manages. As described in Managed Resources Promoted to BYO, Arpio recreates AKS-managed VNets and AGIC Application Gateways as customer-owned resources in the recovery environment so it can attach UDRs to the cluster's subnets and route AGIC ingress through the firewall. This promotion happens for all AKS recoveries (not only sandboxed ones) to keep failover-test and real-failover topologies consistent.

Outbound Traffic: User-Defined Routing

AKS supports several outbound types (loadBalancer, managedNATGateway, userAssignedNATGateway, userDefinedRouting, none, etc.). To force all egress through the sandbox firewall, the cluster's networkProfile.outboundType must be userDefinedRouting.

If the source cluster's outboundType is not already userDefinedRouting or none, Arpio rewrites it to userDefinedRouting in the failover-test copy of the cluster. This change is applied only to the test copy — real failovers preserve the source outboundType.
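
The rule in the two paragraphs above can be written as a small decision function. This is a sketch of the stated behavior only. The function and parameter names are hypothetical, and the flag is assumed to mean "this is the sandboxed failover-test copy of the cluster".

```python
# Outbound types that are left unchanged in the test copy.
PASSTHROUGH_OUTBOUND_TYPES = {"userDefinedRouting", "none"}


def recovery_outbound_type(source_outbound_type, sandboxed_test):
    """Return the outboundType the recovered cluster would use."""
    if sandboxed_test and source_outbound_type not in PASSTHROUGH_OUTBOUND_TYPES:
        return "userDefinedRouting"
    # Real failovers, and test copies already on UDR or none, keep the source value.
    return source_outbound_type


print(recovery_outbound_type("loadBalancer", sandboxed_test=True))   # userDefinedRouting
print(recovery_outbound_type("loadBalancer", sandboxed_test=False))  # loadBalancer
print(recovery_outbound_type("none", sandboxed_test=True))           # none
```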

For background on the AKS UDR egress model and the Microsoft endpoints that must be reachable from a firewalled cluster, see Microsoft's Customize cluster egress with a user-defined routing table in AKS.

Inbound Traffic: DNAT to Load Balancer Frontends

Putting a firewall in front of the cluster also blocks ingress to Service type=LoadBalancer resources that AKS provisions for in-cluster services. To keep these services reachable during a sandboxed test, Arpio allocates an additional public IP per load balancer frontend on the sandbox firewall and adds DNAT rules to the firewall that translate traffic arriving at the firewall IP to the underlying load balancer frontend IP.

To reach a load-balanced service during a sandboxed test, send traffic to the firewall's public IP for that frontend (visible in the Arpio console alongside the recovery resources) rather than directly to the load balancer's own public IP.
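
The one-firewall-IP-per-frontend mapping described above can be modeled as a list of DNAT rules. The sketch below is illustrative. The class and function names are hypothetical, the IP addresses come from documentation-reserved ranges, and the per-port rules are an assumption (this page does not say which ports are translated).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DnatRule:
    firewall_public_ip: str  # address that test clients should target
    frontend_ip: str         # load balancer frontend behind the firewall
    port: int                # assumed: same port on both sides


def build_dnat_rules(frontends, firewall_ips):
    """Pair each load balancer frontend with its own firewall public IP.

    frontends maps a frontend IP to the ports it serves; firewall_ips is the
    pool of additional public IPs allocated on the firewall.
    """
    if len(firewall_ips) < len(frontends):
        raise ValueError("need one firewall public IP per load balancer frontend")
    rules = []
    for (frontend_ip, ports), firewall_ip in zip(frontends.items(), firewall_ips):
        for port in ports:
            rules.append(DnatRule(firewall_ip, frontend_ip, port))
    return rules


rules = build_dnat_rules({"203.0.113.10": [80, 443]}, ["198.51.100.7"])
for rule in rules:
    print(f"{rule.firewall_public_ip}:{rule.port} -> {rule.frontend_ip}:{rule.port}")
```

In this model a test client would connect to 198.51.100.7 rather than to 203.0.113.10, which mirrors the guidance above to use the firewall's public IP for the frontend.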

AGIC Application Gateways follow Microsoft's parallel firewall + AGW pattern instead of DNAT — see the generic Network Sandbox page for details.

Required Outbound Allowlist

Because the sandbox firewall denies all egress by default, an AKS cluster behind the firewall cannot function until the destinations it needs to reach — image registries, package mirrors, the AKS control plane, etc. — are explicitly allowed. Add the following domains to the sandbox allowlist when enabling Network Sandbox for a recovery test that includes an AKS cluster:

Domain | Purpose
acs-mirror.azureedge.net | AKS base image and binary mirror
packages.aks.azure.com | AKS package repository
packages.microsoft.com | Microsoft package repository
.blob.storage.azure.net | Azure Blob storage (Microsoft endpoints)
.blob.core.windows.net | Azure Blob storage
mcr.microsoft.com | Microsoft Container Registry
.data.mcr.microsoft.com | Microsoft Container Registry data endpoint
dc.services.visualstudio.com | Application Insights telemetry
.hcp.<region>.azmk8s.io | Managed Kubernetes API server endpoint (region-specific)
login.microsoftonline.com | Microsoft Entra ID (required for AGIC to authenticate to ARM)
management.azure.com | Azure Resource Manager (required for AGIC to manage the Application Gateway)

Important: Adjust the azmk8s.io domain for your recovery region. The .hcp.<region>.azmk8s.io entry must use the recovery environment's region: for example, .hcp.westus2.azmk8s.io for a recovery environment in West US 2, or .hcp.eastus.azmk8s.io for East US. Without this entry, nodes in the recovery cluster cannot reach the managed control plane and the cluster will not come up.

This list is the minimum required for a basic AKS cluster (with AGIC) to start and join the control plane. Workloads running inside the cluster will likely need additional allowlist entries — for example, to reach Azure SQL, Service Bus, Event Hubs, third-party APIs, or other external dependencies. AKS add-ons and workload features may also require additional egress; Microsoft's Outbound network and FQDN rules for AKS clusters documents the complete set of endpoints per add-on and feature.
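
Because one entry depends on the recovery region, it can help to generate the allowlist rather than type it by hand. The following sketch assembles the minimum list from this page and appends workload-specific domains. The function name is hypothetical, and the region normalization (lower case, spaces removed, so "West US 2" becomes westus2) is an assumption about how region names are entered.

```python
# Region-independent entries from the table on this page.
BASE_ALLOWLIST = [
    "acs-mirror.azureedge.net",
    "packages.aks.azure.com",
    "packages.microsoft.com",
    ".blob.storage.azure.net",
    ".blob.core.windows.net",
    "mcr.microsoft.com",
    ".data.mcr.microsoft.com",
    "dc.services.visualstudio.com",
    "login.microsoftonline.com",
    "management.azure.com",
]


def aks_sandbox_allowlist(recovery_region, extra_domains=()):
    """Return the minimum AKS allowlist plus any workload-specific domains."""
    region = recovery_region.strip().lower().replace(" ", "")
    if not region:
        raise ValueError("recovery_region must not be empty")
    domains = BASE_ALLOWLIST + [f".hcp.{region}.azmk8s.io"] + list(extra_domains)
    # Drop duplicates while keeping the original order.
    return list(dict.fromkeys(domains))


for domain in aks_sandbox_allowlist("West US 2", ["api.example.com"]):
    print(domain)
```

The extra_domains argument is where entries for workload dependencies would go; api.example.com is a placeholder.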

See Enabling Selective Outbound Access for how to enter the allowlist when starting a sandboxed recovery test.