This is the first in a series of posts looking at Google Cloud Anthos and how it seeks to facilitate digital transformation and become the management plane for enterprise workloads across hybrid and multi-cloud environments. We start with GKE on AWS, which is now generally available.
The value proposition of Anthos is to enable environmental agnosticism, with containers and Kubernetes as the common denominator for our workloads. This provides a level of portability, allowing Anthos to manage workload deployments and lifecycles across multiple clouds (GCP, AWS and Azure) as well as on-prem data centres (VMware and bare metal).
Venafi experts at Jetstack are seeing an increasing number of clients seeking to either adopt or mature their Kubernetes offering, whilst leveraging the advantages of cloud-native principles in accordance with their business requirements and existing infrastructure investments. The Anthos initiative typifies this requirement by laying the foundations to migrate and align workloads across organisational boundaries.
To open, we’ll be covering a facet of Anthos which brings the GKE experience to your AWS environment.
Anthos
Anthos is a framework of software components oriented around managing the deployment and lifecycle of infrastructure and workloads across multiple environments and locations. With enterprises having established infrastructure presences across multiple cloud providers and on-premises locations, Anthos centralises the orchestration of segregated clusters and workloads, providing a single pane of glass across hybrid and multi-cloud topologies. This consolidates operations and provides consistency across cloud providers, whilst embracing existing infrastructure investments and unlocking new possibilities for hybrid and multi-cloud compositions. It also allows companies to modernise in place, continuing to run workloads on-prem or on their own infrastructure whilst adopting Kubernetes and cloud-native principles.
As well as a Kubernetes distribution, Anthos also provides ways to simplify hybrid and multi-cloud consistency and compliance through Config Management & Policy Controller. With Config Management, the GitOps methodology is adopted to reconcile observed state with the desired state for Kubernetes objects in source control through a cluster Operator. Policy Controller facilitates conformance at scale by building on Gatekeeper to provide a constraint template library to ensure consistency and compliance of configuration, as well as offering extensibility through writing policies using OPA and Rego.
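To give a flavour of Policy Controller, the sketch below uses the K8sAllowedRepos template from the constraint template library to restrict Pods to images hosted on gcr.io; the constraint name and match scope here are illustrative rather than taken from a real installation.

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRepos
metadata:
  name: pods-allowed-repos   # illustrative name
spec:
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Pod"]
  parameters:
    repos:
    - "gcr.io/"              # only allow images from repositories under gcr.io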
Anthos Service Mesh is core to the proposition of running hybrid Kubernetes across cloud and on-premises infrastructure. Built using Istio, it enhances our experience by abstracting and automating cross-cutting concerns, such as issuing workload identities via X.509 certificates to facilitate automatic mutual TLS across our workloads and clusters, and provides mechanisms for layer 7 traffic routing within the mesh.
Anthos Service Mesh also centralises certificate issuance and renewal, giving segregated clusters cross-boundary trust so that service-to-service communications can be mutually authenticated.
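Because Anthos Service Mesh is built on Istio, the mesh-wide mutual TLS posture can be expressed with a standard Istio PeerAuthentication resource. A minimal sketch, assuming istio-system is the mesh root namespace, would look like:

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system   # applied to the root namespace, so it takes effect mesh-wide
spec:
  mtls:
    mode: STRICT            # only accept mutually-authenticated (mTLS) traffic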
GKE on AWS
GKE on AWS follows GKE On-Prem as the next step in bringing the GKE experience to your own infrastructure. This means we can integrate with existing AWS environments and leverage the Anthos stack to provide consistency across our clusters and workloads, whilst centralising operations in the GCP Console.
Bringing the GKE experience to AWS empowers developers, administrators and architects to incorporate GKE as we know it on Google Cloud into their existing infrastructure, whilst remaining selective about workload placement to suit business decisions and maximise business value.
This unlocks many opportunities to harness the advantages of multi-cloud. We can focus on the business logic of our applications and deploy anywhere thanks to homogeneous runtime environments, with highly portable workloads enabling placement strategies that meet high availability and scaling requirements.
We can also take advantage of the proprietary managed services offered by each cloud provider, allowing for flexibility when adopting a multi-cloud strategy and interoperability between our workloads and their infrastructure requirements.
Architecture
The solution for GKE on AWS provides the requisite tooling for deploying GKE into your AWS environment, creating new or working alongside existing resources. The design philosophy reuses the concept seen in the GKE On-Prem architecture, with a hierarchical model comprised of management and user clusters, the former using an AWS Controller to bootstrap the creation and manage the lifecycle of the latter.
Through the management cluster, we can create GKE clusters with the AWSCluster and AWSNodePool custom resource definitions. It is then the responsibility of the Cluster Operator controller to provision the necessary resources for the user clusters through the AWS APIs. This is implemented through the gke-aws-cluster-operator static pod, an application that contains the core cluster management logic.
One management cluster can administer multiple downstream user clusters, with control plane configuration stored in etcd and with storage persisted on an AWS EBS volume.
Deployment
Management cluster
To begin, we’ll deploy our management cluster using the anthos-gke CLI. This autogenerates Terraform for us to deploy, comprising the necessary infrastructure to host our management plane in AWS. It includes a dedicated VPC and subnets (or it can be integrated with an existing VPC), as well as security groups to facilitate inbound and outbound SSH and HTTPS traffic for the Kubernetes master.
The management cluster will allow us to administer GKE on AWS ‘user’ clusters for running our workloads, provisioned through Kubernetes objects. This hierarchical model of management and user clusters is a core principle of Anthos: whether it is GKE On-Prem on VMware or GKE on AWS, the orchestration of downstream clusters is all done through the Kubernetes API and CRDs.
We can begin our management cluster provisioning using anthos-gke. The prerequisites for the installation are:
- KMS key for encrypting cluster secrets
- KMS key for securing the management service’s etcd database
- Inbound SSH CIDR range for the bastion host created in the AWS public subnet for accessing the GKE nodes
- GCP project which has access to Anthos
- Service accounts and keys to:
  - manage GKE on AWS membership to GKE Hub
  - set up Connect between GKE on AWS and GKE Hub
  - access the gcr.io repository from GKE on AWS nodes
With these prerequisites, we can populate the required config file to configure the management service.
apiVersion: multicloud.cluster.gke.io/v1
kind: AWSManagementService
metadata:
name: management
spec:
version: aws-1.4.1-gke.15
region: eu-west-1
authentication:
awsIAM:
adminIdentityARNs:
- arn:aws:iam::0123456789012:user/gke-aws-admin
kmsKeyARN: arn:aws:kms:eu-west-1:0123456789012:key/4172f749-702e-41fe-ac3f-898d21930cb6
databaseEncryption:
kmsKeyARN: arn:aws:kms:eu-west-1:0123456789012:key/12560928-59cf-4157-a29f-eecb4ec93fd2
googleCloud:
projectID: jetstack-anthos
serviceAccountKeys:
managementService: management-key.json
connectAgent: hub-key.json
node: node-key.json
dedicatedVPC:
vpcCIDRBlock: 10.0.0.0/16
availabilityZones:
- eu-west-1a
- eu-west-1b
- eu-west-1c
privateSubnetCIDRBlocks:
- 10.0.1.0/24
- 10.0.2.0/24
- 10.0.3.0/24
publicSubnetCIDRBlocks:
- 10.0.4.0/24
- 10.0.5.0/24
- 10.0.6.0/24
bastionAllowedSSHCIDRBlocks:
- 198.51.100.0/24
Running anthos-gke aws management init will encrypt our service account keys and generate a root CA, writing these values to a configuration file.
$ anthos-gke aws management init
generating cluster ID
encrypting Google Cloud service account key (Management Service)
encrypting Google Cloud service account key (Connect Agent)
encrypting Google Cloud service account key (Node)
generating root certificate authority (CA)
writing file: anthos-gke.status.yaml
To create the cluster, we need to apply the generated configuration.
$ anthos-gke aws management apply
creating S3 bucket: gke-jetstack-anthos-eu-west-1-bootstrap
writing file: README.md
writing file: backend.tf
writing file: main.tf
writing file: outputs.tf
writing file: variables.tf
writing file: vpc.tf
writing file: terraform.tfvars.json
Initializing modules...
Downloading gcs::https://www.googleapis.com/storage/v1/gke-multi-cloud-release/aws/aws-1.4.1-gke.15/modules/terraform-aws-gke.tar.gz for gke_bastion_security_group_rules...
- gke_bastion_security_group_rules in .terraform/modules/gke_bastion_security_group_rules/modules/gke-bastion-security-group-rules
Downloading gcs::https://www.googleapis.com/storage/v1/gke-multi-cloud-release/aws/aws-1.4.1-gke.15/modules/terraform-aws-gke.tar.gz for gke_controlplane_iam_policies...
- gke_controlplane_iam_policies in .terraform/modules/gke_controlplane_iam_policies/modules/gke-controlplane-iam-policies
Downloading gcs::https://www.googleapis.com/storage/v1/gke-multi-cloud-release/aws/aws-1.4.1-gke.15/modules/terraform-aws-gke.tar.gz for gke_controlplane_iam_role...
- gke_controlplane_iam_role in .terraform/modules/gke_controlplane_iam_role/modules/gke-controlplane-iam-role
Downloading gcs::https://www.googleapis.com/storage/v1/gke-multi-cloud-release/aws/aws-1.4.1-gke.15/modules/terraform-aws-gke.tar.gz for gke_management...
- gke_management in .terraform/modules/gke_management/modules/gke-management
...
Apply complete! Resources: 61 added, 0 changed, 0 destroyed.
This creates an S3 bucket for the gke-aws-node-agent binary, and initialises the Terraform modules which will provision the necessary infrastructure to host the GKE on AWS management cluster.
Once Terraform has successfully completed provisioning the infrastructure, we can see that a bastion host, as well as the necessary EC2 instances, ELBs and security groups have been created.
(Screenshots: the management cluster instances, load balancers, security groups, VPC and subnets in the AWS console.)
These are Ubuntu 18.04 instances, launched into an auto scaling group with a launch template and user data running the gke-aws-node-agent, bootstrapping each instance to function as the management plane.
We can use the bastion host to gain access to the Kubernetes API by opening an SSH tunnel, allowing anthos-gke to complete the setup.
terraform output bastion_tunnel > bastion-tunnel.sh
chmod 755 bastion-tunnel.sh
./bastion-tunnel.sh -N &
anthos-gke aws management get-credentials
After this we can connect to the management cluster using kubectl.
$ env HTTP_PROXY=http://127.0.0.1:8118 \
kubectx gke_aws_management_gke-404767c1 && kubectl cluster-info
Kubernetes master is running at https://gke-404767c1-management-06afb2d2341f17cb.elb.eu-west-1.amazonaws.com
AWSCluster
Now that we have our management cluster, we can provision user clusters to run our workloads. In AWS, user clusters manifest through the AWSCluster and AWSNodePool custom resources. This leverages a declarative, Kubernetes-style API approach for cluster creation, configuration and management.
If we take a look at the CRDs currently on the management cluster, we can see the two resources available to provision AWS clusters.
$ env HTTP_PROXY=http://127.0.0.1:8118 \
kubectl get crd
NAME CREATED AT
awsclusters.multicloud.cluster.gke.io 2020-07-27T09:57:57Z
awsnodepools.multicloud.cluster.gke.io 2020-07-27T09:57:57Z
Our previous Terraform deployment can be used to generate a configuration for a basic user cluster.
terraform output cluster_example > cluster-0.yaml
apiVersion: multicloud.cluster.gke.io/v1
kind: AWSCluster
metadata:
name: cluster-0
spec:
region: eu-west-1
authentication:
awsIAM:
adminIdentityARNs:
- arn:aws:iam::0123456789012:user/gke-aws-admin
networking:
vpcID: vpc-027d53db543f33b56
serviceAddressCIDRBlocks:
- 10.1.0.0/16
podAddressCIDRBlocks:
- 10.2.0.0/16
serviceLoadBalancerSubnetIDs:
- subnet-08fdd9360320f5f27
- subnet-03babff7f9b5d4a7b
- subnet-072949cee3d082184
- subnet-06efa94e317c13730
- subnet-03808c78800c82a9d
- subnet-07b471ccadb908e9b
controlPlane:
version: 1.16.9-gke.12
keyName: gke-404767c1-keypair
instanceType: t3.medium
iamInstanceProfile: gke-404767c1-controlplane
securityGroupIDs:
- sg-0372ceaae4fc17084
subnetIDs:
- subnet-08fdd9360320f5f27
- subnet-03babff7f9b5d4a7b
- subnet-072949cee3d082184
rootVolume:
sizeGiB: 10
etcd:
mainVolume:
sizeGiB: 10
databaseEncryption:
kmsKeyARN: arn:aws:kms:eu-west-1:0123456789012:key/12560928-59cf-4157-a29f-eecb4ec93fd2
hub:
membershipName: projects/jetstack-anthos/locations/global/memberships/cluster-0
---
apiVersion: multicloud.cluster.gke.io/v1
kind: AWSNodePool
metadata:
name: cluster-0-pool-0
spec:
clusterName: cluster-0
version: 1.16.9-gke.12
region: eu-west-1
subnetID: subnet-08fdd9360320f5f27
minNodeCount: 3
maxNodeCount: 5
instanceType: t3.medium
keyName: gke-404767c1-keypair
iamInstanceProfile: gke-404767c1-nodepool
maxPodsPerNode: 100
securityGroupIDs:
- sg-0372ceaae4fc17084
rootVolume:
sizeGiB: 10
---
apiVersion: multicloud.cluster.gke.io/v1
kind: AWSNodePool
metadata:
name: cluster-0-pool-1
spec:
clusterName: cluster-0
version: 1.16.9-gke.12
region: eu-west-1
subnetID: subnet-03babff7f9b5d4a7b
minNodeCount: 3
maxNodeCount: 5
instanceType: t3.medium
keyName: gke-404767c1-keypair
iamInstanceProfile: gke-404767c1-nodepool
maxPodsPerNode: 100
securityGroupIDs:
- sg-0372ceaae4fc17084
rootVolume:
sizeGiB: 10
---
apiVersion: multicloud.cluster.gke.io/v1
kind: AWSNodePool
metadata:
name: cluster-0-pool-2
spec:
clusterName: cluster-0
version: 1.16.9-gke.12
region: eu-west-1
subnetID: subnet-072949cee3d082184
minNodeCount: 3
maxNodeCount: 5
instanceType: t3.medium
keyName: gke-404767c1-keypair
iamInstanceProfile: gke-404767c1-nodepool
maxPodsPerNode: 100
securityGroupIDs:
- sg-0372ceaae4fc17084
rootVolume:
sizeGiB: 10
We can submit the AWSCluster and AWSNodePool resources to initiate the cluster creation.
$ env HTTP_PROXY=http://127.0.0.1:8118 \
kubectl apply -f cluster-0.yaml
awscluster.multicloud.cluster.gke.io/cluster-0 created
awsnodepool.multicloud.cluster.gke.io/cluster-0-pool-0 created
awsnodepool.multicloud.cluster.gke.io/cluster-0-pool-1 created
awsnodepool.multicloud.cluster.gke.io/cluster-0-pool-2 created
As our custom resources are part of the Kubernetes API, we can interact with the objects to see their state.
$ env HTTP_PROXY=http://127.0.0.1:8118 \
kubectl get AWSClusters,AWSNodePools
NAME STATE AGE VERSION ENDPOINT
awscluster.multicloud.cluster.gke.io/cluster-0 Provisioning 22s 1.16.9-gke.12 gke-dfccaa67-controlplane-f26573d5bef4bba0.elb.eu-west-1.amazonaws.com
NAME CLUSTER STATE AGE VERSION
awsnodepool.multicloud.cluster.gke.io/cluster-0-pool-0 cluster-0 Provisioning 22s 1.16.9-gke.12
awsnodepool.multicloud.cluster.gke.io/cluster-0-pool-1 cluster-0 Provisioning 22s 1.16.9-gke.12
awsnodepool.multicloud.cluster.gke.io/cluster-0-pool-2 cluster-0 Provisioning 22s 1.16.9-gke.12
At this point, the AWS Controller is provisioning the resources in AWS which will comprise our user cluster. The default GKE on AWS installation creates an AWSCluster with three control plane replicas spread across the configured availability zones. The management cluster places the control plane instances in a private subnet behind an AWS Network Load Balancer (NLB), and interacts with the control plane through that NLB.
(Screenshots: the user cluster instances, load balancers and security groups in the AWS console.)
Once the cluster bootstrap process is complete, we can see the events.
$ env HTTP_PROXY=http://127.0.0.1:8118 \
kubectl get events
LAST SEEN TYPE REASON OBJECT MESSAGE
4m56s Normal StartedNodePoolProvisioning awsnodepool/cluster-0-pool-0 Started node pool provisioning
4m45s Normal CreatedLaunchTemplate awsnodepool/cluster-0-pool-0 Created launch template named "gke-dfccaa67-nodepool-3125db34-1.16.9-gke.12"
4m41s Normal CreatedAutoScalingGroup awsnodepool/cluster-0-pool-0 Created auto scaling group named "gke-dfccaa67-nodepool-3125db34"
4m56s Normal StartedNodePoolProvisioning awsnodepool/cluster-0-pool-1 Started node pool provisioning
4m44s Normal CreatedLaunchTemplate awsnodepool/cluster-0-pool-1 Created launch template named "gke-dfccaa67-nodepool-fdca9ec5-1.16.9-gke.12"
4m40s Normal CreatedAutoScalingGroup awsnodepool/cluster-0-pool-1 Created auto scaling group named "gke-dfccaa67-nodepool-fdca9ec5"
4m56s Normal StartedNodePoolProvisioning awsnodepool/cluster-0-pool-2 Started node pool provisioning
4m43s Normal CreatedLaunchTemplate awsnodepool/cluster-0-pool-2 Created launch template named "gke-dfccaa67-nodepool-76e54252-1.16.9-gke.12"
4m38s Normal CreatedAutoScalingGroup awsnodepool/cluster-0-pool-2 Created auto scaling group named "gke-dfccaa67-nodepool-76e54252"
4m56s Normal CreatingCluster awscluster/cluster-0 Cluster version 1.16.9-gke.12 is being created
4m52s Normal TagSubnets awscluster/cluster-0 Tagged subnets ["subnet-08fdd9360320f5f27" "subnet-03babff7f9b5d4a7b" "subnet-072949cee3d082184" "subnet-06efa94e317c13730" "subnet-03808c78800c82a9d" "subnet-07b471ccadb908e9b"] with tags map["kubernetes.io/cluster/gke-dfccaa67":"shared"]
4m51s Normal CreatedSecurityGroup awscluster/cluster-0 Created security group named "gke-dfccaa67-controlplane"
4m51s Normal CreatedSecurityGroup awscluster/cluster-0 Created security group named "gke-dfccaa67-nodepool"
4m51s Normal CreatedEtcdVolume awscluster/cluster-0 Created etcd volume on replica 0
4m50s Normal CreatedEtcdVolume awscluster/cluster-0 Created etcd volume on replica 1
4m50s Normal CreatedEtcdVolume awscluster/cluster-0 Created etcd volume on replica 2
4m50s Normal CreatedNetworkLoadBalancer awscluster/cluster-0 Created network load balancer named "gke-dfccaa67-controlplane"
4m49s Normal CreatedTargetGroup awscluster/cluster-0 Created target group named "gke-dfccaa67-controlplane"
4m48s Normal CreatedNetworkInterface awscluster/cluster-0 Created network interface on replica 0
4m48s Normal CreatedNetworkInterface awscluster/cluster-0 Created network interface on replica 1
4m47s Normal CreatedNetworkInterface awscluster/cluster-0 Created network interface on replica 2
4m47s Normal CreatedListener awscluster/cluster-0 Created listener on load balancer with ARN "arn:aws:elasticloadbalancing:eu-west-1:0123456789012:loadbalancer/net/gke-dfccaa67-controlplane/f26573d5bef4bba0"
4m46s Normal CreatedLaunchTemplate awscluster/cluster-0 Created launch template named "gke-dfccaa67-controlplane-0-1.16.9-gke.12"
4m46s Normal CreatedLaunchTemplate awscluster/cluster-0 Created launch template named "gke-dfccaa67-controlplane-1-1.16.9-gke.12"
4m46s Normal CreatedLaunchTemplate awscluster/cluster-0 Created launch template named "gke-dfccaa67-controlplane-2-1.16.9-gke.12"
4m43s Normal CreatedAutoScalingGroup awscluster/cluster-0 Created auto scaling group named "gke-dfccaa67-controlplane-0"
4m42s Normal CreatedAutoScalingGroup awscluster/cluster-0 Created auto scaling group named "gke-dfccaa67-controlplane-1"
4m41s Normal CreatedAutoScalingGroup awscluster/cluster-0 Created auto scaling group named "gke-dfccaa67-controlplane-2"
0s Normal RegisteredGKEHubMembership awscluster/cluster-0 Registered to GKE Hub using membership "projects/jetstack-anthos/locations/global/memberships/cluster-0"
0s Normal AddonsApplied awscluster/cluster-0 Addons applied for version 1.16.9-gke.12
0s Normal InstalledGKEHubAgent awscluster/cluster-0 Installed GKE Hub agent
0s Normal ClusterProvisioned awscluster/cluster-0 Cluster version 1.16.9-gke.12 has been provisioned
0s Normal ProvisionedNodePool awsnodepool/cluster-0-pool-1 Node pool provisioned
0s Normal ProvisionedNodePool awsnodepool/cluster-0-pool-2 Node pool provisioned
0s Normal ProvisionedNodePool awsnodepool/cluster-0-pool-0 Node pool provisioned
We can also see the logs from the gke-aws-cluster-operator on the management instance.
export POD_ID=$(sudo crictl pods --name gke-aws-cluster-operator --latest --quiet)
export CONTAINER_ID=$(sudo crictl ps --pod $POD_ID --latest --quiet)
sudo crictl logs $CONTAINER_ID
{"level":"info","ts":1594307993.1424375,"logger":"setup","msg":"starting cluster controller","version":"aws-0.2.1-gke.7"}
{"level":"info","ts":1594307993.2432592,"logger":"controller-runtime.controller","msg":"Starting Controller","controller":"awsnodepool-reconciler"}
{"level":"info","ts":1594307993.243259,"logger":"controller-runtime.controller","msg":"Starting Controller","controller":"awscluster-reconciler"}
{"level":"info","ts":1594307993.343698,"logger":"controller-runtime.controller","msg":"Starting workers","controller":"awsnodepool-reconciler","worker count":1}
{"level":"info","ts":1594307993.3439271,"logger":"controller-runtime.controller","msg":"Starting workers","controller":"awscluster-reconciler","worker count":1}
{"level":"info","ts":1594308345.2519808,"msg":"Validating AWSCluster create"}
{"level":"info","ts":1594308345.257065,"logger":"controlplane-reconciler","msg":"starting reconciliation","awscluster":"default/cluster-0"}
{"level":"info","ts":1594308345.2595742,"logger":"controlplane-reconciler","msg":"adding finalizer","awscluster":"default/cluster-0"}
{"level":"info","ts":1594308345.2624917,"msg":"Validating AWSCluster update"}
{"level":"info","ts":1594308345.264924,"logger":"controlplane-reconciler","msg":"starting reconciliation","awscluster":"default/cluster-0"}
{"level":"info","ts":1594308345.376223,"msg":"Validating AWSNodePool create"}
{"level":"info","ts":1594308345.3793032,"logger":"nodepool-reconciler","msg":"starting reconciliation","awsnodepool":"default/pool-0"}
{"level":"info","ts":1594308345.382418,"logger":"nodepool-reconciler","msg":"adding finalizer","awsnodepool":"default/pool-0"}
{"level":"info","ts":1594308345.384661,"msg":"Validating AWSNodePool update"}
{"level":"info","ts":1594308345.387612,"logger":"nodepool-reconciler","msg":"starting reconciliation","awsnodepool":"default/pool-0"}
{"level":"info","ts":1594308346.1197634,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.SetProvisioningStateCommand"}
{"level":"info","ts":1594308346.1245346,"logger":"controlplane-reconciler","msg":"reconciliation finished but more work is needed","awscluster":"default/cluster-0"}
{"level":"info","ts":1594308346.125778,"logger":"controlplane-reconciler","msg":"starting reconciliation","awscluster":"default/cluster-0"}
{"level":"info","ts":1594308346.1586978,"msg":"Planning provisioning"}
{"level":"info","ts":1594308346.1587348,"logger":"nodepool-reconciler","msg":"executing command","command":"*nodepool.SetProvisioningStateCommand"}
{"level":"info","ts":1594308346.1637373,"logger":"nodepool-reconciler","msg":"reconciliation finished but more work is needed","awsnodepool":"default/pool-0"}
{"level":"info","ts":1594308346.1641495,"logger":"nodepool-reconciler","msg":"starting reconciliation","awsnodepool":"default/pool-0"}
{"level":"info","ts":1594308346.6994834,"msg":"Planning provisioning"}
{"level":"info","ts":1594308346.6995149,"logger":"nodepool-reconciler","msg":"reconciliation finished but more work is needed","awsnodepool":"default/pool-0"}
{"level":"info","ts":1594308350.6837356,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateSecurityGroupCommand"}
{"level":"info","ts":1594308351.0741904,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateSecurityGroupCommand"}
{"level":"info","ts":1594308351.6007662,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateEtcdVolumeCommand"}
{"level":"info","ts":1594308351.7904568,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateEtcdVolumeCommand"}
{"level":"info","ts":1594308351.9867935,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateEtcdVolumeCommand"}
{"level":"info","ts":1594308352.1732028,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateNetworkLoadBalancerCommand"}
{"level":"info","ts":1594308352.6873991,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateTargetGroupCommand"}
{"level":"info","ts":1594308352.8467724,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateRootCASecretCommand"}
{"level":"info","ts":1594308352.864171,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateKeyPairSecretCommand"}
{"level":"info","ts":1594308352.8721669,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateSSHKeySecretCommand"}
{"level":"info","ts":1594308352.8793178,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateHubMembershipCommand"}
{"level":"info","ts":1594308353.4512975,"logger":"controlplane-reconciler","msg":"reconciliation finished but more work is needed","awscluster":"default/cluster-0"}
{"level":"info","ts":1594308353.4515018,"logger":"controlplane-reconciler","msg":"starting reconciliation","awscluster":"default/cluster-0"}
{"level":"info","ts":1594308354.3009841,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.AddSecurityGroupIngressCommand"}
{"level":"info","ts":1594308354.4307039,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.AddSecurityGroupEgressCommand"}
{"level":"info","ts":1594308354.568091,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.AddSecurityGroupIngressCommand"}
{"level":"info","ts":1594308354.7273095,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.AddSecurityGroupEgressCommand"}
{"level":"info","ts":1594308354.881284,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateNetworkInterfaceCommand"}
{"level":"info","ts":1594308355.101913,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateNetworkInterfaceCommand"}
{"level":"info","ts":1594308355.3225775,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateNetworkInterfaceCommand"}
{"level":"info","ts":1594308355.5158548,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateListenerCommand"}
{"level":"info","ts":1594308355.5431557,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateAdminCertSecretCommand"}
{"level":"info","ts":1594308355.5547044,"logger":"controlplane-reconciler","msg":"reconciliation finished but more work is needed","awscluster":"default/cluster-0"}
{"level":"info","ts":1594308355.554765,"logger":"controlplane-reconciler","msg":"starting reconciliation","awscluster":"default/cluster-0"}
{"level":"info","ts":1594308356.164133,"logger":"nodepool-reconciler","msg":"starting reconciliation","awsnodepool":"default/pool-0"}
{"level":"info","ts":1594308356.523212,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateLaunchTemplateCommand"}
{"level":"info","ts":1594308356.633643,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateLaunchTemplateCommand"}
{"level":"info","ts":1594308356.747063,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateLaunchTemplateCommand"}
{"level":"info","ts":1594308356.8503606,"logger":"controlplane-reconciler","msg":"reconciliation finished but more work is needed","awscluster":"default/cluster-0"}
{"level":"info","ts":1594308356.8505132,"logger":"controlplane-reconciler","msg":"starting reconciliation","awscluster":"default/cluster-0"}
{"level":"info","ts":1594308357.0783868,"msg":"Planning provisioning"}
{"level":"info","ts":1594308357.2415674,"logger":"nodepool-reconciler","msg":"executing command","command":"*nodepool.CreateLaunchTemplateCommand"}
{"level":"info","ts":1594308357.3434715,"logger":"nodepool-reconciler","msg":"reconciliation finished but more work is needed","awsnodepool":"default/pool-0"}
{"level":"info","ts":1594308357.3441882,"logger":"nodepool-reconciler","msg":"starting reconciliation","awsnodepool":"default/pool-0"}
{"level":"info","ts":1594308357.5997462,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateAutoScalingGroupCommand"}
{"level":"info","ts":1594308358.355025,"msg":"Planning provisioning"}
{"level":"info","ts":1594308358.3550591,"logger":"nodepool-reconciler","msg":"executing command","command":"*nodepool.CreateAutoScalingGroupCommand"}
{"level":"info","ts":1594308358.8569715,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateAutoScalingGroupCommand"}
{"level":"info","ts":1594308359.884949,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateAutoScalingGroupCommand"}
{"level":"info","ts":1594308597.8016522,"logger":"controlplane-reconciler","msg":"starting reconciliation","awscluster":"default/cluster-0"}
{"level":"info","ts":1594308598.6790671,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.ApplyAddonsCommand"}
{"level":"info","ts":1594308604.1369157,"logger":"nodepool-reconciler","msg":"starting reconciliation","awsnodepool":"default/pool-0"}
{"level":"info","ts":1594308606.138593,"msg":"Planning provisioning"}
{"level":"info","ts":1594308606.1387239,"msg":"Waiting for nodes to join the cluster","readyNodes":0,"expected":3}
{"level":"info","ts":1594308606.1387453,"logger":"nodepool-reconciler","msg":"reconciliation finished but more work is needed","awsnodepool":"default/pool-0"}
{"level":"info","ts":1594308616.1390402,"logger":"nodepool-reconciler","msg":"starting reconciliation","awsnodepool":"default/pool-0"}
{"level":"info","ts":1594308617.4870696,"msg":"Planning provisioning"}
{"level":"info","ts":1594308617.4871018,"msg":"Waiting for nodes to join the cluster","readyNodes":0,"expected":3}
{"level":"info","ts":1594308617.4871092,"logger":"nodepool-reconciler","msg":"reconciliation finished but more work is needed","awsnodepool":"default/pool-0"}
{"level":"info","ts":1594308618.3437061,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.InstallConnectAgentCommand"}
{"level":"info","ts":1594308623.9253933,"logger":"controlplane-reconciler","msg":"reconciliation finished but more work is needed","awscluster":"default/cluster-0"}
{"level":"info","ts":1594308623.925449,"logger":"controlplane-reconciler","msg":"starting reconciliation","awscluster":"default/cluster-0"}
{"level":"info","ts":1594308624.7936692,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.SetProvisionedStateCommand"}
{"level":"info","ts":1594308624.799627,"logger":"controlplane-reconciler","msg":"reconciliation finished","awscluster":"default/cluster-0"}
{"level":"info","ts":1594308624.7996898,"logger":"controlplane-reconciler","msg":"starting reconciliation","awscluster":"default/cluster-0"}
{"level":"info","ts":1594308625.5904305,"logger":"controlplane-reconciler","msg":"reconciliation finished","awscluster":"default/cluster-0"}
{"level":"info","ts":1594308627.4873798,"logger":"nodepool-reconciler","msg":"starting reconciliation","awsnodepool":"default/pool-0"}
{"level":"info","ts":1594308628.643484,"msg":"Planning provisioning"}
{"level":"info","ts":1594308628.6435158,"msg":"Waiting for nodes to join the cluster","readyNodes":1,"expected":3}
{"level":"info","ts":1594308628.6435242,"logger":"nodepool-reconciler","msg":"reconciliation finished but more work is needed","awsnodepool":"default/pool-0"}
{"level":"info","ts":1594308633.9257207,"logger":"controlplane-reconciler","msg":"starting reconciliation","awscluster":"default/cluster-0"}
{"level":"info","ts":1594308634.866224,"logger":"controlplane-reconciler","msg":"reconciliation finished","awscluster":"default/cluster-0"}
{"level":"info","ts":1594308638.6436768,"logger":"nodepool-reconciler","msg":"starting reconciliation","awsnodepool":"default/pool-0"}
{"level":"info","ts":1594308639.952066,"msg":"Planning provisioning"}
{"level":"info","ts":1594308639.9520943,"msg":"All nodes ready, switching to 'Provisioned' state"}
With our GKE on AWS user cluster fully provisioned, we will see in the GCP console that the cluster has automatically been registered with the GKE Hub.
(Screenshot: the user cluster registered in the GCP Console.)
As part of the bootstrap process, our cluster is registered with the GKE Hub using the service account key provided, and a Connect agent is deployed into the user cluster. After the connection is established, the Connect agent can exchange account credentials, technical details, and metadata about the connected infrastructure and workloads with Google Cloud, including details of resources, applications and hardware, as necessary to manage them.
All that remains is to obtain our kubeconfig for the user cluster.
$ env HTTP_PROXY=http://127.0.0.1:8118 \
anthos-gke aws clusters get-credentials cluster-0
$ env HTTP_PROXY=http://127.0.0.1:8118 \
kubectx gke_aws_default_cluster-0_gke-dfccaa67 && kubectl cluster-info
Switched to context "gke_aws_default_cluster-0_gke-dfccaa67".
Kubernetes master is running at https://gke-dfccaa67-controlplane-f26573d5bef4bba0.elb.eu-west-1.amazonaws.com
CoreDNS is running at https://gke-dfccaa67-controlplane-f26573d5bef4bba0.elb.eu-west-1.amazonaws.com/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
KubeDNSUpstream is running at https://gke-dfccaa67-controlplane-f26573d5bef4bba0.elb.eu-west-1.amazonaws.com/api/v1/namespaces/kube-system/services/kube-dns-upstream:dns/proxy
Metrics-server is running at https://gke-dfccaa67-controlplane-f26573d5bef4bba0.elb.eu-west-1.amazonaws.com/api/v1/namespaces/kube-system/services/https:metrics-server:/proxy
Workloads
Now that we have our user cluster deployed, we can start deploying workloads and see how the AWS Controller dynamically provisions AWS resources in response to our Kubernetes objects.
For this demonstration, we’ll be using the Online Boutique sample application to illustrate how a microservices application can leverage the underlying integrations for persistent storage, load balancing and routing on GKE on AWS.
env HTTP_PROXY=http://127.0.0.1:8118 \
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/microservices-demo/master/release/kubernetes-manifests.yaml
Recall the Connect agent running in the user cluster, which is sending details and metadata about the user cluster infrastructure and workloads to the GKE Hub. Consequently, all of our workloads are viewable in the Kubernetes Engine dashboard just as if they were running in any other GKE cluster.
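If you want to check the agent itself, it is typically deployed into the gke-connect namespace of the user cluster, so something along these lines should show it running:

env HTTP_PROXY=http://127.0.0.1:8118 \
kubectl -n gke-connect get pods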
Load balancing
With our Online Boutique application deployed, a LoadBalancer service has been created to make it publicly available. However, some steps are necessary for the AWS Controller to facilitate routing to the service.
$ env HTTP_PROXY=http://127.0.0.1:8118 \
kubectl get svc frontend-external
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/frontend-external LoadBalancer 10.1.113.141 <pending> 80:31128/TCP 17s
First, in order for the AWS Controller to configure our load balancer, we need to tag the subnets with the cluster ID to ensure correct placement. These tags allow the AWS Controller to find the correct subnets within the VPC in which the GKE cluster has been deployed, depending on whether we want our load balancer to be public or private. By default, the AWS Controller will create a Classic ELB in the public subnet corresponding to the GKE cluster’s placement. If we want a Network Load Balancer, or a load balancer in the private subnet, we can annotate the Service with service.beta.kubernetes.io/aws-load-balancer-type: "nlb" or service.beta.kubernetes.io/aws-load-balancer-internal: "true" respectively.
aws ec2 create-tags \
--resources $SUBNET_ID \
--tags Key=kubernetes.io/cluster/$CLUSTER_ID,Value=shared
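As an illustrative example of those annotations, an internal NLB in front of the Online Boutique frontend might look like the following sketch; the Service name is hypothetical, while the selector and ports mirror the existing frontend-external service.

apiVersion: v1
kind: Service
metadata:
  name: frontend-internal   # hypothetical name, not part of the demo manifests
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"       # NLB instead of Classic ELB
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"  # place it in the private subnet
spec:
  type: LoadBalancer
  selector:
    app: frontend
  ports:
  - port: 80
    targetPort: 8080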
Now, if we watch our frontend-external service, we get a public endpoint, with an ELB created in AWS:
$ env HTTP_PROXY=http://127.0.0.1:8118 \
kubectl get svc frontend-external
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
frontend-external LoadBalancer 10.1.113.141 a52ebaf96215a4c489f7b47c0eafb4f1-523240740.eu-west-1.elb.amazonaws.com 80:31128/TCP 18m
(Screenshots: the frontend-external load balancer in AWS, and the Online Boutique storefront.)
Storage
Persistent storage can be created for workloads within GKE on AWS using PersistentVolume, PersistentVolumeClaim and StorageClass resources, providing persistent file and block storage.
Creating a PersistentVolumeClaim without the field spec.storageClassName set provisions a gp2 volume using the default GKE on AWS EBS CSI Driver StorageClass.
First, let’s create our PersistentVolumeClaim:
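The claim manifest itself isn’t reproduced here; based on the claim name and capacity shown in the output further down, a minimal sketch (saved as a hypothetical redis-pvc.yaml and applied with kubectl as before) would look something like this:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: redis-pvc
spec:
  accessModes:
  - ReadWriteOnce   # single-node read/write block volume
  resources:
    requests:
      storage: 1Gi
  # no storageClassName, so the default EBS CSI driver StorageClass is used

env HTTP_PROXY=http://127.0.0.1:8118 \
kubectl apply -f redis-pvc.yaml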
Then patch our redis deployment:
env HTTP_PROXY=http://127.0.0.1:8118 \
kubectl patch deploy redis-cart --patch '{
"spec": {
"template": {
"spec": {
"containers": [{
"name": "redis",
"volumeMounts": [{
"mountPath": "/data",
"name": "redis-pvc"
}]
}],
"volumes": [{
"name": "redis-pvc",
"persistentVolumeClaim": {
"claimName": "redis-pvc"
}
}]
}
}
}
}'
Now we can see that the persistent volume claim is bound to a volume, with an EBS volume created in AWS.
$ env HTTP_PROXY=http://127.0.0.1:8118 \
kubectl get pvc,pv
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistentvolumeclaim/redis-pvc Bound pvc-78366255-3a17-4d16-8665-c9d5c6c8a88e 1Gi RWO standard-rwo 35m
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
persistentvolume/pvc-78366255-3a17-4d16-8665-c9d5c6c8a88e 1Gi RWO Delete Bound default/redis-pvc standard-rwo 13m
(Screenshot: the persistent volume in AWS.)
Autoscaling
Anthos GKE on AWS also provides cluster autoscaling, to benefit from the elasticity of dynamically provisioning nodes in accordance with demand. This ensures that there are resources available relative to the resource requests from workloads, and that infrastructure is scaled in to optimise cost. Again, this is all achieved through custom resource definitions, with the AWSNodePool resource defining the minNodeCount and maxNodeCount, and the gke-aws-cluster-operator adjusting the capacity of the AWS auto scaling group. All this aims to simplify scaling logic and reduce the cognitive overhead when it comes to cost-efficient and performant compute infrastructure.
$ env HTTP_PROXY=http://127.0.0.1:8118 \
kubectx gke_aws_management_gke-404767c1 && kubectl patch AWSNodePool cluster-0-pool-0 --type=json -p='[{"op": "replace", "path": "/spec/minNodeCount", "value": 4}]'
awsnodepool.multicloud.cluster.gke.io/cluster-0-pool-0 patched
$ env HTTP_PROXY=http://127.0.0.1:8118 \
kubectl get AWSNodepool
NAME CLUSTER STATE AGE VERSION
cluster-0-pool-0 cluster-0 Resizing 38m 1.16.9-gke.12
cluster-0-pool-1 cluster-0 Provisioned 38m 1.16.9-gke.12
cluster-0-pool-2 cluster-0 Provisioned 38m 1.16.9-gke.12
$ env HTTP_PROXY=http://127.0.0.1:8118 \
kubectx gke_aws_default_cluster-0_gke-dfccaa67 && kubectl get nodes
Switched to context "gke_aws_default_cluster-0_gke-dfccaa67".
NAME STATUS ROLES AGE VERSION
ip-10-0-1-172.eu-west-1.compute.internal Ready <none> 35m v1.16.9-gke.12
ip-10-0-1-204.eu-west-1.compute.internal Ready <none> 35m v1.16.9-gke.12
ip-10-0-1-59.eu-west-1.compute.internal Ready <none> 34m v1.16.9-gke.12
ip-10-0-1-78.eu-west-1.compute.internal Ready <none> 53s v1.16.9-gke.12
ip-10-0-2-92.eu-west-1.compute.internal Ready <none> 35m v1.16.9-gke.12
ip-10-0-2-93.eu-west-1.compute.internal Ready <none> 35m v1.16.9-gke.12
ip-10-0-2-94.eu-west-1.compute.internal Ready <none> 35m v1.16.9-gke.12
ip-10-0-3-139.eu-west-1.compute.internal Ready <none> 34m v1.16.9-gke.12
ip-10-0-3-192.eu-west-1.compute.internal Ready <none> 35m v1.16.9-gke.12
ip-10-0-3-208.eu-west-1.compute.internal Ready <none> 34m v1.16.9-gke.12
Scaling the NodePool out causes an additional EC2 instance to be launched.
(Screenshot: the user cluster scaling out with an additional EC2 instance.)
Reducing the minNodeCount will cause our NodePool to scale in, as the resources currently requested by the workloads do not require the number of nodes currently deployed.
$ env HTTP_PROXY=http://127.0.0.1:8118 \
kubectx gke_aws_management_gke-404767c1 && kubectl patch AWSNodePool cluster-0-pool-0 --type=json -p='[{"op": "replace", "path": "/spec/minNodeCount", "value": 1}]'
Switched to context "gke_aws_management_gke-404767c1".
awsnodepool.multicloud.cluster.gke.io/cluster-0-pool-0 patched
Now, we see the Cluster Autoscaler kick in and scale our NodePool down by terminating a node to optimise utilisation.
(Screenshot: the user cluster scaling in.)
Logging and monitoring
We saw earlier how the Connect agent sends metadata of workloads running in the cluster back to GCP. As this metadata is collated we can begin to get a perspective of all of our workloads running across all of our GKE registered clusters. This is a key value-add feature of Anthos, as we are able to consolidate management and orchestration of our clusters and workloads through a single-pane-of-glass.
GKE On-Prem provides in-cluster Stackdriver agents, a combination of Fluent StatefulSets and DaemonSets, as well as Prometheus and Stackdriver sidecars, which collect logs and metrics for system components and send this data to the Cloud Logging API.
At the time of writing, logging and monitoring for GKE on AWS is yet to be fully implemented. Support is due in a future release, with a similar implementation to GKE, where a gke-metrics-agent will be deployed via a DaemonSet on the user cluster to scrape metrics from the kubelet and push them to the Cloud Monitoring APIs. A Fluent Bit pipeline will also collect logs from the containerd components and the kubelet. We can already see some of this in action, with our application logs from AWS appearing in Cloud Logging.
(Screenshot: the frontend application logs in Cloud Logging.)
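These entries can also be queried from the command line; a hedged sketch, assuming the workload logs arrive under the standard k8s_container resource type with our cluster name as a label:

gcloud logging read \
  'resource.type="k8s_container" AND resource.labels.cluster_name="cluster-0"' \
  --project jetstack-anthos \
  --limit 10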
This means we get telemetry data and logs from both on-premises and cloud deployments, all viewable through the GCP Console. When it comes to performing root cause analysis, access to logs is critical, so consolidating logs from multiple clusters removes the overhead of managing and traversing multiple systems in order to identify issues and correlate cascading errors.
As well as Cloud Logging & Monitoring, Anthos also integrates with Elastic Stack, Splunk and Datadog, offering a variety of options to customise your approach to observability, complementing existing solutions in on-premises or cloud environments.
You might opt to disable GCP Logging and Monitoring, and instead use Grafana and Prometheus as your monitoring solutions. These are currently available as optional parts of the GKE On-Prem installation, so integrating these OSS solutions with GKE on AWS is also an option depending on your use case. One must assess the trade-offs of opting out of GCP’s supported logging and monitoring solutions and maintaining your own monitoring stack, versus the cost of transferring telemetry data from AWS into GCP. It would be possible to gain multi-cluster observability on top of Prometheus and Grafana through Thanos, but again this would require investment in implementing and maintaining such a deployment.
Future
GKE on AWS offers the first opportunity to experience GKE on another cloud provider. Later this year we should see a preview of GKE running on Azure, further pushing the boundaries of running applications on GKE outside of GCP. In addition, the on-premises offering will be extended with a bare-metal deployment option to run Anthos on physical servers. Again, this allows companies to use existing hardware investments, paving the way for the adoption and standardisation of Kubernetes, with the option of running alongside a cloud provider to optimise capex and opex investment.
With competition from other solutions such as VMware Tanzu and Azure Arc, as well as managed OpenShift offerings, the hybrid and multi-cloud orchestration marketplace is maturing to a point where enterprises have a multitude of options for leveraging the cloud providers of their choice, and for consolidating the management and observability of workloads across disparate environments.
Anthos is not only a delivery mechanism for running Kubernetes on your infrastructure, but also provides the components required to optimise the Kubernetes experience. Whether it’s improving continuous delivery and environmental consistency through Config Management and Policy Controller, or gaining fine-grained control over traffic shaping and centralised SLO and SLA orchestration using Anthos Service Mesh, Anthos provides enterprises with a platform to which they can migrate and from which they can leverage the power of Kubernetes.
Coming up, we’ll have more content on Anthos around:
- Running applications across multi-cloud with GKE and Anthos Service Mesh
- How to register EKS and AKS clusters with the GKE Hub via attached clusters
- Ensuring conformance through Policy Controller and OPA
- Anthos on Azure
Get in touch
If you want to know more about Anthos or running hybrid and multi-cloud Kubernetes, Jetstack offers consulting and subscription services which can help with your investigation and adoption in a variety of ways. Let us know if you’re interested in a workshop or in working together to dive deeper into Anthos.