This is the first in a series of posts looking at Google Cloud Anthos: how it seeks to facilitate digital transformation and become the management plane for enterprise workloads across hybrid and multi-cloud environments. We start with GKE on AWS, which has recently become generally available.
The value proposition of Anthos is environmental agnosticism, with containers and Kubernetes as the common denominator for our workloads. This allows a level of portability, with Anthos managing workload deployments and lifecycles across multiple clouds (GCP, AWS and Azure) as well as on-prem data centres (VMware and bare metal).
Venafi experts at Jetstack are seeing an increasing number of clients seeking to either adopt or mature their Kubernetes offering, whilst leveraging the advantages of cloud-native principles in accordance with their business requirements and existing infrastructure investments. The Anthos initiative typifies this requirement by laying the foundations to migrate and align workloads across organisational boundaries.
To open, we’ll be covering a facet of Anthos which brings the GKE experience to your AWS environment.
Anthos is a framework of software components orientated around managing the deployment and lifecycle of infrastructure and workloads across multiple environments and locations. With enterprises having established infrastructure across multiple cloud providers and on-premises locations, Anthos centralises the orchestration of segregated clusters and workloads, providing a single pane of glass across hybrid and multi-cloud topologies. This consolidates operations and provides consistency across cloud providers, whilst embracing existing infrastructure investments and unlocking new possibilities for hybrid and multi-cloud compositions. It also allows companies to modernise in place: continuing to run workloads on-prem or on their existing infrastructure, whilst adopting Kubernetes and cloud-native principles.
As well as being a Kubernetes distribution, Anthos provides ways to simplify hybrid and multi-cloud consistency and compliance through Config Management and Policy Controller. With Config Management, a GitOps methodology is adopted: a cluster Operator reconciles observed state against the desired state for Kubernetes objects held in source control. Policy Controller facilitates conformance at scale by building on Gatekeeper to provide a constraint template library that ensures consistency and compliance of configuration, as well as offering extensibility through writing policies using OPA and Rego.
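As a sketch of what Policy Controller looks like in practice: the constraint template library includes templates such as K8sRequiredLabels, which a constraint like the one below can instantiate to require an `owner` label on every Namespace (the constraint name here is illustrative):

```yaml
# Assumes the K8sRequiredLabels template from the Policy Controller
# constraint template library is installed on the cluster.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: namespaces-must-have-owner   # hypothetical constraint name
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels: ["owner"]
```

Once applied, any Namespace created without an `owner` label is rejected at admission, giving the same guardrail on every registered cluster.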
Anthos Service Mesh is core to the proposition of running hybrid Kubernetes across cloud and on-premises infrastructure. Built using Istio, it enhances our experience by abstracting and automating cross-cutting concerns, such as issuing workload identities via X.509 certificates to facilitate automatic mutual TLS across our workloads and clusters, and provides mechanisms for layer 7 traffic routing within the mesh.
Anthos Service Mesh also centralises certificate issuance and renewal, so that segregated clusters can establish cross-boundary trust and service-to-service communications can mutually authenticate.
GKE on AWS
GKE on AWS follows GKE On-Prem as the next enabler bringing the GKE experience to your own infrastructure. This means we can integrate with existing AWS environments and leverage the Anthos stack to provide consistency across our clusters and workloads, with operations centralised in the GCP Console.
Bringing the GKE experience to AWS empowers developers, administrators and architects to incorporate GKE as we know it on Google Cloud into their existing infrastructure, whilst being selective about workload placement pertinent to business decisions and maximising business value.
This unlocks many opportunities to harness the advantages of multi-cloud. We can focus on the business logic of our applications and deploy anywhere thanks to homogeneous runtime environments, with highly portable workloads enabling placement strategies that satisfy high availability or scaling requirements.
We can also take advantage of the proprietary managed services offered by each cloud provider, allowing for flexibility when adopting a multi-cloud strategy and for interoperability between our workloads and the infrastructure they require.
The solution for GKE on AWS provides the requisite tooling for deploying GKE into your AWS environment, creating new or working alongside existing resources. The design philosophy reuses the concept seen in the GKE On-Prem architecture, with a hierarchical model comprised of management and user clusters, the former using an AWS Controller to bootstrap the creation and manage the lifecycle of the latter.
Through the management cluster, we can create GKE clusters with the AWSCluster and AWSNodePool custom resource definitions. It is then the responsibility of the Cluster Operator controller to provision the necessary resources for the user clusters through the AWS APIs. This is implemented through the gke-aws-cluster-operator static pod, which is an application that contains the core cluster management logic.
One management cluster can administer multiple downstream user clusters, with control plane configuration stored in etcd and with storage persisted on an AWS EBS volume.
To begin, we’ll deploy our management cluster using the anthos-gke CLI. This autogenerates Terraform for us to deploy, comprising the infrastructure needed to host our management plane in AWS: a dedicated VPC and subnets (or integration with an existing VPC), as well as security groups to facilitate inbound and outbound SSH and HTTPS traffic for the Kubernetes master.
The management cluster will allow us to administer GKE on AWS ‘user’ clusters for running our workloads, provisioned by Kubernetes objects. This hierarchical model of management and user clusters is a core principle of Anthos: whether with GKE On-Prem on VMware or GKE on AWS, orchestration of downstream clusters is done through the Kubernetes API and CRDs.
With these prerequisites, we can populate the required config file to configure the management service.
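The configuration file declares an AWSManagementService resource. The shape below is illustrative, based on the documentation for this release; field names and the version string vary between releases, so treat every value here as an assumption:

```yaml
apiVersion: multicloud.cluster.gke.io/v1
kind: AWSManagementService
metadata:
  name: management
spec:
  version: aws-1.4.1-gke.15        # example release string, not authoritative
  region: eu-west-1                # AWS region to host the management plane
  bastionHost:
    allowedSSHCIDRBlocks:
      - 0.0.0.0/0                  # lock this down in a real deployment
  dedicatedVPC:                    # or reference an existing VPC instead
    vpcCIDRBlock: 10.80.0.0/16
    availabilityZones:
      - eu-west-1a
    privateSubnetCIDRBlocks:
      - 10.80.0.0/24
    publicSubnetCIDRBlocks:
      - 10.80.1.0/24
```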
Running anthos-gke aws management init will encrypt our service account keys and generate a root CA, writing these values to a configuration file.
To create the cluster, we need to apply the generated configuration.
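The two anthos-gke subcommands used in this walkthrough, as documented for this release:

```shell
# Encrypt the service account keys, generate a root CA and
# write these values back into the configuration file
anthos-gke aws management init

# Provision the management cluster; this initialises and applies
# the generated Terraform modules under the hood
anthos-gke aws management apply
```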
This creates an S3 bucket for the gke-aws-node-agent binary, and initialises the Terraform modules which will provision the necessary infrastructure to host the GKE on AWS management cluster.
Once Terraform has successfully completed provisioning the infrastructure, we can see that a bastion host, as well as the necessary EC2 instances, ELBs and security groups have been created.
Management cluster instances
Management cluster LBs
Management cluster SGs
Management cluster VPC
Management cluster Subnets
These are Ubuntu 18.04 instances, launched into an auto scaling group with a launch template and user data running the gke-aws-node-agent, bootstrapping the instance to function as the management plane.
We can use the bastion host to gain access to the Kubernetes API by opening an SSH tunnel allowing anthos-gke to complete the setup.
After this we can connect to the management cluster using kubectl.
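A sketch of the tunnel-and-connect flow, assuming the Terraform output name and local proxy port used in the documentation for this release:

```shell
# Emit the bastion tunnel helper script from the Terraform outputs
terraform output bastion_tunnel > bastion-tunnel.sh
chmod 755 bastion-tunnel.sh
./bastion-tunnel.sh -N -4 &

# Fetch the management cluster kubeconfig, then reach the
# Kubernetes API through the tunnel's local proxy
anthos-gke aws management get-credentials
env HTTPS_PROXY=http://localhost:8118 kubectl cluster-info
```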
Now that we have our management cluster, we can provision user clusters to run our workloads. In AWS, user clusters manifest through the AWSCluster and AWSNodePool custom resources. This leverages a declarative, Kubernetes-style API approach for cluster creation, configuration and management.
If we take a look at the CRDs currently on the management cluster, we can see the two resources available to provision AWS clusters.
Our previous Terraform deployment can be used to generate a configuration for a basic user cluster.
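Per the documentation for this release, the management deployment's Terraform state can emit an example user cluster manifest, which we then apply through the tunnel (the `cluster-0` name follows the documented default):

```shell
# Generate an example AWSCluster/AWSNodePool manifest pair
terraform output cluster_example > cluster-0.yaml

# Apply it to the management cluster to trigger user cluster creation
env HTTPS_PROXY=http://localhost:8118 kubectl apply -f cluster-0.yaml
```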
As our custom resource is a first-class Kubernetes object, we can interact with it to inspect its state.
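For example, still working through the bastion tunnel (cluster name assumed from the generated example):

```shell
# List the cluster and node pool objects on the management cluster
env HTTPS_PROXY=http://localhost:8118 kubectl get awsclusters,awsnodepools

# Inspect status conditions and events for the user cluster
env HTTPS_PROXY=http://localhost:8118 kubectl describe awscluster cluster-0
```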
At this point, the AWS Controller is provisioning the resources in AWS which will comprise our user cluster. The default GKE on AWS installation creates an AWSCluster with three control plane replicas in the same availability zone. The management cluster places the control plane replicas in a private subnet behind an AWS Network Load Balancer (NLB), and interacts with the control plane through that NLB.
User cluster instances
User cluster LBs
User cluster SGs
Once the cluster bootstrap process is complete, we can see the events.
We can also see the logs from the gke-aws-cluster-operator on the management instance.
With our GKE on AWS user cluster fully provisioned, we will see in the GCP console that the cluster has automatically been registered with the GKE Hub.
User cluster registered
As part of the bootstrap process, our cluster is registered with the GKE Hub using the service account key provided, and a Connect agent is deployed into the user cluster. Once the connection is established, the Connect agent exchanges account credentials, technical details and metadata about the connected infrastructure and workloads with Google Cloud, including details of resources, applications and hardware, allowing them to be managed centrally.
All that remains is to obtain our kubeconfig for the user cluster.
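Sketched below, again assuming the documented proxy port and the `cluster-0` name from the generated example:

```shell
# Write the user cluster kubeconfig into our local config
env HTTPS_PROXY=http://localhost:8118 \
  anthos-gke aws clusters get-credentials cluster-0

# Confirm we can reach the user cluster's API server
env HTTPS_PROXY=http://localhost:8118 kubectl get nodes
```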
Now that we have our user cluster deployed, we can start deploying workloads and see how the AWS Controller dynamically provisions AWS resources in response to our Kubernetes objects.
For this demonstration, we’ll be using the Online Boutique sample application to illustrate a microservices application leveraging the underlying integrations for persistent storage, load balancing and routing on GKE on AWS.
Recall the Connect agent running in the user cluster, which sends details and metadata about the user cluster’s infrastructure and workloads to the GKE Hub. Consequently, all of our workloads are viewable in the Kubernetes Engine dashboard just as if they were running in any other GKE cluster.
With our Online Boutique application deployed, a LoadBalancer Service has been created to make it publicly available. However, some steps are necessary for the AWS Controller to facilitate routing to the services.
First, in order for the AWS Controller to configure our load balancer, we need to tag the subnets with the cluster ID to ensure correct placement. Whether we want our load balancer to be public or private, these subnet tags allow the AWS Controller to find the correct subnets for the GKE cluster’s placement. By default, the AWS Controller creates a Classic ELB in the appropriate public subnet. If we instead want a Network Load Balancer, or a load balancer in the private subnet, we can annotate the Service with service.beta.kubernetes.io/aws-load-balancer-type: "nlb" or service.beta.kubernetes.io/aws-load-balancer-internal: "true" respectively.
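For example, a Service for the Online Boutique frontend requesting an internal NLB might look like this (the selector and ports follow the upstream Online Boutique manifests; the annotations are the standard in-tree AWS ones mentioned above):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: frontend-external
  annotations:
    # Request an NLB instead of the default Classic ELB
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    # Place the load balancer in the private subnet
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  selector:
    app: frontend
  ports:
    - port: 80
      targetPort: 8080
```

Watching the Service (`kubectl get service frontend-external --watch`) shows the external hostname once the load balancer has been provisioned.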
Now, if we watch our frontend-external service, we get a public endpoint, with an ELB created in AWS:
Creating a PersistentVolumeClaim without the field spec.storageClassName set provisions a gp2 volume using the default GKE on AWS EBS CSI Driver StorageClass.
First, let’s create our PersistentVolumeClaim:
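A minimal claim, deliberately omitting spec.storageClassName so the default EBS CSI StorageClass is used (the claim name and size here are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: redis-pvc        # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  # no storageClassName: falls back to the default
  # GKE on AWS EBS CSI Driver StorageClass (gp2)
```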
Then patch our redis deployment:
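A hedged sketch of that patch, assuming the Online Boutique redis deployment and volume names (`redis-cart`, `redis-data`) and a claim named `redis-pvc`:

```shell
# Replace the deployment's first volume (an emptyDir in the upstream
# manifests) with the persistent volume claim
kubectl patch deployment redis-cart --type=json -p '[
  {"op": "replace",
   "path": "/spec/template/spec/volumes/0",
   "value": {"name": "redis-data",
             "persistentVolumeClaim": {"claimName": "redis-pvc"}}}
]'

# The claim should report STATUS Bound once the EBS volume is provisioned
kubectl get pvc redis-pvc
```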
Now we can see the persistent volume claim is bound to the volume, with an EBS created in AWS.
Anthos GKE on AWS also provides cluster autoscaling, benefiting from the elasticity of dynamically provisioning nodes in accordance with demand. This ensures that resources are available to satisfy the resource requests of workloads, and that infrastructure is scaled in to optimise cost. Again, this is achieved through custom resource definitions: the AWSNodePool resource defines minNodeCount and maxNodeCount, and the gke-aws-cluster-operator adjusts the capacity of the AWS auto scaling group. All of this aims to simplify scaling logic and reduce cognitive overhead when running cost-efficient, performant compute infrastructure.
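Because the node pool is just a Kubernetes object on the management cluster, resizing it is a patch (node pool name assumed from the generated example):

```shell
# Raise the pool's ceiling; the gke-aws-cluster-operator adjusts the
# capacity of the underlying AWS auto scaling group to match
env HTTPS_PROXY=http://localhost:8118 \
  kubectl patch awsnodepool cluster-0-pool-0 \
  --type merge -p '{"spec": {"maxNodeCount": 10}}'
```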
Scaling the NodePool out causes an additional EC2 instance to be launched.
User cluster scale out
Reducing the minNodeCount will cause our NodePool to scale in when the resources currently requested by the workloads no longer justify the number of nodes deployed.
Now, we see the Cluster Autoscaler kick in and scale our NodePool down by terminating a node to optimise utilisation.
User cluster scale in
Logging and monitoring
We saw earlier how the Connect agent sends metadata of workloads running in the cluster back to GCP. As this metadata is collated we can begin to get a perspective of all of our workloads running across all of our GKE registered clusters. This is a key value-add feature of Anthos, as we are able to consolidate management and orchestration of our clusters and workloads through a single-pane-of-glass.
GKE On-Prem provides in-cluster Stackdriver agents (a combination of Fluentd StatefulSets and DaemonSets, plus Prometheus and Stackdriver sidecars) which collect logs and metrics for system components and send this data to the Cloud Logging API.
At the time of writing, logging and monitoring for GKE on AWS is yet to be fully implemented. Support is due in a future release, with a similar implementation to GKE, where a gke-metrics-agent will be deployed via a DaemonSet on the user cluster to scrape metrics from the kubelet and push them to the Cloud Monitoring API. A Fluent Bit pipeline will also collect logs from the containerd components and the kubelet. We can see some of this in action already, with our application logs from AWS appearing in Cloud Logging.
GCP frontend logs
This means we get telemetry data and logs from both on-premises and cloud deployments, all viewable through the GCP Console. When performing root cause analysis, access to logs is critical; consolidating multiple clusters removes the overhead of managing and traversing multiple systems to identify issues and correlate cascading errors.
As well as Cloud Logging & Monitoring, Anthos also integrates with Elastic Stack, Splunk and Datadog, offering a variety of options to customise your approach to observability, complementing existing solutions in on-premises or cloud environments.
You might opt to disable GCP Logging and Monitoring, and instead use Grafana and Prometheus as your monitoring solutions. These are currently available as optional parts of the GKE On-Prem installation, so integrating these OSS solutions with GKE on AWS is also an option depending on your use case. One must assess the trade-offs of opting out of GCP’s supported logging and monitoring solutions and maintaining your own monitoring stack, versus the costs of transferring telemetry data from AWS into GCP. It would be possible to gain multi-cluster observability on top of Prometheus and Grafana through Thanos, but again this would require investment in implementing and maintaining such a deployment.
GKE on AWS offers the first opportunity to experience GKE on another cloud provider. Later this year we should see a preview of GKE running on Azure, further pushing the boundaries of running applications on GKE outside of GCP. In addition, the on-premises offering will be extended with a bare-metal deployment option to run Anthos on physical servers. Again, this allows companies to use existing hardware investments, paving the way for the adoption and standardisation of Kubernetes, with the option of running alongside a cloud provider to optimise capex and opex investment.
With competition from other solutions such as VMware Tanzu and Azure Arc, as well as managed OpenShift offerings, the hybrid and multi-cloud orchestration marketplace is maturing to a place where enterprises have a multitude of options when it comes to leveraging the cloud providers of their choice, and how they want to consolidate management and observability of workloads across disparate environments.
Anthos is not only a delivery mechanism for running Kubernetes on your infrastructure, but also provides the components required to optimise the Kubernetes experience. Whether it’s improving continuous delivery and environmental consistency through Config Management and Policy Controller, or having fine-grained control over traffic shaping and centralised SLO and SLA orchestration using Anthos Service Mesh, Anthos provides enterprises with a repertoire of tooling with which to migrate to, and leverage the power of, Kubernetes.
Coming up, we’ll have more content on Anthos around:
Running applications across multi-cloud with GKE and Anthos Service Mesh
If you want to know more about Anthos or running hybrid and multi-cloud Kubernetes, Jetstack offers consulting and subscription services that can help with your investigation and adoption in a variety of ways. Let us know if you’re interested in a workshop or working together to dive deeper into Anthos.