In the previous post, Config Controller and Config Connector were used to create a GKE Fleet and bootstrap the clusters with Anthos Config Management and Anthos Service Mesh. This deployment demonstrated how to declaratively provision a Fleet of clusters entirely through a GitOps approach, centralising configuration and operations into a single source of truth.
In this post, the GKE clusters in the Fleet continue provisioning, using the enabled Anthos Config Management to reconcile their cluster configuration, security Constraints, add-on components and application workloads.
[Figure: gke-config-controller]
User Clusters
With Anthos Config Management enabled on each GKE cluster, a RootSync is configured to synchronise resources from a Git repository, branch and directory.
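As a minimal sketch, a RootSync of this kind could look like the following. It assumes a public repository. The repository URL, branch and directory are hypothetical placeholders. In this setup, the equivalent values are supplied through each cluster's GKEHubFeatureMembership rather than applied by hand.

```yaml
# Hypothetical RootSync: reconciles one directory of one branch of a Git repository.
apiVersion: configsync.gke.io/v1beta1
kind: RootSync
metadata:
  name: root-sync
  namespace: config-management-system   # RootSyncs live in this namespace
spec:
  sourceFormat: unstructured            # plain manifests and kustomizations, no hierarchy
  git:
    repo: https://github.com/example-org/gke-fleet-config   # hypothetical repository
    branch: main                                             # branch to track
    dir: user-clusters/dev                                   # directory to reconcile
    auth: none                                               # assumes a public repository
```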
Based on their GKEHubFeatureMembership configuration, these user clusters reconcile against a repository in which the environment directories under user-clusters (dev and staging) each contain only a kustomization.yaml. That kustomization.yaml synchronises a base for the ‘type’ of GKE cluster (Standard or Autopilot), along with any patches required for that particular environment.
Each kustomization synchronises a different base depending on the environment, or more precisely on the GKE mode used in that environment. Each base (gke-kcc-autopilot or gke-kcc-standard) encapsulates all the resources for that mode of GKE.
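As a rough sketch, the dev environment's kustomization.yaml might look like the following. The relative path to the base and the patch file name are assumptions about the repository layout and do not come from the source repository.

```yaml
# user-clusters/dev/kustomization.yaml (sketch; paths are hypothetical)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  # Base for the GKE mode used in this environment.
  - ../../bases/gke-kcc-autopilot
patches:
  # Environment-specific overrides applied on top of the base.
  - path: patches/dev-overrides.yaml
```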
Within the base for a GKE Autopilot cluster, each referenced base represents a module that can be added to the cluster’s setup. In this use case, the modules cover two areas. The first is cluster configuration, such as Constraints that ensure each cluster conforms to compliance requirements. The second is platform components, such as monitoring agents, cert-manager and an Istio ingress-gateway. As with the environment directories, these bases can be patched depending on how they are used within a cluster.
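A base composed of modules could be expressed as a kustomization that lists each module as a resource. This is an illustrative sketch only. The module directory names are hypothetical and simply mirror the components named above.

```yaml
# bases/gke-kcc-autopilot/kustomization.yaml (sketch; module paths are hypothetical)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../modules/constraints     # Policy Controller Constraints
  - ../modules/fluxcd          # FluxCD for add-on and application lifecycle
  - ../modules/monitoring      # monitoring agents and scrape configuration
  - ../modules/cert-manager    # certificate management
  - ../modules/istio-ingress   # Istio ingress-gateway
```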
Addons
Whilst ACM is the tool of choice for managing GKE clusters within the Fleet and is key to the hands-off bootstrap process, other components can be deployed to the cluster to handle the application lifecycle. Because ACM supports Helm Charts through Kustomize, the installation of components like FluxCD can be managed during cluster bootstrap.
Note: when Helm Charts are referenced in Kustomize, the resources are hydrated (rendered) within the ACM reconciler Deployment and applied directly to the cluster as raw manifests. Therefore, the Helm Chart itself is not installed as a Helm release, and the lifecycle of the applied resources is not managed through Helm.
Note: If you do not specify resource requests for some containers in a Pod, GKE Autopilot applies default values. Resource requests are therefore set explicitly as inline values for the FluxCD Helm Chart.
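A minimal sketch of rendering FluxCD through Kustomize's helmCharts field follows, assuming the fluxcd-community flux2 chart.

- The chart version is an example to pin.
- The request values are arbitrary.
- The value keys (sourceController, kustomizeController, helmController) assume that chart's values layout and should be checked against the chart you use.

```yaml
# kustomization.yaml for a FluxCD module (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
helmCharts:
  - name: flux2
    repo: https://fluxcd-community.github.io/helm-charts
    version: 2.7.0          # example version; pin to a chart version you have tested
    releaseName: flux2
    namespace: flux-system  # assumes a flux-system Namespace is declared elsewhere in the repo
    includeCRDs: true       # also render CRDs from the chart's crds/ directory, if it has one
    valuesInline:
      # Explicit requests so GKE Autopilot does not fall back to its defaults.
      # Any other controller the chart enables needs the same treatment.
      sourceController:
        resources:
          requests:
            cpu: 250m
            memory: 256Mi
      kustomizeController:
        resources:
          requests:
            cpu: 250m
            memory: 256Mi
      helmController:
        resources:
          requests:
            cpu: 250m
            memory: 256Mi
```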
Note: In this example, CustomResourceDefinitions are included in the FluxCD Helm Chart. ACM cannot apply these directly because they contain status fields, which Config Sync does not allow, so these fields are removed by a Kustomize patch.
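A patch of this kind could be sketched as the following entry in the kustomization.yaml that renders the chart. It uses a JSON 6902 remove operation on every rendered CustomResourceDefinition. This assumes each matched CRD actually has a status field, because a remove on a missing path fails.

```yaml
# Added to the kustomization.yaml that renders the FluxCD chart (sketch).
patches:
  # Match every rendered CRD and drop the status stanza that Config Sync rejects.
  - target:
      group: apiextensions.k8s.io
      kind: CustomResourceDefinition
    patch: |-
      - op: remove
        path: /status
```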
With FluxCD installed along with its CustomResourceDefinitions, subsequent addons can be installed using the HelmRelease and Kustomization resources.
Note: When running cert-manager in GKE Autopilot, a different leader election namespace needs to be set as the kube-system namespace cannot be accessed.
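As a sketch, cert-manager could be installed with a HelmRepository and HelmRelease pair. It uses the chart's global.leaderElection.namespace value to move leader election out of kube-system.

- The chosen namespace and version range are assumptions.
- The apiVersions shown are the Flux beta APIs current at the time; newer Flux releases serve later versions of these APIs.

```yaml
# Flux source for the Jetstack chart repository.
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: jetstack
  namespace: flux-system
spec:
  interval: 1h
  url: https://charts.jetstack.io
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: cert-manager
  namespace: flux-system
spec:
  interval: 10m
  targetNamespace: cert-manager
  install:
    createNamespace: true
  chart:
    spec:
      chart: cert-manager
      version: "1.11.x"          # example semver range
      sourceRef:
        kind: HelmRepository
        name: jetstack
  values:
    installCRDs: true
    global:
      leaderElection:
        # kube-system cannot be accessed on GKE Autopilot, so use the release namespace.
        namespace: cert-manager
```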
Similarly, external-dns and istio-ingress can be installed using HelmReleases. With this pattern, add-ons can be included in the cluster bootstrap and kept in sync by Config Sync from a central repository. Again, this is reconciled across all clusters in the Fleet, giving a single source of truth that propagates to each Fleet member.
Not only does this guarantee consistency in which add-ons are deployed and how they are configured, it also guarantees a consistent security configuration for each cluster. As mentioned, a significant benefit of using ACM and Config Sync for cluster bootstrap is that the process is managed through Google Cloud APIs, rather than through requests to the Kubernetes API Server after the cluster is provisioned. Therefore, including resources at this stage guarantees that policy is enforced and the cluster is compliant before tenants are onboarded.
Constraints
Anthos Config Management can also use Policy Controller to enforce Constraints through an admission controller webhook. Once Policy Controller is enabled in each cluster’s GKEHubFeatureMembership, it installs the Constraint template library. Constraints can then be created from those templates to define the specific policies that must be enforced in the cluster.
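A minimal sketch of enabling Policy Controller through Config Connector's GKEHubFeatureMembership follows. The names of the referenced Project, GKEHubFeature and GKEHubMembership resources are hypothetical, and the Config Sync settings are omitted for brevity.

```yaml
# GKEHubFeatureMembership enabling Policy Controller (sketch; names are hypothetical).
apiVersion: gkehub.cnrm.cloud.google.com/v1beta1
kind: GKEHubFeatureMembership
metadata:
  name: dev-acm-membership
spec:
  projectRef:
    name: example-fleet-project   # hypothetical Project resource
  location: global
  membershipRef:
    name: dev-membership          # hypothetical GKEHubMembership resource
  featureRef:
    name: configmanagement        # hypothetical GKEHubFeature resource
  configmanagement:
    policyController:
      enabled: true
      templateLibraryInstalled: true   # install the Constraint template library
      referentialRulesEnabled: true
```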
These Constraints can represent best practices for secure Kubernetes environments (e.g. CIS Kubernetes Benchmarks) and regulatory compliance (e.g. PCI DSS), as well as custom policies that align with organizational security controls. Sets of Constraints can be applied as bundles that are maintained by Google Cloud.
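For illustration, an individual Constraint built on a template from the library could look like the following. It assumes the K8sPSPPrivilegedContainer template is installed. The Constraint name and excluded namespaces are illustrative choices.

```yaml
# Deny privileged containers using a library template (sketch).
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sPSPPrivilegedContainer
metadata:
  name: deny-privileged-containers
spec:
  enforcementAction: deny          # reject violating requests at admission
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    excludedNamespaces: ["kube-system"]
```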
As these bundles are opinionated resources that define a set of standard security controls, they should be treated as immutable and applied directly to each cluster. As such, each bundle should be synchronized with the cluster using a dedicated RootSync to manage the reconciliation process. Each RootSync is visible in the Anthos Dashboard, showing what packages and resources are being applied to each cluster in the Fleet.
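A dedicated RootSync per bundle could be sketched as follows. It assumes Google's public acm-policy-controller-library repository. The branch and bundle directory are assumptions to check against that repository.

```yaml
# One RootSync per policy bundle (sketch).
apiVersion: configsync.gke.io/v1beta1
kind: RootSync
metadata:
  name: cis-k8s-bundle                  # dedicated RootSync for this bundle
  namespace: config-management-system
spec:
  sourceFormat: unstructured
  git:
    repo: https://github.com/GoogleCloudPlatform/acm-policy-controller-library
    branch: main                        # assumed branch name
    dir: bundles/cis-k8s-v1.5.1         # assumed bundle directory
    auth: none
```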
[Figure: Config Sync]
With policy bundles being enforced on each cluster, the Policy UI in Google Cloud provides visibility into each environment’s compliance and violations.
[Figure: Policy Overview]
Here, the enforcement of each Constraint is reported alongside a holistic assessment of how compliant each GKE cluster in the Fleet is. At scale, this is a significant capability for evidencing to auditors and compliance officers that regulations and organizational controls are being enforced and met.
[Figure: Policy Violations]
Monitoring
Lastly, with Google Managed Prometheus (GMP) enabled on both the GKE Standard and Autopilot clusters, metric collection is managed by Google Cloud. Additional components can then be integrated into the solution. Specifically, these are kube-state-metrics and node-exporter, along with ClusterPodMonitoring/PodMonitoring resources that define the metrics for GMP to scrape and collect.
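As a sketch, a PodMonitoring resource that tells GMP to scrape kube-state-metrics could look like this. The namespace, label selector and port name are assumptions about how kube-state-metrics is deployed.

```yaml
# Scrape kube-state-metrics Pods in this namespace with GMP (sketch).
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: kube-state-metrics
  namespace: kube-state-metrics          # hypothetical namespace
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
  endpoints:
    - port: metrics                      # named container port serving /metrics
      interval: 30s
```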
With cluster metrics in GMP, a standalone Prometheus UI and Grafana can then be deployed as addon components and configured to integrate with GMP.
Note: Managed Prometheus is on by default in GKE Autopilot clusters running GKE version 1.25 or greater.
Note: to sync resources from other Git repositories, a RootSync/RepoSync could be used, or alternatively the GitRepository and Kustomization resources from FluxCD. Whether to use Anthos Config Management or an open-source GitOps tool will largely depend on the use case. In this instance, some resources in the source repository require patching for each environment. The combination of GitRepository and Kustomization provides the ability to modify and apply resources whilst reconciling from a source of truth.
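A minimal sketch of that combination follows. The repository URL and resource names are hypothetical. spec.ignore takes gitignore-style rules, so here everything except the kube-state-metrics and frontend directories is excluded from the source artifact.

```yaml
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: monitoring
  namespace: flux-system
spec:
  interval: 5m
  url: https://github.com/example-org/monitoring-manifests   # hypothetical repository
  ref:
    branch: main
  # gitignore-style rules: exclude everything except the listed directories.
  ignore: |
    /*
    !/kube-state-metrics/
    !/frontend/
---
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: monitoring
  namespace: flux-system
spec:
  interval: 10m
  path: ./             # Flux generates a kustomization if none exists at this path
  prune: true          # remove resources that disappear from the source
  sourceRef:
    kind: GitRepository
    name: monitoring
```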
Whilst Google Managed Prometheus is supported on GKE Autopilot, components like node-exporter can’t be deployed there. They depend on elevated privileges, and GKE Autopilot does not grant access to the underlying nodes. node-exporter is therefore patched into the GitRepository used for ‘staging’, which runs on a GKE Standard cluster.
Note: The GitRepository in this example is used to define the source from which a Kustomization then synchronises resources. Patching the GitRepository isn’t particularly elegant, but it is required to set the correct spec.ignore for this environment. Specifically, as this is used in GKE Standard clusters, we can include the node-exporter resources, whereas the original GitRepository only includes kube-state-metrics and frontend.
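A patch like this could be sketched as an entry in user-clusters/staging/kustomization.yaml. It replaces spec.ignore so that node-exporter is included. The GitRepository name and directory names are assumptions.

```yaml
# user-clusters/staging/kustomization.yaml (sketch; names are hypothetical)
patches:
  - target:
      group: source.toolkit.fluxcd.io
      kind: GitRepository
      name: monitoring
    patch: |-
      - op: replace
        path: /spec/ignore
        value: |
          /*
          !/kube-state-metrics/
          !/node-exporter/
          !/frontend/
```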
With kube-state-metrics deployed, its metrics ingested into GMP, and the standalone Prometheus UI for GMP deployed and configured as a datasource for Grafana, the kube-state-metrics dashboards can be imported. All metrics across the entire Fleet can then be visualized in one place.
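As a sketch, the datasource could be declared in a Grafana provisioning file that points at the standalone Prometheus UI for GMP. The in-cluster service address and port are assumptions about how that UI is deployed.

```yaml
# Grafana datasource provisioning file (sketch).
apiVersion: 1
datasources:
  - name: Managed Prometheus
    type: prometheus
    access: proxy        # Grafana's backend proxies queries to the URL below
    # Hypothetical in-cluster address of the standalone Prometheus UI for GMP.
    url: http://frontend.monitoring.svc.cluster.local:9090
    isDefault: true
```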
[Figure: kube-state-metrics in Grafana]
At this point, all cluster addons and configurations have been reconciled across the Fleet. Any updates to the central repository will be propagated out to all the member clusters, simplifying the operational overhead to maintain consistent cluster configuration at scale. Should different environments require specific configuration for testing before being promoted, RootSync and ACM resources should be set to pull specific branches or directories of the central repository, with Pull Requests governing the change management process.
The clusters have been fully bootstrapped and meet the organisational requirements for compliance, as well as being functional with add-ons and platform capabilities. No request has been sent directly to a Kubernetes API Server (outside of Google Cloud management). Instead, the entire Fleet is reconciled by Config Controller, which manages the Fleet member clusters, and by Config Sync, which bootstraps each cluster with all platform components and configuration.
Upon each cluster becoming available for tenants to consume, application workloads can then be rolled out across the Fleet. With Config Sync managing the installation of FluxCD on each cluster, application deployments can leverage this capability for continuous delivery. Through HelmRelease and Kustomization resources, applications can follow the same GitOps principles for pull-based deployments, either using the same central repository with environment-specific configurations or pulling from application-specific repositories.
Bank of Anthos
Using Bank of Anthos as an example application deployment, GitRepository and Kustomization resources can be used to deploy all the workloads as well as supplementary configuration and dependencies.
The GitRepository resources set up the source from which each Kustomization syncs resources into the user clusters, taken from the spec.path locations in the repository. Some Kustomizations are patched so that the workloads use Workload Identity and work with HTTPS traffic from the ingress gateway.
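A minimal sketch of such a pair for Bank of Anthos follows, assuming the public GoogleCloudPlatform/bank-of-anthos repository and its kubernetes-manifests directory. The ServiceAccount name, target namespace and Google service account email in the Workload Identity patch are hypothetical.

```yaml
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: bank-of-anthos
  namespace: flux-system
spec:
  interval: 10m
  url: https://github.com/GoogleCloudPlatform/bank-of-anthos
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: bank-of-anthos
  namespace: flux-system
spec:
  interval: 10m
  path: ./kubernetes-manifests      # spec.path within the source repository
  prune: true
  targetNamespace: bank-of-anthos   # hypothetical namespace
  sourceRef:
    kind: GitRepository
    name: bank-of-anthos
  patches:
    # Bind the workload ServiceAccount to a Google service account (names are hypothetical).
    - target:
        kind: ServiceAccount
        name: bank-of-anthos
      patch: |-
        apiVersion: v1
        kind: ServiceAccount
        metadata:
          name: bank-of-anthos
          annotations:
            iam.gke.io/gcp-service-account: bank-of-anthos@example-project.iam.gserviceaccount.com
```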
Whilst these patches are required for the application to function in GKE, additional patches are needed to differentiate the deployments between environments. Within the base ACM configuration for each cluster (user-clusters/dev/kustomization.yaml), resources that are managed by Kustomize can be patched to reflect the environment they belong to. In this instance, the Kustomization for Bank of Anthos itself needs to be patched. However, the patches added at this level are themselves patches. They are appended to the Flux Kustomization’s own patches, which Flux then applies to the resources that Kustomization deploys.
This ensures that the Kustomization managing the sync of the Istio Gateway resource will patch the listeners to allow ingress traffic for that particular environment.
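This nested patch could be sketched as follows. It assumes the following:

- The Flux Kustomization is named istio-gateway (a hypothetical name).
- It already declares a spec.patches list, since a JSON Patch add to /spec/patches/- fails otherwise.
- The Gateway is a Gateway API resource whose first listener carries the hostname.

The Gateway name and hostname are also hypothetical.

```yaml
# user-clusters/dev/kustomization.yaml: patch a Flux Kustomization so that it
# in turn patches the Gateway it deploys (sketch; names are hypothetical).
patches:
  - target:
      group: kustomize.toolkit.fluxcd.io
      kind: Kustomization
      name: istio-gateway
    patch: |-
      - op: add
        path: /spec/patches/-
        value:
          target:
            kind: Gateway
            name: bank-of-anthos
          patch: |-
            - op: add
              path: /spec/listeners/0/hostname
              value: bank.dev.example.com
```

The outer patch runs when Config Sync renders the environment kustomization. The inner patch is only stored on the Flux Kustomization and runs later, when Flux builds the Gateway manifests.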
Alongside the GitRepository and Kustomization for Bank of Anthos, the base includes other resources that further secure the application. Specifically, NetworkPolicy, AuthorizationPolicy and PeerAuthentication resources are applied to achieve three things:

- enforce strict mTLS between the services
- allow connections only between specific services
- allow traffic into the namespace only from the ingress-gateway
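As an illustrative sketch, the mTLS and ingress rules could be expressed as follows. The bank-of-anthos namespace, the app: frontend label and the istio-ingress gateway namespace are assumptions. Matching on source namespaces rather than principals avoids hard-coding the mesh trust domain.

```yaml
# Enforce strict mTLS for every workload in the namespace.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: bank-of-anthos
spec:
  mtls:
    mode: STRICT
---
# Allow requests to the frontend only from the ingress gateway's namespace.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: frontend-from-ingress
  namespace: bank-of-anthos
spec:
  selector:
    matchLabels:
      app: frontend
  action: ALLOW
  rules:
    - from:
        - source:
            namespaces: ["istio-ingress"]   # hypothetical gateway namespace
```

Because an ALLOW policy denies any request it does not match for the selected workloads, each other service would need its own ALLOW policy for the callers it expects.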
Once all of these resources have been applied, their enforcement can be seen in the Anthos Service Mesh dashboard in the Google Cloud Console.
[Figure: Anthos Service Mesh Policies]
This shows which services are communicating over mTLS, the topology graph of the mesh as well as the telemetry being collected from each Envoy sidecar proxy.
Not only has the Kubernetes API become the de facto interface for running containerized workloads, but the ability to extend it with controllers that reconcile a desired state from a declarative source is also centralizing operations. It is creating new compositions for managing and orchestrating cloud resources.
With Config Controller and Config Connector implementing the desired state in Google Cloud, GKE becomes the source of truth for the entire organization's cloud inventory. Centralizing these organizational operations into GKE creates new patterns for composing infrastructure and platform services that can be adopted using existing Kubernetes tooling, workflows and operating models.
As Google Anthos continues to support deploying Anthos clusters across VMware, Bare Metal, AWS and Azure from the GKE Hub API, Anthos Fleet management can provide a holistic view of an entire organization's cluster compliance and operational status beyond Google Cloud, as well as providing mechanisms to orchestrate fleet-wide capabilities for platform consistency and compliance enforcement at scale.
At Jetstack Consult, we’re often helping customers adopt and mature their cloud-native and Kubernetes offerings. If you’re interested in discussing how Google Cloud Anthos and GKE can help your digital transformation, get in touch and see how we can work together.
Paul is a Google Cloud Certified Fellow with a focus on application modernization, Kubernetes administration and complementing managed services with open-source solutions. Find him on Twitter & LinkedIn.