Introduction
As Kubernetes continues to mature as the ubiquitous system for running distributed containerized workloads, so must the patterns for cluster lifecycle. Modern applications are designed to be Kubernetes-native, to be scalable, stateless and thus ephemeral. Whilst these design patterns are commonplace for applications, those around cluster and platform lifecycle must also evolve to satisfy the demands of cloud-native architectures.
With such a range of Kubernetes provisioners across cloud providers and local environments, it’s never been easier to create clusters in a repeatable, idempotent, deterministic way, and for users to start extracting the value Kubernetes has to offer.
At Jetstack, we’ve seen a shift among our customers towards fewer long-lived clusters, with infrastructure and compute resources treated as being just as ephemeral as the workloads they run.
A standard practice we see after cluster provisioning is to jump straight into cluster bootstrap workflows, whether that’s using the same infrastructure as code (IaC) tool that deployed the cluster, using CI to deploy directly to the cluster, or deploying a GitOps controller to pull the desired state. These workflows might install further security hardening, configure cluster add-ons or go straight to deploying workloads. Whilst there are established ways of bootstrapping clusters, they are invariably carried out by submitting requests to the Kubernetes API server.
Motivation
This is unavoidable. What it does create, however, is a window between cluster creation and post-provisioning in which the cluster sits in an uninitialized state, before the first request reaches the API server.
This initial request is accessing a completely ‘vanilla’ cluster, with no admission controllers, policies or RBAC configured.
When provisioning a GKE cluster, we can choose which features to enable and which settings to use for the cluster’s configuration. However, cluster creation doesn’t take into account the desired state of the cluster, specifically its Kubernetes resources. It’s therefore left to a post-provisioning workflow (using tools like Helm to deploy through the Kubernetes API) to bootstrap the cluster and install the components that baseline its state before users onboard and begin to consume it.
Instead, let’s bring all of our configuration and resources onto the cluster before anyone outside of GCP accesses it.
Google Cloud Anthos
As we’ve discussed previously, Google Cloud Anthos is a framework of software components, orientated around managing the deployment and life-cycling of infrastructure and workloads across multiple environments and locations. With enterprises having established infrastructure presences across multiple cloud providers and on-premises locations, Anthos centralises the orchestration of segregated clusters and workloads, providing a single-pane-of-glass across hybrid and multi-cloud topologies. This consolidates operations and provides consistency across cloud providers, whilst embracing existing infrastructure investments and unlocking new possibilities for hybrid and multi-cloud compositions. This also allows for companies to modernise in place, continuing to run workloads on-prem or on their infrastructure but adopting Kubernetes and cloud-native principles.
As well as a Kubernetes distribution, Anthos also provides ways to simplify hybrid and multi-cloud consistency and compliance through Config Management & Policy Controller. With Config Management, the GitOps methodology is adopted to reconcile the observed state with the desired state for Kubernetes objects in source control through a cluster Operator. Policy Controller facilitates conformance at scale by building on Gatekeeper to provide a constraint template library to ensure consistency and compliance of configuration, as well as offering extensibility through writing policies using OPA and Rego.
Anthos is orientated around being the management plane for all of your enterprise workload clusters, providing a centralized, consolidated hub to orchestrate infrastructure and applications.
Additionally, Anthos’ add-on features enrich the experience, facilitating cluster and application administration with Config Management and compliance at scale with Policy Controller.
Proposal
This pattern focuses on shifting cluster initialization for GKE into the provisioning workflow, as first-class operations in GCP APIs. Specifically, instead of bootstrapping clusters via the Kubernetes API server, all stages are consolidated into API requests to GCP as part of provisioning.
Pattern
This way, the desired state of the cluster is reconciled by first-class operators in GKE, thus ensuring that our provisioning workflow is encapsulated into native operations in GCP, as opposed to being segregated across multiple components.
This approach leverages components of Google Cloud’s Anthos, specifically Anthos Config Management (ACM) and GKE Hub. ACM is a GitOps tool with support for Kustomize and Helm, that can be enabled and configured through the GKE Hub. Here we’re composing a workflow where tools are highly cohesive and work together to encapsulate the cluster lifecycle as an out-of-the-box experience.
Provisioning
After creating a GKE cluster, we need to register it within the GKE Hub by creating a membership. Next, we enable the config-management feature. Once these are in place, we can then apply the config-management feature to the membership, resulting in Config Sync being deployed to our cluster.
gcloud container clusters create ...
# --gke-cluster takes LOCATION/CLUSTER_NAME
gcloud container fleet memberships register gke-acm-bootstrap \
  --gke-cluster=europe-west2/gke-acm-bootstrap --enable-workload-identity
gcloud beta container hub config-management enable
gcloud beta container hub config-management apply \
  --membership=gke-acm-bootstrap --config=apply-spec.yaml
$ cat apply-spec.yaml
applySpecVersion: 1
spec:
  configSync:
    enabled: true
    sourceFormat: unstructured
    syncRepo: https://github.com/paulwilljones/gke-acm-bootstrap
    syncBranch: develop
    secretType: none
    policyDir: helm-components
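The ordering of these steps matters: the membership has to exist and the feature has to be enabled before the spec can be applied. As a minimal sketch, the sequence can be captured in a small Python helper that only assembles the commands shown above. The function name is our own invention for illustration, and the `LOCATION/CLUSTER_NAME` value passed to `--gke-cluster` assumes the cluster lives in europe-west2.

```python
import shlex
from typing import List


def hub_bootstrap_commands(membership: str, gke_cluster: str, spec_path: str) -> List[List[str]]:
    """Return the GKE Hub bootstrap commands in the order they need to run.

    Hypothetical helper: it only builds argument lists, it does not run gcloud.
    """
    return [
        # 1. Register the cluster with GKE Hub by creating a membership.
        ["gcloud", "container", "fleet", "memberships", "register", membership,
         f"--gke-cluster={gke_cluster}", "--enable-workload-identity"],
        # 2. Enable the config-management feature.
        ["gcloud", "beta", "container", "hub", "config-management", "enable"],
        # 3. Apply the spec to the membership, which deploys Config Sync.
        ["gcloud", "beta", "container", "hub", "config-management", "apply",
         f"--membership={membership}", f"--config={spec_path}"],
    ]


if __name__ == "__main__":
    commands = hub_bootstrap_commands(
        "gke-acm-bootstrap", "europe-west2/gke-acm-bootstrap", "apply-spec.yaml"
    )
    for argv in commands:
        # Print rather than execute, so the sketch is safe to run anywhere.
        print(shlex.join(argv))
```

Printing the commands keeps the sketch side-effect free; a real pipeline might hand each list to `subprocess.run(argv, check=True)` so that a failed step stops the sequence.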
At this point, Config Sync is fetching the manifests from our Git repository, allowing us to maintain a centralized definition for our cluster’s state and have that propagated out to all cluster deployments.
This pattern is supported across GKE cluster provisioners as these resources are part of the Anthos Multicluster Management API. Therefore, we don’t need to change what provisions the cluster, but just extend the definition to include these GCP resources.
With that said, here are a couple of examples of other familiar provisioners that deploy resources to GKE Hub, starting with Terraform.
data "google_container_cluster" "cluster" {
  name     = "gke-acm-bootstrap"
  location = "europe-west2"
  project  = "jetstack-paul"
}

resource "google_gke_hub_membership" "membership" {
  project       = "jetstack-paul"
  membership_id = "gke-acm-bootstrap"
  endpoint {
    gke_cluster {
      resource_link = "//container.googleapis.com/${data.google_container_cluster.cluster.id}"
    }
  }
  provider = google-beta
}

resource "google_gke_hub_feature" "feature" {
  name     = "configmanagement"
  location = "global"
  project  = "jetstack-paul"
  provider = google-beta
}

resource "google_gke_hub_feature_membership" "feature_member" {
  project    = "jetstack-paul"
  location   = "global"
  feature    = google_gke_hub_feature.feature.name
  membership = google_gke_hub_membership.membership.membership_id
  configmanagement {
    config_sync {
      git {
        sync_repo      = "https://github.com/paulwilljones/gke-acm-bootstrap"
        sync_branch    = "develop"
        secret_type    = "none"
        policy_dir     = "helm-components"
        sync_wait_secs = 5
      }
      source_format = "unstructured"
    }
  }
  provider = google-beta
}
Pulumi similarly supports the GKE Hub API:
import pulumi_gcp as gcp

# Look up the existing GKE cluster.
cluster = gcp.container.get_cluster(name="gke-acm-bootstrap",
    location="europe-west2")

# Register the cluster with GKE Hub.
membership = gcp.gkehub.Membership("membership",
    membership_id="gke-acm-bootstrap",
    endpoint=gcp.gkehub.MembershipEndpointArgs(
        gke_cluster=gcp.gkehub.MembershipEndpointGkeClusterArgs(
            resource_link=f"//container.googleapis.com/{cluster.id}",
        ),
    ),
)

# Enable the config-management feature.
feature = gcp.gkehub.Feature(name="configmanagement",
    resource_name="configmanagement",
    location="global",
)

# Apply the Config Sync configuration to the membership.
feature_member = gcp.gkehub.FeatureMembership("featureMember",
    location="global",
    feature=feature.name,
    membership=membership.membership_id,
    configmanagement=gcp.gkehub.FeatureMembershipConfigmanagementArgs(
        config_sync=gcp.gkehub.FeatureMembershipConfigmanagementConfigSyncArgs(
            git=gcp.gkehub.FeatureMembershipConfigmanagementConfigSyncGitArgs(
                sync_repo="https://github.com/paulwilljones/gke-acm-bootstrap",
                sync_branch="develop",
                policy_dir="helm-components",
                sync_wait_secs="5",
                secret_type="none"
            ),
            source_format="unstructured"
        ),
    ),
)
Through these resources, we can consolidate the operations needed to create, initialise and configure our cluster into one declarative definition.
Config Connector
By extending the Kubernetes API, Config Connector enables the management of GCP services and resources through Kubernetes Custom Resource Definitions (CRDs) from within our cluster. This allows for dependencies such as Google Service Accounts, Firewall rules and CloudSQL instances to be packaged along with our application deployment definitions. Representing all GCP and Kubernetes resources together reduces the cognitive overhead found when splitting provisioning across multiple IaC tools with different lifecycle management and CI/CD processes. We’re also taking advantage of the fundamental principle in Kubernetes of declaring our desired state and reconciling it with the actual state.
Here, we’re again creating our GKE Hub resources to enable the ACM Feature, create a GKE Hub Membership and deploy Config Sync to our cluster.
apiVersion: gkehub.cnrm.cloud.google.com/v1beta1
kind: GKEHubFeature
metadata:
  name: gke-acm-bootstrap
spec:
  projectRef:
    external: jetstack-paul
  location: global
  # The resourceID must be "configmanagement" if you want to use Anthos config
  # management feature.
  resourceID: configmanagement
---
apiVersion: gkehub.cnrm.cloud.google.com/v1beta1
kind: GKEHubMembership
metadata:
  name: gke-acm-bootstrap
spec:
  location: global
  endpoint:
    gkeCluster:
      resourceRef:
        external: //container.googleapis.com/projects/jetstack-paul/locations/europe-west2/clusters/gke-acm-bootstrap
---
apiVersion: gkehub.cnrm.cloud.google.com/v1beta1
kind: GKEHubFeatureMembership
metadata:
  name: gke-acm-bootstrap
spec:
  projectRef:
    external: jetstack-paul
  location: global
  membershipRef:
    name: gke-acm-bootstrap
  featureRef:
    name: gke-acm-bootstrap
  configmanagement:
    configSync:
      sourceFormat: unstructured
      git:
        syncRepo: "https://github.com/paulwilljones/gke-acm-bootstrap"
        syncBranch: "develop"
        policyDir: "config-root"
        secretType: "none"
        syncWaitSecs: "5"
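The `external` reference on the GKEHubMembership above is the cluster’s full resource name, built from the project, location and cluster name. As a minimal sketch, it can be assembled from those parts so the same values aren’t hand-typed in several places. The helper name is hypothetical and not part of any Google library.

```python
def gke_cluster_resource_link(project: str, location: str, cluster: str) -> str:
    """Build the full GKE cluster resource name used by a GKE Hub membership.

    Hypothetical helper for illustration only.
    """
    parts = (("project", project), ("location", location), ("cluster", cluster))
    for label, value in parts:
        # Each segment must be non-empty and must not contain a path separator.
        if not value or "/" in value:
            raise ValueError(f"invalid {label}: {value!r}")
    return (
        "//container.googleapis.com/"
        f"projects/{project}/locations/{location}/clusters/{cluster}"
    )


if __name__ == "__main__":
    print(gke_cluster_resource_link("jetstack-paul", "europe-west2", "gke-acm-bootstrap"))
```

With the values used throughout this post, the result matches the `external` string in the manifest.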
The beauty of Config Connector and this approach is that Kubernetes resources become the common denominator for our infrastructure, platform and application provisioning. Kubernetes becomes the ubiquitous API for managing the lifecycle of GCP solutions. As these are GCP resource definitions, they translate into GCP API requests, meaning Config Connector can run in any GKE cluster and act as a proxy to create GCP resources (across the supported GCP APIs). This model can be extended by creating GKE clusters using Config Connector, and then using this bootstrap pattern to initialize the desired state of the newly provisioned cluster.
apiVersion: container.cnrm.cloud.google.com/v1beta1
kind: ContainerCluster
metadata:
  annotations:
    cnrm.cloud.google.com/project-id: jetstack-paul
  name: gke-acm-bootstrap
spec:
  location: europe-west2
  initialNodeCount: 1
  workloadIdentityConfig:
    # The workload pool takes the form PROJECT_ID.svc.id.goog
    workloadPool: jetstack-paul.svc.id.goog
Kustomize and Helm
With support for Kustomize and Helm in ACM, we can define which Helm Charts we want to be deployed onto the cluster by Config Sync.
This allows for a powerful composition of using GitOps to automatically render and deploy all the cluster addons and configurations required in order to baseline the cluster before it’s brought into service.
helmCharts:
- name: cert-manager
  repo: https://charts.jetstack.io
  releaseName: cert-manager
  namespace: cert-manager
  valuesInline:
    installCRDs: true
- name: external-dns
  repo: https://kubernetes-sigs.github.io/external-dns/
  releaseName: external-dns
  namespace: external-dns
- name: argo-cd
  repo: https://argoproj.github.io/argo-helm
  releaseName: argocd
  namespace: argocd
- name: external-secrets
  repo: https://charts.external-secrets.io
  releaseName: external-secrets
  namespace: external-secrets
- name: rbac-manager
  repo: https://charts.fairwinds.com/stable
  releaseName: rbac-manager
  namespace: rbac-manager
Following this, there is a fully provisioned GKE cluster running fully configured addons and applications, including admission controllers, before any non-GCP Kubernetes API server requests have been sent.
$ kubectl get all -A
...
Ordering Resource Dependencies
Even though declarative resources are designed to be eventually consistent, there are instances where the order of deployments matters. For this, ACM has an ordering feature for stating which dependencies must be reconciled before dependent resources are deployed.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  labels:
    app: my-app
  annotations:
    config.kubernetes.io/depends-on: apps/namespaces/cert-manager/Deployment/cert-manager
...
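The annotation value follows a fixed shape: group, then `namespaces/<namespace>` for namespaced objects, then kind and name. As a minimal sketch, a small helper can build that reference so it isn’t mistyped by hand. The function name is hypothetical, and the cluster-scoped form (`group/kind/name`) and the empty group for core resources reflect our understanding of the Config Sync documentation rather than anything shown in this post.

```python
from typing import Optional


def depends_on_ref(group: str, kind: str, name: str, namespace: Optional[str] = None) -> str:
    """Build a value for the config.kubernetes.io/depends-on annotation.

    Hypothetical helper. Pass group="" for core resources such as ConfigMap,
    and omit namespace for cluster-scoped objects.
    """
    if namespace:
        # Namespaced objects: GROUP/namespaces/NAMESPACE/KIND/NAME
        return f"{group}/namespaces/{namespace}/{kind}/{name}"
    # Cluster-scoped objects: GROUP/KIND/NAME
    return f"{group}/{kind}/{name}"


if __name__ == "__main__":
    print(depends_on_ref("apps", "Deployment", "cert-manager", namespace="cert-manager"))
```

Called with the cert-manager Deployment, this yields the same string used in the annotation above.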
This can be combined with our Helm deployments through Config Sync above by using a Kustomize patch to add the annotation to the resources rendered from the Chart.
# kustomization.yaml
patchesJson6902:
- target:
    group: apps
    version: v1
    kind: Deployment
    name: cert-manager
  path: ./annotation.yaml
helmCharts:
- name: cert-manager
  repo: https://charts.jetstack.io
  releaseName: cert-manager
  namespace: cert-manager
  version: 1.8.0
  includeCRDs: true
  valuesInline:
    installCRDs: true
---
# annotation.yaml
- op: add
  path: /metadata/annotations
  value:
    config.kubernetes.io/depends-on: apps/namespaces/external-secrets/Deployment/external-secrets
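One detail worth knowing about this patch: under RFC 6902, an `add` that targets an object member which already exists replaces its value. The sketch below is a deliberately minimal Python model of that rule, not Kustomize’s implementation. It handles only object paths (no arrays, no `~` escapes), and the `team` annotation is invented purely to make the effect visible.

```python
import copy
from typing import Any, Dict


def json_patch_add(doc: Dict[str, Any], path: str, value: Any) -> Dict[str, Any]:
    """Apply an RFC 6902 "add" to an object member and return a patched copy.

    Simplified for illustration: arrays and "~0"/"~1" escapes are not handled.
    """
    result = copy.deepcopy(doc)
    *parents, last = path.lstrip("/").split("/")
    target = result
    for key in parents:
        target = target[key]  # parent objects must already exist
    target[last] = value  # an existing member is replaced, not merged
    return result


if __name__ == "__main__":
    deployment = {
        "metadata": {"name": "cert-manager", "annotations": {"team": "platform"}}
    }
    patched = json_patch_add(
        deployment,
        "/metadata/annotations",
        {"config.kubernetes.io/depends-on":
            "apps/namespaces/external-secrets/Deployment/external-secrets"},
    )
    # The pre-existing "team" annotation is gone; only depends-on remains.
    print(patched["metadata"]["annotations"])
```

If the rendered Deployment already carries annotations that need to survive, the patch can instead target the single key, with the `/` in the annotation name escaped as `~1` (`/metadata/annotations/config.kubernetes.io~1depends-on`). That form only works when the `annotations` map already exists, which is presumably why the whole map is set here.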
Policy Controller
Another component within ACM is Policy Controller, which is based on the open-source Open Policy Agent Gatekeeper project. This acts as an admission controller, with a library of Constraints acting as policies to enforce security and compliance controls on Kubernetes clusters.
As Policy Controller can be enabled as part of the ACM configuration, it too can be preloaded as part of the cluster bootstrap. We can then define Constraints in our Git repository that will be reconciled by Config Sync and enforced by Policy Controller on our cluster.
resource "google_gke_hub_feature_membership" "feature_member" {
  project    = "jetstack-paul"
  location   = "global"
  feature    = google_gke_hub_feature.feature.name
  membership = google_gke_hub_membership.membership.membership_id
  configmanagement {
    config_sync {
      git {
        sync_repo      = "https://github.com/paulwilljones/gke-acm-bootstrap"
        sync_branch    = "develop"
        secret_type    = "none"
        policy_dir     = "config-root"
        sync_wait_secs = 5
      }
      source_format = "unstructured"
    }
    policy_controller {
      enabled                    = true
      template_library_installed = true
      referential_rules_enabled  = true
      exemptable_namespaces      = ["kube-system", "config-management-system", "config-management-monitoring", "resource-group-system", "asm-system", "gke-connect"]
    }
  }
  # Must reference the feature resource by the name it was declared with above.
  depends_on = [
    google_gke_hub_feature.feature
  ]
  provider = google-beta
}
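With the template library installed, a Constraint committed to the Git repository is all that’s needed for Policy Controller to start enforcing a rule. As a minimal sketch, the Python below generates one such manifest. It assumes the `K8sRequiredLabels` template from the constraint template library and its `labels` parameter shape; the constraint name and the `owner` label key are made up for illustration.

```python
import json
from typing import Any, Dict


def required_labels_constraint(name: str, label_key: str) -> Dict[str, Any]:
    """Build a Constraint requiring a label on every Namespace.

    Assumes the K8sRequiredLabels template is installed on the cluster.
    """
    return {
        "apiVersion": "constraints.gatekeeper.sh/v1beta1",
        "kind": "K8sRequiredLabels",
        "metadata": {"name": name},
        "spec": {
            # Only Namespace objects (core API group) are evaluated.
            "match": {"kinds": [{"apiGroups": [""], "kinds": ["Namespace"]}]},
            # Each entry names a label key that must be present.
            "parameters": {"labels": [{"key": label_key}]},
        },
    }


if __name__ == "__main__":
    constraint = required_labels_constraint("ns-must-have-owner", "owner")
    # JSON is also valid YAML, so the output can be saved as a manifest file.
    print(json.dumps(constraint, indent=2))
```

Generating manifests like this is optional; a hand-written YAML file in the synced directory achieves the same result.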
We can leverage various policy bundles to enforce sets of Constraints to meet the required security benchmarks. These cover CIS Kubernetes Benchmarks v1.5.1, Pod Security Standards (PSS) Baseline policy, Policy Essentials v2022, as well as PSP v2022 to enforce Constraints based on deprecated PodSecurityPolicies.
As Config Sync is pulling policies from source control, this is a pattern that can be rolled out to all clusters across a fleet, resulting in uniform organizational compliance.
$ kubectl get constraints
...
At this point, our cluster has been fully provisioned out-of-the-box, with all the security hardening, compliance and customizations deployed, with core addon components required by cluster operators and developers alike.
Adopting
Using this pattern encapsulates all of the cluster provisioning processes into a single workflow. There is no longer a need to manage a seed process outside of the GCP perimeter to access the cluster to initiate ancillary installations. Instead, there is a clean, self-managed, pull-based mechanism which is native to GCP and GKE.
Existing infrastructure provisioning implementations can be extended to adopt this pattern. This simplifies the cluster and platform lifecycles, all through GCP-managed services and GKE components to achieve consistent, consolidated, compliant cluster deployments.
EOF
At Jetstack Consult, we’re often helping customers adopt and mature their cloud-native and Kubernetes offerings. If you’re interested in discussing how Google Cloud Anthos and GKE can help your digital transformation, get in touch and see how we can work together.
Paul is a Google Cloud Certified Fellow with a focus on application modernization, Kubernetes administration and complementing managed services with open-source solutions. Find him on Twitter & LinkedIn.