The Cluster API is a Kubernetes project that brings declarative, Kubernetes-style APIs to cluster creation. It does this by using CustomResourceDefinitions to extend the API exposed by the Kubernetes API Server, allowing users to create new resources such as Clusters (representing a Kubernetes cluster) and Machines (representing the machines that make up the Nodes that form the cluster). A controller for each resource is then responsible for reacting to changes to these resources to bring up the cluster. The API is designed in such a way that different infrastructure providers can integrate with it to provide their environment-specific logic.
The Cluster API project is still in the early stages, but what is currently possible already demonstrates the enormous power it brings. The aim of this post is to summarise the capabilities of the project to date and to look ahead to what is in store for subsequent releases.
Past, present and future
At the time of writing, the most recent release of the Cluster API implements the v1alpha2 version. Here we discuss the transformation of this API and how providers can integrate with it.
Past: v1alpha1
The initial v1alpha1 implementation of the Cluster API requires providers to include the Cluster API controller code in their project and to implement actuators (interfaces) to handle their environment-specific logic (for example, calls to cloud provider APIs). The code runs as a single provider-specific manager binary that manages a controller for each of the resources required to manage a cluster.
Present: v1alpha2
One of the pain points of the v1alpha1 method of consuming the Cluster API is that it requires each provider to implement a certain amount of bootstrap boilerplate code, typically using kubeadm. To remedy this, v1alpha2 introduces bootstrap providers, which are responsible for generating the data required to turn a Machine into a Kubernetes Node. The kubeadm bootstrap provider is a bootstrap provider implementation that is able to handle this task for all environments using kubeadm. Its default behavior is to generate a cloud-config script for each Machine, which can be used to bootstrap the Node.
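To make that concrete, here is a rough sketch of what such a cloud-config script might contain; the exact output is produced by the bootstrap provider, and the file path shown is illustrative only:
#cloud-config
write_files:
- path: /tmp/kubeadm.yaml
  owner: root:root
  permissions: '0640'
  content: |
    # the kubeadm InitConfiguration/ClusterConfiguration (or JoinConfiguration)
    # rendered from the KubeadmConfig resource goes here
runcmd:
# initialise (or join) the Node using the rendered configuration
- kubeadm init --config /tmp/kubeadm.yaml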
Another change introduced by v1alpha2 is that it is no longer necessary for providers to include Cluster API controller code in their projects. Instead, Cluster API offers independent controllers responsible for the core types. For further details on the motivations behind these changes see the proposal.
For this version there are now three managers (instead of one) that need to be deployed:
- Cluster API manager: to manage core v1alpha2 resources
- Bootstrap provider manager: to manage resources to generate the data to turn a Machine into a Kubernetes Node
- Infrastructure provider manager: to manage resources that provide the infrastructure required to run the cluster
For example, if I wanted to create a cluster on GCP configured using kubeadm, I would deploy the Cluster API manager (to reconcile core resources, for example Cluster and Machine resources), the kubeadm bootstrap provider (to reconcile KubeadmConfig resources, for example) and the GCP infrastructure provider (to reconcile environment-specific resources, for example GCPClusters and GCPMachines).
To see how these resources should be applied, we will run through a cluster deployment using a Kubernetes infrastructure provider implementation that I wrote — that is, a provider where the infrastructure is provided by Kubernetes itself. Kubernetes Nodes run as Kubernetes Pods using kind images.
To start, we need to create a base cluster to provide the infrastructure for our Cluster API cluster. We will be using GKE here. The following commands assume you have gcloud installed with a GCP project and billing account set up. WARNING: the gcloud commands will cost money; consider using the GCP Free Tier.
Calico will be used as the CNI solution for the Cluster API cluster. This requires some particular configuration when provisioning the GKE cluster so that IPv4-encapsulated packets are routed correctly. To avoid distracting from the description of Cluster API behavior, we run these commands here without explanation. Refer to the Kubernetes infrastructure provider repository for details.
gcloud container clusters create management-cluster --cluster-version=1.14 --image-type=UBUNTU
CLUSTER_CIDR=$(gcloud container clusters describe management-cluster --format="value(clusterIpv4Cidr)")
gcloud compute firewall-rules create allow-management-cluster-pods-ipip --source-ranges=$CLUSTER_CIDR --allow=ipip
kubectl apply -f <(cat <<EOF
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: forward-ipencap
  namespace: kube-system
  labels:
    app: forward-ipencap
spec:
  selector:
    matchLabels:
      name: forward-ipencap
  template:
    metadata:
      labels:
        name: forward-ipencap
    spec:
      hostNetwork: true
      initContainers:
      - name: forward-ipencap
        command:
        - sh
        - -c
        - |
          apk add iptables
          iptables -C FORWARD -p ipencap -j ACCEPT || iptables -A FORWARD -p ipencap -j ACCEPT
        image: alpine:3.11
        securityContext:
          capabilities:
            add: ["NET_ADMIN"]
      containers:
      - name: sleep-forever
        image: alpine:3.11
        command: ["tail"]
        args: ["-f", "/dev/null"]
EOF
)
With the GKE cluster provisioned, we can now deploy the necessary managers.
# Install cluster api manager
kubectl apply -f https://github.com/kubernetes-sigs/cluster-api/releases/download/v0.2.8/cluster-api-components.yaml
# Install kubeadm bootstrap provider
kubectl apply -f https://github.com/kubernetes-sigs/cluster-api-bootstrap-provider-kubeadm/releases/download/v0.1.5/bootstrap-components.yaml
# Install kubernetes infrastructure provider
kubectl apply -f https://github.com/dippynark/cluster-api-provider-kubernetes/releases/download/v0.2.1/provider-components.yaml
# Allow cluster api controller to interact with kubernetes infrastructure resources
# If the kubernetes provider were SIG-sponsored this would not be necessary ;)
# https://cluster-api.sigs.k8s.io/providers/v1alpha1-to-v1alpha2.html#the-new-api-groups
kubectl apply -f https://github.com/dippynark/cluster-api-provider-kubernetes/releases/download/v0.2.1/capi-kubernetes-rbac.yaml
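Before continuing, it is worth checking that the controller manager Pods for all three components come up successfully; the exact Deployment names and namespaces depend on each provider's manifests, so a broad query is enough here:
# check that the Cluster API, bootstrap provider and infrastructure provider managers are Running
kubectl get pods --all-namespaces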
We can now deploy our cluster.
kubectl apply -f <(cat <<EOF
apiVersion: infrastructure.lukeaddison.co.uk/v1alpha2
kind: KubernetesCluster
metadata:
  name: example
spec:
  controlPlaneServiceType: LoadBalancer
---
apiVersion: cluster.x-k8s.io/v1alpha2
kind: Cluster
metadata:
  name: example
spec:
  clusterNetwork:
    services:
      cidrBlocks: ["172.16.0.0/12"]
    pods:
      cidrBlocks: ["192.168.0.0/16"]
    serviceDomain: "cluster.local"
  infrastructureRef:
    apiVersion: infrastructure.lukeaddison.co.uk/v1alpha2
    kind: KubernetesCluster
    name: example
EOF
)
Here we define our environment-specific KubernetesCluster resource. This is expected to provision the infrastructure components needed to run a Kubernetes cluster. For example, a GCPCluster might provision a VPC, firewall rules and a load balancer to reach the API Server(s). Our KubernetesCluster just provisions a Kubernetes Service of type LoadBalancer for the API Server. We can query the KubernetesCluster to see its status:
$ kubectl get kubernetescluster
NAME      PHASE         HOST             PORT   AGE
example   Provisioned   35.205.255.206   443    51s
We reference our provider-specific cluster resource from our core Cluster resource, which provides networking details for the cluster. The KubernetesCluster will be modified to be owned by the Cluster resource.
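We can verify this ownership by inspecting the owner references on the KubernetesCluster (assuming the resources were created in the default namespace); once the controllers have reconciled, this should print Cluster:
kubectl get kubernetescluster example -o jsonpath='{.metadata.ownerReferences[*].kind}'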
We are now ready to deploy our Machines. Here we create a controller Machine that references the infrastructure-provider-specific KubernetesMachine resource together with a bootstrap-provider-specific KubeadmConfig resource.
kubectl apply -f <(cat <<EOF
apiVersion: bootstrap.cluster.x-k8s.io/v1alpha2
kind: KubeadmConfig
metadata:
  name: controller
spec:
  initConfiguration:
    nodeRegistration:
      kubeletExtraArgs:
        eviction-hard: nodefs.available<0%,nodefs.inodesFree<0%,imagefs.available<0%
        cgroups-per-qos: "false"
        enforce-node-allocatable: ""
  clusterConfiguration:
    controllerManager:
      extraArgs:
        enable-hostpath-provisioner: "true"
---
apiVersion: infrastructure.lukeaddison.co.uk/v1alpha2
kind: KubernetesMachine
metadata:
  name: controller
---
apiVersion: cluster.x-k8s.io/v1alpha2
kind: Machine
metadata:
  name: controller
  labels:
    cluster.x-k8s.io/cluster-name: example
    cluster.x-k8s.io/control-plane: "true"
spec:
  version: "v1.17.0"
  bootstrap:
    configRef:
      apiVersion: bootstrap.cluster.x-k8s.io/v1alpha2
      kind: KubeadmConfig
      name: controller
  infrastructureRef:
    apiVersion: infrastructure.lukeaddison.co.uk/v1alpha2
    kind: KubernetesMachine
    name: controller
EOF
)
The kubeadm bootstrap provider turns the KubeadmConfig resource into a cloud-config script, which is consumed by the Kubernetes infrastructure provider to bootstrap a Kubernetes Pod that forms the control plane of the new cluster.
The Kubernetes infrastructure provider does this by leaning on systemd, which runs as part of the kind image. A bash script is generated from the cloud-config script to create and run the specified files and commands; the script is mounted into the Pod using a Kubernetes Secret and is then triggered by a systemd path unit once the containerd socket is available. You can exec into the controller Pod and run journalctl -u cloud-init to see the output of this script; cat /opt/cloud-init/bootstrap.sh will show the full script.
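For example, assuming the controller Pod runs in the default namespace of the management cluster:
# view the cloud-init output and the generated bootstrap script inside the controller Pod
kubectl exec controller -- journalctl -u cloud-init
kubectl exec controller -- cat /opt/cloud-init/bootstrap.sh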
Once the kubelet is running, it registers itself with the cluster by creating a controller Node object in etcd (which is also running on the controller Pod).
We can now deploy our worker Machines. This looks quite similar to the controller Machine provisioning, except that we make use of a MachineDeployment, KubeadmConfigTemplate and KubernetesMachineTemplate to request multiple worker Node replicas.
kubectl apply -f <(cat <<EOF
apiVersion: infrastructure.lukeaddison.co.uk/v1alpha2
kind: KubernetesMachineTemplate
metadata:
  name: worker
spec:
  template:
    spec: {}
---
apiVersion: bootstrap.cluster.x-k8s.io/v1alpha2
kind: KubeadmConfigTemplate
metadata:
  name: worker
spec:
  template:
    spec:
      joinConfiguration:
        nodeRegistration:
          kubeletExtraArgs:
            eviction-hard: nodefs.available<0%,nodefs.inodesFree<0%,imagefs.available<0%
            cgroups-per-qos: "false"
            enforce-node-allocatable: ""
---
apiVersion: cluster.x-k8s.io/v1alpha2
kind: MachineDeployment
metadata:
  name: worker
  labels:
    cluster.x-k8s.io/cluster-name: example
    nodepool: default
spec:
  replicas: 3
  selector:
    matchLabels:
      cluster.x-k8s.io/cluster-name: example
      nodepool: default
  template:
    metadata:
      labels:
        cluster.x-k8s.io/cluster-name: example
        nodepool: default
    spec:
      version: "v1.17.0"
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1alpha2
          kind: KubeadmConfigTemplate
          name: worker
      infrastructureRef:
        apiVersion: infrastructure.lukeaddison.co.uk/v1alpha2
        kind: KubernetesMachineTemplate
        name: worker
EOF
)
MachineDeployments work similarly to Kubernetes Deployments in that they manage MachineSets, which in turn manage the desired number of Machine replicas.
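As with Deployments and ReplicaSets, the intermediate MachineSet can be inspected directly, for example:
kubectl get machinesets
kubectl describe machinedeployment worker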
We should now be able to query the Machines we have provisioned to see their status.
$ kubectl get machines
NAME                      PROVIDERID                                           PHASE
controller                kubernetes://871cde5a-3159-11ea-a1c6-42010a840084   provisioning
worker-6c498c48db-4grxq                                                        pending
worker-6c498c48db-66zk7                                                        pending
worker-6c498c48db-k5kkp
We can also see the corresponding KubernetesMachines.
$ kubectl get kubernetesmachines
NAME           PROVIDER-ID                                          PHASE          AGE
controller     kubernetes://871cde5a-3159-11ea-a1c6-42010a840084    Provisioning   53s
worker-cs95w                                                        Pending        35s
worker-kpbhm                                                        Pending        35s
worker-pxsph                                                        Pending        35s
Soon all KubernetesMachines should be in a Running state.
$ kubectl get kubernetesmachines
NAME           PROVIDER-ID                                          PHASE     AGE
controller     kubernetes://871cde5a-3159-11ea-a1c6-42010a840084    Running   2m
worker-cs95w   kubernetes://bcd10f28-3159-11ea-a1c6-42010a840084    Running   1m
worker-kpbhm   kubernetes://bcd4ef33-3159-11ea-a1c6-42010a840084    Running   1m
worker-pxsph   kubernetes://bccd1af4-3159-11ea-a1c6-42010a840084    Running   1m
We can also see the Pods corresponding to our KubernetesMachines.
$ kubectl get pods
NAME           READY   STATUS    RESTARTS   AGE
controller     1/1     Running   0          2m11s
worker-cs95w   1/1     Running   0          111s
worker-kpbhm   1/1     Running   0          111s
worker-pxsph   1/1     Running   0          111s
The Cluster API manager generates a kubeconfig and stores it as a Kubernetes Secret called <clusterName>-kubeconfig. We can retrieve that and use it to access the cluster.
$ kubectl get secret example-kubeconfig -o jsonpath='{.data.value}' | base64 --decode > example-kubeconfig
$ export KUBECONFIG=example-kubeconfig
$ kubectl get nodes
NAME           STATUS     ROLES    AGE     VERSION
controller     NotReady   master   3m16s   v1.17.0
worker-cs95w   NotReady   <none>   2m34s   v1.17.0
worker-kpbhm   NotReady   <none>   2m32s   v1.17.0
worker-pxsph   NotReady   <none>   2m34s   v1.17.0
Finally, we can apply our Calico CNI solution. The Nodes should soon become Ready.
$ kubectl apply -f https://docs.projectcalico.org/v3.11/manifests/calico.yaml
$ kubectl get nodes
NAME           STATUS   ROLES    AGE     VERSION
controller     Ready    master   5m8s    v1.17.0
worker-cs95w   Ready    <none>   4m26s   v1.17.0
worker-kpbhm   Ready    <none>   4m24s   v1.17.0
worker-pxsph   Ready    <none>   4m26s   v1.17.0
We can now run workloads on our brand new cluster!
kubectl run nginx --image=nginx --replicas=3
This flow would be similar for other infrastructure providers. Many other examples can be found in the Cluster API quick start.
Future: v1alpha3 and beyond
We are only just scratching the surface of the capabilities the Cluster API has the potential to provide. Let's go over some of the other cool things on the roadmap.
MachineHealthCheck
In v1alpha2, an infrastructure-specific Machine can mark itself as failed and the status will bubble up to the owning Machine, but no action is taken by an owning MachineSet. The reason for this is that resources other than a MachineSet could own the Machine, so it makes sense for Machine remediation logic to be decoupled from MachineSets.
MachineHealthCheck is a proposed resource to describe failure scenarios for Nodes and to delete the corresponding Machine should one occur. This would trigger the appropriate deletion behaviour (e.g. drain) and allow any controlling resource to bring up a replacement Machine.
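As a rough sketch only (the API was still under design at the time of writing, so the group, version and field names here are illustrative), a MachineHealthCheck targeting the worker pool above might look something like this:
apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachineHealthCheck
metadata:
  name: worker-unhealthy
spec:
  clusterName: example
  selector:
    matchLabels:
      nodepool: default
  unhealthyConditions:
  # delete the backing Machine if its Node reports NotReady for more than 5 minutes
  - type: Ready
    status: "False"
    timeout: 5m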
KubeadmControlPlane
Currently, creating an HA control plane and managing the control plane in general requires carefully configuring independent controller Machines with the correct bootstrap configuration (and these need to come up in the correct order). v1alpha3 looks to support control plane providers with an initial kubeadm control plane implementation. This will require few changes from an infrastructure provider perspective but will allow users to manage the instantiation and scaling of the control plane without manually creating the corresponding Machines. The kubeadm control plane proposal provides further details.
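Again as an illustrative sketch based on the proposal (names and fields may differ in the final API), a kubeadm control plane for the example cluster might be declared like this, replacing the hand-rolled controller Machine:
apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
kind: KubeadmControlPlane
metadata:
  name: example-control-plane
spec:
  replicas: 3
  version: v1.17.0
  # Machines are stamped out from this infrastructure template rather than created by hand
  infrastructureTemplate:
    apiVersion: infrastructure.lukeaddison.co.uk/v1alpha2
    kind: KubernetesMachineTemplate
    name: controller
  kubeadmConfigSpec:
    initConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          eviction-hard: nodefs.available<0%,nodefs.inodesFree<0%,imagefs.available<0%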
Together with MachineHealthChecks, automatic control plane remediation would be possible using the Cluster API.
Cluster Autoscaler
Cluster Autoscaler is one example of a project that can leverage Cluster API. The current implementation requires each supported cloud provider to implement the CloudProvider and NodeGroup interfaces necessary for scaling groups of instances in their environment. With the advent of Cluster API, autoscaling logic could be implemented in a provider-agnostic way by interacting with Cluster API resources instead of directly with provider-specific APIs.
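Conceptually, scaling a node group up or down then becomes a matter of adjusting the replica count on a MachineDeployment, something we can already do by hand against the cluster created above:
# scale the worker pool to 5 replicas; an autoscaler could drive the same field automatically
kubectl patch machinedeployment worker --type merge -p '{"spec":{"replicas":5}}'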
Summary
We have taken quite an in-depth look at the current capabilities of the Cluster API and what to look forward to in the near future. It’s a very exciting time for the project as it works towards feature completeness. As with almost anything Kubernetes-related, opportunities to contribute are open and numerous.