Chances are, you're already familiar with managing machine identities in the data center, whether that means getting full visibility and control of your TLS certificate issuance or understanding and provisioning certificates for every application in your environment. In the data center, all of this is relatively manageable and predictable, using policies that are easy to govern.
However, when you think about Kubernetes (whether managed by a cloud service provider or by yourself) and cloud applications, we often see applications that are heavily distributed and containerized. Most containerized workloads are highly ephemeral, which means that the identities associated with those workloads are ephemeral too. So you are mostly managing ephemeral identities for these workloads.
When you run these containerized applications on a container orchestration platform like Kubernetes, you see exponential growth in the number of identities, workloads, microservices, containers, and clusters. When you think about how those identities should be governed, it becomes extremely challenging, and it is important to get a good handle on it. And those challenges become even more complex when you're working in a service mesh environment. (More on that later.)
What can go wrong?
Often, we see a disconnect between platform teams, who operate on their own, and security administrators, who are working toward a more centralized model. This divide can leave organizations without a proper governance model, and that has led to security incidents, as noted in the Red Hat State of Kubernetes Security report. The big finding: at least 90% of organizations have experienced at least one security incident.
Often, security incidents are due to a misconfigured workload, which is a very common failure mode when managing machine identities. If you want to mitigate this risk, you'll need effective threat modeling to make sure that both intentional and unintentional threats are accounted for.
For example, with containerized workloads, access to a service is exposed through an external endpoint managed by an ingress, which allows anyone outside the cluster to reach that service. So protecting every public ingress across all of your Kubernetes clusters is an essential part of your threat mitigation.
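As a concrete illustration, an audit of public ingresses can start by flagging any Ingress that exposes a host without TLS coverage. This is a hedged sketch: the dicts below mirror the shape of the Kubernetes `networking.k8s.io/v1` Ingress schema, but the function name and audit logic are illustrative assumptions, not part of any product API.

```python
# Illustrative sketch: flag Ingress resources that expose hosts without TLS.
# The dict shapes follow the Kubernetes networking.k8s.io/v1 Ingress schema;
# the audit function itself is a hypothetical example.

def find_unprotected_ingresses(ingresses):
    """Return names of Ingresses whose hosts are not covered by a TLS block."""
    unprotected = []
    for ing in ingresses:
        spec = ing.get("spec", {})
        tls_hosts = {h for t in spec.get("tls", []) for h in t.get("hosts", [])}
        rule_hosts = {r.get("host") for r in spec.get("rules", [])}
        if rule_hosts - tls_hosts:  # at least one host has no TLS coverage
            unprotected.append(ing["metadata"]["name"])
    return unprotected

ingresses = [
    {"metadata": {"name": "shop"},
     "spec": {"rules": [{"host": "shop.example.com"}],
              "tls": [{"hosts": ["shop.example.com"]}]}},
    {"metadata": {"name": "legacy"},
     "spec": {"rules": [{"host": "legacy.example.com"}]}},  # no TLS block at all
]
print(find_unprotected_ingresses(ingresses))  # ['legacy']
```

In a real cluster you would feed this from the Kubernetes API rather than hard-coded dicts, and run it against every cluster in scope.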
The other aspect of threat mitigation is stopping rogue CAs. Developers are often tempted to use self-signed certificates when they are operating in a Kubernetes cluster and have workloads to run. These self-signed certificates are pretty common in the world of Kubernetes, especially because people think it's the fastest way to secure a workload and get a machine identity for the app. They simply bootstrap a CA, generated with OpenSSL or some similar tool, that is not approved by the organization.
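A simple detection heuristic for the rogue-CA problem is to flag certificates whose issuer equals their subject (self-signed) or whose issuer is not on an approved list. The sketch below operates on pre-parsed certificate metadata; the approved-issuer names and function are invented for the example.

```python
# Hypothetical audit: classify certificates as ok, self-signed, or issued by
# an unapproved ("rogue") CA. Cert records are pre-parsed metadata dicts; the
# issuer names below are assumptions for illustration.

APPROVED_ISSUERS = {"CN=Corp Issuing CA 1", "CN=Corp Issuing CA 2"}

def audit_certificate(cert):
    if cert["issuer"] == cert["subject"]:
        return "self-signed"              # classic OpenSSL bootstrap pattern
    if cert["issuer"] not in APPROVED_ISSUERS:
        return "rogue-ca"                 # issued, but not by an approved CA
    return "ok"

certs = [
    {"subject": "CN=payments.svc", "issuer": "CN=Corp Issuing CA 1"},
    {"subject": "CN=dev-test.svc", "issuer": "CN=dev-test.svc"},
    {"subject": "CN=orders.svc",   "issuer": "CN=Unknown Team CA"},
]
print([audit_certificate(c) for c in certs])  # ['ok', 'self-signed', 'rogue-ca']
```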
And then we throw service meshes, like Istio, into the mix. Service meshes are commonly used to manage traffic at scale and to get much richer observability into how various services are used within an enterprise. Service meshes also ensure that all workloads running in the mesh mutually authenticate to each other, so that all lines of traffic are encrypted. This gives organizations the ability to prevent man-in-the-middle attacks within the mesh, along with attacks such as denial of service, elevation of privilege, and the interception or spoofing of data in service-to-service communication.
What can you do about it?
To mitigate this level of risk, you'll need comprehensive visibility of all machine identity data across all the clusters in your enterprise. You also need automated alerts, the ability to define multiple trust levels, and a cluster-specific machine identity issuer that ties back to the platform where you manage all your Certificate Authorities (CAs).
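To make the visibility-and-alerting idea concrete, here is a minimal sketch that aggregates a machine identity inventory across clusters and surfaces certificates nearing expiry. The data shapes, cluster names, and the 30-day window are all assumptions for illustration, not a Venafi data model.

```python
# Hedged sketch: aggregate machine-identity inventory from several clusters
# and report certificates expiring within an alert window. Inventory shape
# and threshold are invented for the example.
from datetime import datetime, timedelta

def expiring_soon(inventory, now, window_days=30):
    """Return (cluster, cert-name) pairs whose certs expire within the window."""
    cutoff = now + timedelta(days=window_days)
    return [(cluster, cert["name"])
            for cluster, certs in inventory.items()
            for cert in certs
            if cert["not_after"] <= cutoff]

now = datetime(2024, 6, 1)
inventory = {
    "prod-us": [{"name": "api-tls", "not_after": datetime(2024, 6, 10)}],
    "prod-eu": [{"name": "web-tls", "not_after": datetime(2025, 1, 1)}],
}
print(expiring_soon(inventory, now))  # [('prod-us', 'api-tls')]
```

In practice this feed would come from each cluster's issuer and drive the automated alerts described above.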
For example, let’s say you have a machine identity issuer that is connected back to your platform policy controls within Venafi. How do you identify policy violations and ensure that when a workload is deployed, an identity specific to that workload is issued, and that when the workload is destroyed, any identity associated with it goes away, along with the various dependencies on identities that are required for mutual TLS?
You should be able to extend the same capability you have in your data center into your cloud native environments, irrespective of where your applications are running. If Kubernetes is running in your data center, the rule applies. If it is a managed Kubernetes offering from a cloud provider, the same rule still applies. So wherever you run your Kubernetes environments, you should have this capability in place to ensure full enterprise coverage for your machine identities.
Same with mesh identities. Mesh identities take a slightly different approach because the identity is not injected into the workload itself; instead, a proxy runs alongside it. It’s a similar mechanism, but when you adopt a service mesh, you’ll quickly need to work out how it fits into your organization’s CA infrastructure. Then you’ll need to ensure that every single mesh workload has an identity that can be anchored back to the trust anchor, or trust root, that is managed by the security team.
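Meshes like Istio express workload identity as a SPIFFE URI, so one way to check the "anchored to the security team's trust root" requirement is to verify each workload's SPIFFE ID against the approved trust domain. The trust-domain value and function below are assumptions for the sketch.

```python
# Illustrative check that a mesh workload identity belongs to the trust
# domain managed by the security team. SPIFFE IDs take the form
# spiffe://<trust-domain>/<path>; the domain below is a placeholder.

TRUST_DOMAIN = "cluster.local"   # hypothetical approved trust domain

def anchored_to_trust_root(spiffe_id: str) -> bool:
    """True if the SPIFFE ID's trust domain matches the approved one."""
    prefix = "spiffe://"
    if not spiffe_id.startswith(prefix):
        return False
    domain = spiffe_id[len(prefix):].split("/", 1)[0]
    return domain == TRUST_DOMAIN

print(anchored_to_trust_root("spiffe://cluster.local/ns/default/sa/web"))  # True
print(anchored_to_trust_root("spiffe://rogue.example/ns/default/sa/web"))  # False
```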
Managing at scale
Very often, we think about an application or a container that requires an identity. Our recommendation is that an identity should be issued to a workload when that workload is deployed, and when the workload is destroyed, the identity should go away. This is what we refer to as issuing an identity just in time and then managing its lifecycle for as long as that workload is running and managed. And with many organizations running 5,000 or more applications, you can imagine the number of identities that need to be managed within a Kubernetes cluster.
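The just-in-time lifecycle above can be sketched as a tiny in-memory registry that issues an identity on deploy and revokes it on destroy. This is purely illustrative; the class and its methods are stand-ins, not a Venafi or Firefly API.

```python
# Minimal sketch of just-in-time identity lifecycle: issue on deploy,
# revoke on destroy. The registry and method names are hypothetical.
import uuid

class IdentityRegistry:
    def __init__(self):
        self._identities = {}              # workload name -> identity id

    def on_deploy(self, workload: str) -> str:
        ident = str(uuid.uuid4())          # issue an identity just in time
        self._identities[workload] = ident
        return ident

    def on_destroy(self, workload: str) -> None:
        self._identities.pop(workload, None)   # identity dies with the workload

    def active(self):
        return sorted(self._identities)

reg = IdentityRegistry()
reg.on_deploy("payments")
reg.on_deploy("orders")
reg.on_destroy("payments")
print(reg.active())  # ['orders']
```

At the scale of 5,000+ applications, the same pattern is driven by cluster events rather than manual calls, which is why automation is non-negotiable.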
As the cluster footprint grows within an organization, these identities must also be managed in a way that lets you understand how each one was issued, which policies should be applied, and, for any subordinate CA you have bootstrapped and made available to these teams, what issuance statistics are associated with it.
All the capabilities to manage identities at scale are provided by Venafi Firefly. With Firefly acting as a workload identity issuer, you can bootstrap an intermediate CA and run it in a highly distributed mode. With the issuer running in the same environment as the workloads, platform teams can be assured that workloads get an identity just in time that is compliant with the organization’s policies.
However, from the security perspective, it is also important to have the right kind of visibility. Security administrators retain full control over the subordinate CAs, their issuance metrics, and the ability to govern them.
See how easy it is to deliver trusted certificates at warp speed
How to enforce policies
Policy constructs are well known to security teams, and almost every identity issued complies with the policies defined by the security team. These policies are centrally managed, most often relate to identity issuance, and are administered by the Venafi administrator. To support net-new use cases in cloud native environments, we have already established that a workload identity issuer must run locally, where the workloads live.
As part of that, platform administrators should also define policies and enforce them locally. This mechanism of combining central policies with locally enforced policies provides robust governance and follows the trust-but-verify model. These local policies can be specific to the environment the applications run in. For example, if a platform administrator wants to allow developers to request identities only in a specific namespace, they should be able to define that. Similarly, platform administrators can define a policy that says: don’t issue an identity unless an ingress resource exists in namespace X.
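The two example rules above, a namespace allowlist and an ingress-must-exist check, can be sketched as a local admission function. Everything here (the function, the allowlist, the rule shapes) is a hypothetical illustration of locally enforced policy, not a product feature.

```python
# Hedged sketch of locally enforced policy: admit an identity request only
# if it comes from an approved namespace AND an ingress already exists there.
# Names and rule shapes are assumptions for the example.

ALLOWED_NAMESPACES = {"payments"}          # set by the platform administrator

def admit_identity_request(namespace: str, namespaces_with_ingress: set) -> tuple:
    """Return (admitted, reason) for a workload identity request."""
    if namespace not in ALLOWED_NAMESPACES:
        return (False, "namespace not allowed")
    if namespace not in namespaces_with_ingress:
        return (False, "no ingress resource in namespace")
    return (True, "admitted")

ingresses = {"payments"}                   # namespaces that have an Ingress
print(admit_identity_request("payments", ingresses))  # (True, 'admitted')
print(admit_identity_request("dev", ingresses))       # (False, 'namespace not allowed')
```

Centrally managed policy would still apply on top of this local gate, which is the trust-but-verify combination described above.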
Are you looking to enforce policies at scale in cloud-native environments that are more dynamic and fast paced than the data center environments where you’re most experienced? See how Venafi, the recognized leader in machine identity security for cloud native environments, can help. Contact us.