Preventing certificate-related outages must be a top priority for DevOps teams. Downtime is expensive, and production servers in particular need as much uptime as possible. Lost reliability and availability can damage an enterprise’s relationship with its customers, and recovering from an outage consumes IT labor hours that could have been spent more productively had the outage been prevented.
Here are some frequently cited causes of DevOps outages:
- Too much log data being simultaneously sent from network devices to a log server
- Buggy scripts
- Overwhelming a CPU by opening a folder that contains far too many files in a file manager
- Improperly configured notification tools
- Disk partitions being filled with huge core dumps
- Networking problems like the one Microsoft reported pertaining to Azure: “The incident on Wednesday, 3 October 2018 started with a networking issue in the North Central US region. Since our authentication service, SPS, is in this region the issue impacted several of our scale units.”
- Poor load balancing
- Bad configuration rollout
- Operating failures of networking appliances, such as switches and firewalls
- Network domino effects from database glitches
- Improperly configured DNS changes
- Broadcast storms from the incorrect usage of hubs and switches
- Poor package management in production nodes
But there’s one type of failure that’s not on this list and it may be one of the most common causes of DevOps outages: poor TLS certificate issuance and management.
In a recent survey, 79% of respondents reported at least one TLS certificate-related DevOps outage per year. Worse, over a third of those respondents reported six or more certificate-related DevOps outages per year! So, what should DevOps teams do differently to prevent these outages?
The number of TLS certificates that web applications and other DevOps environments need has grown sharply in recent years. Web browsers and other network clients demand encrypted traffic now more than ever, and that’s a wonderful thing, because plaintext traffic is vulnerable to interception and tampering. Another factor driving certificate issuance is the explosion in the number of consumer and industrial Internet of Things (IoT) devices. As more IoT devices come online, each one needs a proper TLS certificate, which serves as its machine identity, for strong encryption.
But if DevOps teams are stuck issuing certificates manually, that’s a huge problem. Human beings are notoriously error-prone, and manually configuring and issuing certificates grinds DevOps workflows to a halt. Fortunately, there is a reliable way to remove that bottleneck and prevent one of the most common causes of DevOps outages.
Traditional certificate generation practices are littered with issues and incompatible with modern IT.
Modern DevOps leverages infrastructure-as-code for the continuous delivery of software. Virtual machines and containers often have very short lifespans, measured in hours or even just a few minutes. Certificate issuance and deployment now require a pull model rather than a push model. Because certificates are needed in real time to support continuous deployments, issuance and deployment must also be integrated with organizationally approved certificate sources.
In the pull model, new certificate requests are sent as calls to centralized APIs, or they are made by components throughout the DevOps toolchain. Requests can originate from Kubernetes, Terraform, Chef, Docker, or other popular DevOps tools. DevOps teams can get certificates with their preferred tools, by environment and infrastructure type, without leaving the toolchain.
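To make the pull model a little more concrete, here is a minimal Python sketch of a workload building its own certificate request for a central issuance API. Everything specific in it is an assumption made for illustration: the endpoint URL, the JSON field names, the `certificate` field in the response, and the function names are hypothetical and do not describe any real product's API.

```python
import json
import urllib.request

# Hypothetical endpoint for an organization's central issuance API.
ISSUANCE_URL = "https://certs.example.internal/v1/certificates"


def build_certificate_request(common_name, csr_pem, requested_by, url=ISSUANCE_URL):
    """Build (but do not send) the HTTP request a workload would make.

    The payload field names are assumptions for this sketch.
    """
    payload = {
        "common_name": common_name,    # hostname the certificate is for
        "csr": csr_pem,                # PEM-encoded certificate signing request
        "requested_by": requested_by,  # which tool in the toolchain is asking
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def request_certificate(req, timeout=10.0):
    """Send the request and return the issued certificate.

    Assumes the (hypothetical) API answers with JSON containing a
    "certificate" field.
    """
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["certificate"]
```

The point of the sketch is the direction of the call: the workload, or the tool provisioning it, asks for a certificate when it needs one, instead of waiting for a person to push one to it.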
Many DevOps teams use solutions like HashiCorp Vault to automate certificate issuance, but that approach only works well for internal-facing infrastructure. Externally facing load balancers and web servers require publicly trusted certificates, which cannot be automated that way.
Policy-compliant, externally facing certificates are key to preventing DevOps outages and increasing security. When organizationally approved certificate sources are integrated with automated certificate issuance, every deployment can include checks to verify that none of the applicable certificates are close to expiration.
Each time the continuous integration/continuous delivery (CI/CD) pipeline runs, it can perform expiration checks. When a new certificate is necessary, simple API calls can request it and install it on load balancers, web servers, and controllers.
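As a rough illustration of such a pipeline check, the Python sketch below uses only the standard library to decide whether a certificate falls inside a renewal window. The 30-day window, the function names, and any host passed to `fetch_not_after` are assumptions made for this example rather than a recommended policy.

```python
import socket
import ssl
import time
from typing import Optional

RENEWAL_WINDOW_DAYS = 30  # assumed policy threshold, adjust to your own policy


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Return days left before the certificate's notAfter time.

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    for example "Jan  5 09:34:43 2030 GMT".
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires_at - now) / 86400


def needs_renewal(not_after: str, now: Optional[float] = None,
                  window_days: float = RENEWAL_WINDOW_DAYS) -> bool:
    """True when the certificate expires within the renewal window."""
    return days_until_expiry(not_after, now) <= window_days


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Read the notAfter field of the certificate a server presents.

    With the default context, an already expired certificate fails the
    handshake with a verification error, which is itself a signal that
    renewal is overdue.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

A pipeline step could call `needs_renewal(fetch_not_after(host))` for each externally facing endpoint and trigger a certificate request whenever it returns `True`.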
Venafi Vice President of Security Strategy Kevin Bocek says, “Failure to renew certificates before they expire, and improper configurations (as with the Microsoft Azure outage) can be costly. Service failures with applications that use HTTPS can result in a downtime of up to $1 million per hour for high-volume services. To avoid unnecessary outages, make sure you are able to discover where all application certificates are in use and replace those that are unreliable or expired.”
It’s easy to understand. Centralizing and automating certificate issuance gives DevOps, PKI, and security teams full visibility into certificate usage throughout their networks. Problems can be diagnosed more easily, certificates can be monitored more effectively, and most importantly, DevOps outages can be prevented!
So, make sure certificate issuance and management are automated and embedded into your DevOps environments. It’s that simple!