Thursday, September 30, 2021, was a nightmare for many businesses. A number of services and websites reported issues when IdenTrust DST Root CA X3, the root CA certificate that Let’s Encrypt had relied on for cross-signing, expired. According to security researcher Scott Helme, the confirmed outages included Palo Alto, Bluecoat, Cisco Umbrella, Catchpoint, Guardian Firewall, AWS, pfSense, OVH, Auth0, Fortinet, Heroku, Netlify and Cloudflare Pages. But before jumping to conclusions about who’s responsible, let’s examine what happened.
Let’s Encrypt warned
Back in May 2021, Let’s Encrypt issued an announcement informing their customers that on September 30 their old root certificate “DST Root CA X3 will expire.” That was part of their transition to their new root certificate, ISRG Root X1. According to the announcement “older devices that don’t trust ISRG Root X1 will start getting certificate warnings when visiting sites that use Let’s Encrypt certificates.” There was one exception covering old Android devices, where Let’s Encrypt had made a “special cross-sign from DST Root CA X3 that extends past that root’s expiration.”
That would be a transparent transition for most organizations. “We’ve set up our certificate issuance so your web site will do the right thing in most cases, favoring broad compatibility,” said Let’s Encrypt. They even provided guidance for customers providing an API or supporting IoT devices.
However, in the vast and complex world of the internet, changes of this scale create turbulence. “At least something, somewhere is going to break,” warned security researcher Scott Helme in a recent blog post. And this is what we experienced.
Did the world listen to the warning?
Scott Helme was monitoring the developing situation and providing updates on his Twitter account. And outages started popping up one after another.
“It's happening folks. Even AWS is posting about the intermediate root used by Let's Encrypt which expired today, causing outages,” reported Sandra Chrust, Product Launch Director, DevOps and Cloud Native Solutions at Venafi, on her LinkedIn account.
“It's starting to feel a bit like Y2k,” said Greg Delaney, Senior Product Marketing Manager at Venafi.
Don’t shoot the sheriff, check your certificate management instead!
“This news is a valuable reminder that all machine identities, including the certificates issued by Let’s Encrypt, eventually expire. When they do and [are] not replaced, vital services break down,” says Kevin Bocek, VP, Security Strategy & Threat Intelligence at Venafi.
The CEO and Founder of StatusCake acknowledged just that in an apologetic letter posted on the company’s Twitter account: “[W]e also failed to update our own CA-certificates meaning a widespread failure of our own monitoring services. This is not excusable, and particularly as a website monitoring company is a position of trust, quite simply we should not have got this wrong.”
“Groups/individuals managing PKI infrastructure need to understand that updating a root certificate is different from just simply updating a web-browser or OS or server certificate,” says Pratik Savla, Security Engineer for Venafi. “A root certificate is the primary critical link in the chain of trust for the keys and certificates that serve as machine identities. Root certificates are embedded in nearly every type of software and hardware used in today’s enterprise infrastructure. Because root certificates have much longer validity periods, when they expire there can be significantly larger negative impacts,” explains Savla.
It all comes down to having an effective and efficient machine identity management program in place. “Locating the issue, when it comes to TLS certificates, is sometimes the harder issue than knowing the cause,” wrote Delaney.
Replacing machine identities that rely on the old root certificate might sound like the solution, but “many will find this is half the problem. Errors, misconfigurations, and more reasons could lead to replacements not being installed and outages occurring,” explains Bocek. With the number of machine identities across the enterprise skyrocketing, many companies lack the required visibility into these assets, and the resulting errors take systems offline.
“To address these risks, organizations need a proper strategy that includes updating their root store and all dependent infrastructure with the new active root before the old one expires. Such plans should also ensure that single points of failures are addressed first,” says Savla.
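One way to act on that advice is to check a root store for roots that are about to expire. The following is a minimal sketch rather than a complete tool. It assumes certificate entries in the dictionary format that Python's `ssl.SSLContext.get_ca_certs()` returns. The function names `common_name` and `expiring_roots`, the 90-day window, and the `SAMPLE_ROOTS` entries are all illustrative assumptions.

```python
import ssl
from datetime import datetime, timedelta, timezone


def common_name(cert):
    """Pull the subject commonName out of an ssl-style certificate dict."""
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                return value
    return "<unknown>"


def expiring_roots(ca_certs, now, within_days=90):
    """Return (name, expiry) pairs for roots that have already expired or
    will expire within `within_days` of `now`, soonest first."""
    cutoff = now + timedelta(days=within_days)
    flagged = []
    for cert in ca_certs:
        # notAfter uses the "Sep 30 14:01:15 2021 GMT" text format.
        seconds = ssl.cert_time_to_seconds(cert["notAfter"])
        expires = datetime.fromtimestamp(seconds, tz=timezone.utc)
        if expires <= cutoff:
            flagged.append((common_name(cert), expires))
    return sorted(flagged, key=lambda item: item[1])


# Illustrative sample entries (assumed values, not read from a live system).
SAMPLE_ROOTS = [
    {
        "subject": (
            (("organizationName", "Digital Signature Trust Co."),),
            (("commonName", "DST Root CA X3"),),
        ),
        "notAfter": "Sep 30 14:01:15 2021 GMT",
    },
    {
        "subject": (
            (("organizationName", "Internet Security Research Group"),),
            (("commonName", "ISRG Root X1"),),
        ),
        "notAfter": "Jun  4 11:04:38 2035 GMT",
    },
]
```

Run against the sample with a check date of September 1, 2021, only DST Root CA X3 is flagged. To try it on a real machine, pass `ssl.create_default_context().get_ca_certs()` and `datetime.now(timezone.utc)`. Keep in mind that Python may not list certificates that live in a CA directory until they have been used, and that appliances, containers and embedded devices carry their own root stores that need separate checks.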
“Security teams need to automate machine identity management and gain full visibility over all of their certificates. By doing so, they can automate the rotation, replacement, and revocation of all machine identities,” adds Bocek.
“Events like this really show the difference between organizations that have a robust, enterprise-wide strategy to manage the keys and certificates that serve as machine identities, and those that don’t,” highlights Savla.
Lesson learned?
This story underscores the importance of having a comprehensive inventory of certificates and knowing exactly where they are being used and which systems rely on them. Plus, you need to know which CAs are being used anywhere in your company. We routinely talk to customers who initially find certificates from CAs they could have sworn were not being used by their company.
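As a rough illustration of what such an inventory can look like, the sketch below groups hosts by the CA that issued their certificate. It is an assumption-laden starting point, not a discovery tool. The helper names `fetch_peer_cert`, `issuer_name` and `group_by_issuer` are hypothetical, and the inventory is assumed to be a mapping from hostname to the dictionary returned by Python's `SSLSocket.getpeercert()`.

```python
import socket
import ssl
from collections import defaultdict


def fetch_peer_cert(host, port=443, timeout=5.0):
    """Connect to host:port over TLS and return the validated leaf
    certificate as a dict. Raises if the handshake or validation fails."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def issuer_name(cert):
    """Prefer the issuer's organizationName, then fall back to commonName."""
    fields = {key: value for rdn in cert.get("issuer", ()) for key, value in rdn}
    return fields.get("organizationName") or fields.get("commonName") or "<unknown>"


def group_by_issuer(inventory):
    """Map each issuing CA to the sorted list of hosts that use it.

    `inventory` maps hostname -> certificate dict (getpeercert() format).
    """
    groups = defaultdict(list)
    for host, cert in inventory.items():
        groups[issuer_name(cert)].append(host)
    return {issuer: sorted(hosts) for issuer, hosts in groups.items()}
```

Building the inventory could be as simple as `{host: fetch_peer_cert(host) for host in hosts}` for a list of known hostnames. A grouping like this only covers endpoints you already know about and can reach, so it complements rather than replaces a full discovery of certificates across the network.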