Public Key Infrastructure (PKI) comprises servers, network protocols, hashing and encryption algorithms, security policies, systems and applications that work together so that, for example, a person can check a bank account online without fear of an account takeover.
To do this, we must trust PKI. It was conceived to be trusted. Its cryptographic foundation is solid, the role of each participant is defined, the hardware is mature and application programming interfaces are widely used. However, there have been problems with PKI that force us to reconsider our trust. And to maintain that trust, we need to be prepared to act quickly when a PKI is threatened.
Here are some examples of what can go wrong with your PKI
While the mathematical foundations of the cryptography used in PKI have been studied and shown to be hard to break, advances in hardware have turned algorithms once considered computationally secure into breakable ones. In addition, implementations of these cryptographic algorithms sometimes introduce flaws or vulnerabilities that are external to the core mathematical function and that attackers can exploit.
Sometimes, the vulnerabilities are not in the cryptographic protocols, implementing code or hardware, but in the business systems or processes that support the operations of PKI, for example, in the issuance of digital certificates.
Researchers Serrano, Hadan and Jean Camp from Indiana University Bloomington have analyzed 379 reported instances of failures in certificate issuance to pinpoint the most common causes as well as the systemic issues that contribute to them. The report's findings are alarming.
Types of PKI incidents
The analysis of the known PKI incidents revealed the types of incidents, which are summarized in the table below.
Table 1: Types of Known PKI Incidents
This table shows the type of incident, the number of incidents that were self-reported (disclosed by the CA itself), the total number of incidents for a given type, and the percentage of incidents in this category. What is alarming is the low percentage of incidents being self-reported by the CAs. Self-reporting might not be feasible if, for example, the CA was unaware of the problem, but the research noted that there were many cases where the CAs knew about their issues and preferred to remain silent.
Under the Other category, the researchers included incidents such as backdating SHA-1 certificates, certificates for malicious domains, MiTM attempts, non-acceptable requester validation, validity greater than 825 days and others.
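As a small illustration of one item in that list, the sketch below flags certificates whose validity period is longer than 825 days. It is only a sketch under stated assumptions: the function names and the sample data are hypothetical, the 825-day figure is taken from the text above, and the comparison is a plain subtraction of dates rather than the exact counting rule a root program would apply.

```python
from datetime import datetime, timedelta, timezone

# Limit taken from the incident list above; treated here as a simple threshold.
MAX_VALIDITY = timedelta(days=825)


def exceeds_max_validity(not_before, not_after, limit=MAX_VALIDITY):
    """Return True if the validity period is strictly longer than the limit."""
    return (not_after - not_before) > limit


def find_overlong(certs, limit=MAX_VALIDITY):
    """Return the names of certificates whose validity exceeds the limit.

    `certs` is a hypothetical iterable of (name, not_before, not_after) tuples.
    """
    return [name for name, nb, na in certs if exceeds_max_validity(nb, na, limit)]


# Hypothetical sample data, for illustration only.
issued = datetime(2018, 3, 1, tzinfo=timezone.utc)
sample = [
    ("ok.example", issued, issued + timedelta(days=825)),
    ("too-long.example", issued, issued + timedelta(days=826)),
]
overlong = find_overlong(sample)
```

A check like this is cheap to run over certificate inventories or issuance logs, which is one reason this kind of incident is often found by outside parties rather than by the issuing CA.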
Causes of PKI incidents
After studying the different types of incidents, the researchers classified the causes of these incidents. As summarized in the table below, they identified ten categories. The table, adapted from the original report, lists the major causes of incidents, the rate of self-reporting for each cause, and the percentage of each cause in relation to the total number of incidents studied.
Table 2: Causes of Known PKI Incidents
Let us elaborate a bit on these causes.
Software bugs: The cause is a software error or flaw that generated an incident in the PKI ecosystem. These bugs may produce faulty certificates, problems in OCSP responders, or erroneous checks in a request.
Believed to be compliant/Misinterpretation/Unaware: These causes reflect a lack of awareness of, or failure to comply with, updates to the CA/B Forum Baseline Requirements or the Root Program Requirements, or a misunderstanding of the concepts behind these requirements.
Business model/CA decision/Testing: The CA is aware of the Baseline or Root Program Requirements but places its own business strategies over compliance with these requirements. In other words, it prefers its near-term benefits to the health of the PKI ecosystem. This category contains the most alarming incidents of CA misbehavior or lack of ethics.
Human error: These are human mistakes, such as errors in the manual entry of data for certificate requests or forgotten steps in the setup of a new intermediate CA. This is the cause when a single employee makes an error in a process.
Operational error: This includes all the incidents that were generated because of a faulty internal procedure in the CA or a related entity. Examples of these incidents are mistakes in audit reports, not disclosing subordinate CAs, and delayed revocations of compromised certificates.
Non-optimal request check: This cause refers to cases where the checks of an applicant for a certificate were not performed correctly. This can, for example, enable the generation of rogue certificates, EV certificates that have not been verified, or, simply, improper digital certificates.
Improper security controls: This category groups all incidents regarding CAs or other entities being hacked, with the possible or actual outcome of rogue certificates being generated and used in the wild. Evidence shows that, in general, the causes of the hacking incidents were predictable known problems.
Change in Baseline Requirements: This category includes a few incidents where a sudden change in the Baseline Requirements made the CAs change their certificate issuance. CAs were not compliant until they updated their certificate issuance procedures.
Infrastructure problem: Infrastructure problems can be related to unavailable servers, defective networks, or problems in the hardware that support the business of the CA.
Organizational constraints: This category includes incidents where the causes were constraints in the environment where the CAs operated, for example, national legal requirements.
Conclusions and way ahead
One of the conclusions that the researchers reached is that Root Program owners have tremendous power in the PKI network, and they should use it to penalize those CAs that put their own welfare over that of the Public Key Infrastructure. “With just their independent decision they can end the business of a CA, especially if several Root Program’s owners are aligned with the revocation’s posture,” researchers noted.
While the theoretical cryptography basis of PKI may seem flawless, bad technical implementations, erroneous or unacceptable operational procedures, and business interests undermine this scheme. “Perhaps a cleaner scheme without so many interested parties, especially the ones whose decisions balance between bringing trust to the network or to be more profitable, could be a better mechanism. After all, today PKI’s network relies on these entities, and we as end users are obliged to trust in them,” concludes the report.
The entire paper is well worth a read, as it goes into great detail and provides valuable insight into PKI incidents. Knowing the causes of incidents is the basis for raising awareness of the ways digital certificates and machine identities can come to be distrusted.
CAs can “present a risk to the entire network”, as the researchers note, but organizations can do a lot to eliminate this single point of failure. They should establish a formal certificate management program with executive leadership, guidance, and support. This program should include clearly defined policies, processes, and roles and responsibilities for the certificate owners and the Certificate Services team, as well as a central Certificate Service.
A central Certificate Service includes technology-based solutions that provide automation and that support certificate owners in effectively managing their certificates. This service should include the technology/services for CAs, certificate discovery, inventory management, reporting, monitoring, enrollment, installation, renewal, revocation, and other certificate management operations. The central Certificate Service should also provide self-service access for certificate owners to be able to configure and operate the services for their areas without requiring significant interaction with the Certificate Services team.
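To make the inventory and monitoring functions more concrete, here is a minimal sketch of what such a service might track. Everything in it is an assumption for illustration: the CertRecord fields, the function names and the sample data are hypothetical and do not describe any particular product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class CertRecord:
    """Hypothetical inventory entry for one certificate."""
    common_name: str
    issuer: str
    owner: str
    not_after: datetime


def expiring_soon(inventory, now, window=timedelta(days=30)):
    """Return records that expire within the window (or already have), soonest first."""
    cutoff = now + window
    due = [c for c in inventory if c.not_after <= cutoff]
    return sorted(due, key=lambda c: c.not_after)


def count_by_issuer(inventory):
    """Count certificates per issuing CA, to show exposure if one CA is distrusted."""
    counts = {}
    for c in inventory:
        counts[c.issuer] = counts.get(c.issuer, 0) + 1
    return counts


# Hypothetical sample inventory, for illustration only.
now = datetime(2024, 1, 1, tzinfo=timezone.utc)
inventory = [
    CertRecord("app.example", "CA One", "web-team", now + timedelta(days=10)),
    CertRecord("api.example", "CA Two", "api-team", now + timedelta(days=60)),
    CertRecord("old.example", "CA One", "web-team", now - timedelta(days=1)),
]
renewal_queue = [c.common_name for c in expiring_soon(inventory, now)]
exposure = count_by_issuer(inventory)
```

The per-issuer count is the kind of information that matters when a CA is involved in an incident: it tells certificate owners how many certificates would need to be replaced and who owns them.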
Venafi has helped over 350 global customers achieve the agility they need to react quickly to PKI incidents. They now have the flexibility to use various CAs based on business need. Plus, they have the CA-agility they need to make changes at a moment’s notice without impacting their security posture or the availability of critical applications and services.
Can you proactively manage your trust model to maximize machine identity management for your company, customers and partners?