What is a hash function?
Hash functions play a pivotal role in a wide range of information security applications. A hash function is a mathematical process that takes an input, or 'message', of any length and returns a fixed-size string of bytes. This output, often called the hash, acts as a fingerprint of the input. Because there are far more possible inputs than outputs, distinct inputs can in principle share a hash, but for a secure hash function finding two such inputs is computationally infeasible. Crucially, even a minor alteration to the input produces a dramatically different hash. The most widely used secure hash algorithms are the SHA (Secure Hash Algorithm) family: NIST FIPS 180-4 specifies SHA-1 and the SHA-2 functions, and FIPS 202 adds SHA-3.
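The "dramatically different hash" behavior is often called the avalanche effect. Below is a minimal sketch using Python's standard-library hashlib module. The two sample messages are made up for illustration. It counts how many of the 256 output bits of SHA-256 change when a single character of the input changes.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

def differing_bits(a: bytes, b: bytes) -> int:
    """Count how many bit positions differ between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

original = b"Transfer $100 to Alice"
altered = b"Transfer $900 to Alice"  # a one-character change

d1 = hashlib.sha256(original).digest()
d2 = hashlib.sha256(altered).digest()

print(sha256_hex(original))
print(sha256_hex(altered))
# For a good hash function, roughly half of the output bits are expected to flip.
print(differing_bits(d1, d2), "of", len(d1) * 8, "bits differ")
```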
Secure Hash Algorithms (SHA) compute a condensed representation of electronic data (a message). When a message of any length less than 2^64 bits (for SHA-1, SHA-224 and SHA-256) or less than 2^128 bits (for SHA-384, SHA-512, SHA-512/224 and SHA-512/256) is input to a hash algorithm, the result is an output called a message digest. Other common names for the output of a hash function include hash value, hash and digital fingerprint. The SHA-3 hash functions can be implemented as alternatives to the SHA-2 functions, or vice versa.
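Here is a quick check of these points with Python's hashlib, written as a minimal sketch. The algorithm list reflects the constructors hashlib guarantees. It shows that a SHA-3 function produces a digest of the same size as the SHA-2 function with the matching name. It also shows that digest size depends only on the algorithm, never on the input length.

```python
import hashlib

# Names accepted by hashlib.new for SHA-1, SHA-2 and SHA-3 functions.
ALGORITHMS = ["sha1", "sha224", "sha256", "sha384", "sha512",
              "sha3_224", "sha3_256", "sha3_384", "sha3_512"]

def digest_bits(name: str, message: bytes) -> int:
    """Return the length, in bits, of the digest of message under algorithm name."""
    return len(hashlib.new(name, message).digest()) * 8

short_msg = b"abc"
long_msg = b"x" * 1_000_000  # one million bytes

for name in ALGORITHMS:
    # The digest length is the same for a 3-byte and a 1,000,000-byte input.
    assert digest_bits(name, short_msg) == digest_bits(name, long_msg)
    print(f"{name:10} {digest_bits(name, short_msg)} bits")
```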
While the length of the input message can vary, the length of the digest is fixed. Depending on the algorithm, message digests range from 160 to 512 bits. Table 1 below, adapted from NIST FIPS 180-4, summarizes the basic properties of these hashing algorithms.
Table 1: Hash Algorithm Basic Properties
NIST FIPS 202 notes that a cryptographic hash function is designed to provide special properties, including collision resistance and pre-image resistance, that are important for many applications in information security. For example, a cryptographic hash function increases the security and efficiency of a digital signature scheme when the digest is digitally signed instead of the message itself. In this context, the collision resistance of the hash function provides assurance that the original message could not have been altered to a different message with the same hash value, and hence, the same signature. Other applications of cryptographic hash functions include pseudorandom bit generation, message authentication codes, and key derivation functions.
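Key derivation is one of the applications listed above. Here is a minimal sketch of a hash-based KDF: HKDF as described in RFC 5869, built from HMAC-SHA-256. Python's standard library has no HKDF function, so the helpers below (hkdf_extract, hkdf_expand, hkdf) are hand-written for illustration. They are not a library API.

```python
import hashlib
import hmac

HASH = hashlib.sha256
HASH_LEN = HASH().digest_size  # 32 bytes for SHA-256

def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """Extract step: condense input keying material into a pseudorandom key (PRK)."""
    if not salt:
        salt = bytes(HASH_LEN)  # RFC 5869: a missing salt means HashLen zero bytes
    return hmac.new(salt, ikm, HASH).digest()

def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """Expand step: T(i) = HMAC(PRK, T(i-1) | info | i), concatenated and truncated."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested output is too long for HKDF")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), HASH).digest()
        okm += block
        counter += 1
    return okm[:length]

def hkdf(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """Derive `length` bytes of keying material bound to the context string `info`."""
    return hkdf_expand(hkdf_extract(salt, ikm), info, length)
```

Changing the `info` string (for example, b"encryption key" versus b"mac key") yields independent-looking keys from the same secret. This is the usual reason to run a KDF instead of using a raw secret directly.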
According to FIPS 180-4, the hash algorithms are called secure because, for a given algorithm, it is computationally infeasible (1) to find a message that corresponds to a given message digest, or (2) to find two different messages that produce the same message digest. These algorithms therefore make it possible to check a message's integrity: any change to a message will, with a very high probability, result in a different message digest. This results in a verification failure when the secure hash algorithm is used with a digital signature algorithm or a keyed-hash message authentication algorithm.
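Here is a minimal sketch of digest-based integrity checking with hashlib. The contract text and helper names are illustrative assumptions. An unkeyed digest like this only helps if the stored digest is itself protected. Otherwise an attacker can simply replace both the message and its digest. That is why, as noted above, hashes are normally combined with a signature or a keyed MAC.

```python
import hashlib
import hmac

def fingerprint(message: bytes) -> str:
    """Compute the SHA-256 message digest as hex."""
    return hashlib.sha256(message).hexdigest()

def verify_integrity(message: bytes, expected_digest: str) -> bool:
    """Recompute the digest and compare it with the stored value.

    hmac.compare_digest runs in time independent of where the first
    mismatch occurs, a good habit for any digest comparison.
    """
    return hmac.compare_digest(fingerprint(message), expected_digest)

contract = b"The insured party shall receive $10,000."
stored = fingerprint(contract)

print(verify_integrity(contract, stored))  # True: message unchanged
tampered = contract.replace(b"10,000", b"90,000")
print(verify_integrity(tampered, stored))  # False: the change alters the digest
```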
Uses of hashing functions
Secure hash algorithms are typically used with other cryptographic algorithms, such as digital signature algorithms and keyed-hash message authentication codes, or in the generation of random numbers (bits).
Message Authentication Codes (MACs) can be used to provide source and integrity authentication. A MAC is a cryptographic checksum on the data that provides assurance that the data has not changed and that the MAC was computed by the expected entity. FIPS 198 specifies the computation of a MAC using an approved hash function, known as a keyed-Hash Message Authentication Code (HMAC). A variety of key sizes are allowed for HMAC; the choice of key size depends on the amount of security to be provided to the data and on the hash function used.
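Python's standard hmac module implements the HMAC construction. Here is a minimal sketch of computing and checking an HMAC-SHA-256 tag. The message format is made up. The 256-bit random key is one reasonable choice, not a requirement from FIPS 198.

```python
import hashlib
import hmac
import secrets

def compute_tag(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA-256 tag over message."""
    return hmac.new(key, message, hashlib.sha256).digest()

def check_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(compute_tag(key, message), tag)

# A 256-bit random key; the appropriate size depends on the security
# required and on the hash function used.
key = secrets.token_bytes(32)
message = b"amount=100&to=alice"
tag = compute_tag(key, message)

print(check_tag(key, message, tag))                      # True
print(check_tag(key, b"amount=900&to=alice", tag))        # False: data changed
print(check_tag(secrets.token_bytes(32), message, tag))  # False: wrong key
```

Unlike a bare digest, an attacker who alters the message cannot compute a matching tag without the key. This is what provides the source authentication mentioned above.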
Digital signatures are used to provide source authentication, integrity authentication and support for non-repudiation. Digital signatures are used in conjunction with hash functions and are computed on data of any length (up to a limit that is determined by the hash function).
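To make the "sign the digest, not the message" flow visible, here is a deliberately insecure textbook-RSA sketch. The message is hashed with SHA-256, the digest is mapped into the RSA modulus and signed, and verification recomputes the digest. The primes, exponent and function names are illustrative assumptions. This is not an approved signature scheme. It has no padding and only a ~92-bit modulus, and reducing the digest modulo N discards most of its collision resistance. It requires Python 3.8+ for `pow(e, -1, m)`.

```python
import hashlib

# TOY PARAMETERS, NOT SECURE: two Mersenne primes give a ~92-bit modulus,
# and there is no padding. Real systems use approved schemes (e.g. RSA with
# proper padding, or ECDSA) with far larger keys.
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent

def digest_as_int(message: bytes) -> int:
    """Hash a message of any length, then map the fixed-size digest into Z_N."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    # Only the digest is signed, never the message itself.
    return pow(digest_as_int(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    return pow(signature, E, N) == digest_as_int(message)

doc = b"Quarterly report, final version"
sig = sign(doc)
print(verify(doc, sig))                         # True
print(verify(b"Quarterly report, draft", sig))  # False
```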
Security strength of hashing functions
The algorithms differ most significantly in the security strength they provide for the data being hashed. NIST SP 800-57 Pt 1 Rev 4 defines security strength as “the amount of work (that is, the number of operations) that is required to break a cryptographic algorithm or system.” To determine which hash functions may be employed in an application, consider the minimum security strength required, along with the algorithm, scheme or application in which the hash function is used. The table below, adapted from NIST SP 800-57 Pt 1 Rev 4, lists the approved hash functions that can provide the identified security strength for various hash-function applications.
Note that SHA-1 is listed only for legacy purposes: NIST deprecated it in May 2011 because of known collision attacks. In addition, major browser and software vendors such as Microsoft, Mozilla and Google have already phased out SHA-1.
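Strength figures like these start from generic attacks on an L-bit digest. A birthday attack finds a collision in roughly 2^(L/2) operations, and a brute-force preimage search takes roughly 2^L. Here is a minimal sketch that computes these upper bounds from hashlib's digest sizes. It is an illustration, not a reproduction of the SP 800-57 table. Real strengths can be lower once cryptanalysis finds shortcuts, as it has for SHA-1.

```python
import hashlib
from typing import Tuple

def generic_strengths(name: str) -> Tuple[int, int]:
    """Best-case (collision, preimage) strengths in bits for the named hash.

    Collision resistance is capped by the birthday bound (about 2^(L/2) work);
    preimage resistance by exhaustive search (about 2^L work). Cryptanalysis can
    only lower these: NIST SP 800-57 rates SHA-1 collision resistance below 80 bits.
    """
    bits = hashlib.new(name).digest_size * 8
    return bits // 2, bits

for name in ["sha1", "sha224", "sha256", "sha384", "sha512", "sha3_256"]:
    collision, preimage = generic_strengths(name)
    print(f"{name:9} collision <= {collision:3} bits, preimage <= {preimage:3} bits")
```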
Hash function vulnerabilities
As noted above, although hash functions are generally considered secure, NIST deprecated the SHA-1 algorithm in 2011 because of known weaknesses. SHA-1 has become steadily less secure over time, due to weaknesses found in the algorithm, increased processor performance, and the advent of cloud computing.
A collision attack is an attempt to find two different inputs to a hash function that produce the same hash result. A collision occurs when two distinct pieces of data (a document, a binary, or a website’s certificate) hash to the same digest. For a secure hash function, collisions should never be found in practice. However, if the hash algorithm has flaws, as SHA-1 does, a well-funded attacker can craft a collision. The attacker could then use this collision to deceive systems that rely on hashes into accepting a malicious file in place of its benign counterpart: for example, two insurance contracts with drastically different terms but the same hash.
For a secure hash function, the odds of finding a collision are extremely low, especially for functions with a large output size. A generic birthday attack on an n-bit digest needs on the order of 2^(n/2) hash computations, roughly 2^128 for SHA-256. But as available computational power increases, attacks on hash functions become more feasible.
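To show how output size drives the cost of finding a collision, here is a minimal sketch. It deliberately truncates SHA-256 to a few bits and runs a generic birthday search over made-up messages. This is not the cryptanalytic technique used against SHA-1. It only shows that brute-force collision cost grows as roughly 2^(n/2) for an n-bit digest, so every 8 extra bits multiply the expected work by about 16.

```python
import hashlib
from typing import Optional, Tuple

def truncated_digest(message: bytes, bits: int) -> int:
    """SHA-256 digest truncated to its leading `bits` bits (deliberately weakened)."""
    full = int.from_bytes(hashlib.sha256(message).digest(), "big")
    return full >> (256 - bits)

def find_collision(bits: int, max_tries: int = 1 << 22) -> Optional[Tuple[bytes, bytes, int]]:
    """Birthday search: hash distinct messages until two truncated digests match."""
    seen = {}  # truncated digest -> first message that produced it
    for i in range(max_tries):
        msg = f"message-{i}".encode()
        d = truncated_digest(msg, bits)
        if d in seen:
            return seen[d], msg, i + 1
        seen[d] = msg
    return None

for bits in (16, 24, 32):
    result = find_collision(bits)
    if result:
        first, second, tries = result
        print(f"{bits}-bit digest: {first!r} and {second!r} collide after {tries} hashes")
```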
Google announced back in 2017 that a team of researchers from the CWI Institute in Amsterdam and Google had successfully demonstrated an attack on the SHA-1 hash algorithm by creating two files that hash to the same value. The work by the CWI-Google team is the culmination of over a decade of research into the SHA-1 algorithm, beginning with the groundbreaking paper by Xiaoyun Wang, Yiqun Lisa Yin, and Hongbo Yu in 2005 that described the first cryptanalytic techniques capable of finding collisions with much less work than brute force. Cryptographers around the world continued to improve upon these techniques. The techniques used by this attack were developed by Marc Stevens, one of the members of the joint CWI-Google team. The research team has posted additional information at Shattered.io.
How does this translate into an attack? If the “bad” file is malware or a forged document, such as a certificate designed to guarantee the authenticity of a website, an attacker could get their malware or fake website trusted by any system that relies on SHA-1 hashes for verification. Application whitelisting is another example: it uses hashes to verify that files are what they claim to be. An attacker who gets a harmless-looking file of their own making whitelisted could then swap in a malicious file with the same hash, which would inherit the trusted file's privileges on your computer.
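Here is a minimal sketch of the hash-based whitelisting check described above. The APPROVED set and file contents are hypothetical. The key point is that the decision depends only on the digest. Two files with the same digest are indistinguishable to this check, which is why such systems should use a collision-resistant function such as SHA-256 rather than SHA-1.

```python
import hashlib

# Hypothetical allowlist: SHA-256 digests of approved executables.
APPROVED = {
    hashlib.sha256(b"contents of a trusted installer").hexdigest(),
}

def is_allowed(file_bytes: bytes) -> bool:
    """Allow execution only if the file's digest appears on the allowlist."""
    return hashlib.sha256(file_bytes).hexdigest() in APPROVED

print(is_allowed(b"contents of a trusted installer"))  # True
print(is_allowed(b"contents of some other program"))   # False
```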
According to the team that actually broke SHA-1, any application that relies on SHA-1 for digital signatures, file integrity, or file identification is potentially vulnerable. These include digital certificate signatures, email PGP/GPG signatures, software vendor signatures, software updates, ISO checksums, backup systems, etc.
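ISO checksums and backup systems are common places where SHA-1 can be swapped for SHA-256. Here is a minimal sketch of verifying a large file against a published checksum. The function names and the 64 KiB chunk size are illustrative choices. Reading in chunks keeps memory use constant even for multi-gigabyte images.

```python
import hashlib
import hmac
import os
import tempfile

def file_digest_hex(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_checksum(path: str, published_hex: str, algorithm: str = "sha256") -> bool:
    """Compare the file's digest with a published hex checksum (case-insensitive)."""
    return hmac.compare_digest(file_digest_hex(path, algorithm), published_hex.strip().lower())

# Usage with a throwaway file standing in for a downloaded image.
contents = b"example image contents" * 10_000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(contents)
    path = tmp.name
try:
    published = hashlib.sha256(contents).hexdigest()
    print(verify_checksum(path, published))  # True
finally:
    os.remove(path)
```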
Publicly trusted TLS/SSL certificates are not at risk, because Certification Authorities that abide by the CA/Browser Forum Baseline Requirements are no longer allowed to issue SHA-1 certificates. In addition, major technology vendors such as Microsoft, Google and Mozilla have phased out SHA-1 and have provided roadmaps for their users to migrate to SHA-2 hash functions. Finally, to ensure that practitioners have secure and efficient hash algorithms for long-term security, NIST has selected a new hash algorithm standard, SHA-3, which is specified in FIPS 202.
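Migrating away from SHA-1 is easier when the hash algorithm is a configurable parameter rather than hard-coded. Here is a minimal sketch, assuming a hypothetical application policy that accepts only SHA-2 and SHA-3 functions from Python's hashlib and rejects SHA-1 outright. The ALLOWED set and new_hash helper are illustrative names, not part of any library.

```python
import hashlib

# Hypothetical policy: algorithms this application accepts for new digests.
ALLOWED = {"sha256", "sha384", "sha512", "sha3_256", "sha3_384", "sha3_512"}

def new_hash(name: str):
    """Return a hashlib object, refusing algorithms outside the policy (e.g. SHA-1)."""
    normalized = name.lower()
    if normalized.startswith("sha3-"):
        normalized = normalized.replace("-", "_")  # "SHA3-256" -> "sha3_256"
    else:
        normalized = normalized.replace("-", "")   # "SHA-256" -> "sha256"
    if normalized not in ALLOWED:
        raise ValueError(f"hash algorithm {name!r} is not permitted")
    return hashlib.new(normalized)

print(new_hash("SHA-256").hexdigest() == hashlib.sha256().hexdigest())  # True
print(new_hash("SHA3-256").digest_size)  # 32
```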
(This post has been updated. It was originally published on April 6, 2020.)