What is a hash function?
A hash function is an algorithm that maps input data of arbitrary length, often called the 'message', to a fixed-length output known as the 'hash' or 'digest'. The mapping is deterministic: the same input always produces the same hash. Cryptographic hash functions add further guarantees. They are designed to be one-way, meaning it is computationally infeasible to recover an input from its hash, and collision resistant, meaning it is computationally infeasible to find two different inputs with the same hash. Because the set of possible inputs is far larger than the set of possible outputs, collisions must exist, so a hash is not literally unique to each input. In a well-designed function, however, collisions are too hard to find to matter in practice, and even a slight change to the input produces a markedly different hash. These properties make hash functions useful in many applications, from data integrity verification to secure password storage. The secure hash algorithm (SHA) family is specified by NIST in FIPS 180-4, and FIPS 202 extends the family with SHA-3.
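The determinism, fixed output length, and sensitivity to small changes described above can be observed with Python's standard `hashlib` module. This is a minimal illustration, not a test of cryptographic strength: SHA-256 is chosen as a representative algorithm, and `sha256_hex` and `differing_bits` are hypothetical helper names introduced only for this sketch.

```python
import hashlib

def sha256_hex(message: bytes) -> str:
    """Return the SHA-256 digest of `message` as a 64-character hex string."""
    return hashlib.sha256(message).hexdigest()

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

short_msg = b"hello"
long_msg = b"hello" * 10_000  # 50,000 bytes of input

# Deterministic: hashing the same input twice yields the same digest.
assert sha256_hex(short_msg) == sha256_hex(short_msg)

# Fixed length: a 5-byte input and a 50,000-byte input both yield 32-byte (256-bit) digests.
print(len(hashlib.sha256(short_msg).digest()), len(hashlib.sha256(long_msg).digest()))  # 32 32

# Avalanche effect: changing one character changes a large share of the 256 output bits.
d1 = hashlib.sha256(b"hello").digest()
d2 = hashlib.sha256(b"hellp").digest()
print(differing_bits(d1, d2), "of 256 bits differ")
```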
Secure Hash Algorithms (SHA) are designed to create a compact representation of a message in electronic form. For a message shorter than 2^64 bits (SHA-1, SHA-224, and SHA-256) or shorter than 2^128 bits (SHA-384, SHA-512, SHA-512/224, and SHA-512/256), the algorithm produces an output called a message digest, also referred to as a hash value, hash, or digital fingerprint. The SHA-3 functions produce digests of the same lengths as their SHA-2 counterparts, so they can be used in place of SHA-2 functions, which gives applications more flexibility.
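The 2^64-bit and 2^128-bit limits come from the padding step that precedes hashing: the original message length in bits is appended in a fixed-size field (64 bits for SHA-1, SHA-224, and SHA-256; 128 bits for the SHA-384/SHA-512 family). The sketch below reproduces only that preprocessing step, following the padding rule of FIPS 180-4 Section 5.1; it does not compute a hash. It assumes messages are a whole number of bytes, and `md_pad`, `sha256_pad`, and `sha512_pad` are hypothetical names.

```python
def md_pad(message: bytes, block_bytes: int, length_bytes: int) -> bytes:
    """Pad `message` as described in FIPS 180-4: append a single 1 bit (the 0x80 byte),
    then zero bytes, then the original length in bits as a big-endian integer of
    `length_bytes` bytes, so the result is a whole number of `block_bytes`-byte blocks."""
    bit_len = len(message) * 8
    if bit_len >= 2 ** (8 * length_bytes):
        # The length field cannot represent this message: this is the origin of
        # the "< 2^64 bits" and "< 2^128 bits" message size limits.
        raise ValueError("message too long for this hash's length field")
    padded = message + b"\x80"
    # Zero-fill until exactly `length_bytes` bytes remain free in the final block.
    zeros = (block_bytes - length_bytes - len(padded)) % block_bytes
    padded += b"\x00" * zeros
    return padded + bit_len.to_bytes(length_bytes, "big")

def sha256_pad(message: bytes) -> bytes:
    # SHA-1, SHA-224, SHA-256: 512-bit (64-byte) blocks, 64-bit length field.
    return md_pad(message, block_bytes=64, length_bytes=8)

def sha512_pad(message: bytes) -> bytes:
    # SHA-384, SHA-512, SHA-512/224, SHA-512/256: 1024-bit (128-byte) blocks, 128-bit length field.
    return md_pad(message, block_bytes=128, length_bytes=16)

padded = sha256_pad(b"abc")
print(len(padded), padded[-8:].hex())
```

A 55-byte message still fits in one 64-byte block after padding, while a 56-byte message spills into a second block, because the 0x80 byte and the 8-byte length field no longer fit.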
The message digest has a fixed length regardless of the size of the input message. Depending on the algorithm, digests range from 160 to 512 bits. The basic properties of these hash algorithms are listed in Table 1, which is based on NIST FIPS 180-4.
Algorithm | Message size (bits) | Message digest size (bits) |
SHA-1 | < 2^64 | 160 |
SHA-224 | < 2^64 | 224 |
SHA-256 | < 2^64 | 256 |
SHA-384 | < 2^128 | 384 |
SHA-512 | < 2^128 | 512 |
SHA-512/224 | < 2^128 | 224 |
SHA-512/256 | < 2^128 | 256 |
Table 1: Hash Algorithm Basic Properties
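The digest sizes in Table 1 can be checked against Python's `hashlib`. In this sketch, SHA-512/224 and SHA-512/256 are checked only when the local OpenSSL build exposes them (`hashlib.algorithms_available`), because Python does not guarantee them on every platform. The same loop also confirms that each SHA-3 function has the same digest length as its SHA-2 counterpart. The `TABLE_1` dictionary is a hypothetical name that simply restates the table.

```python
import hashlib

# Digest sizes from Table 1 for algorithms hashlib guarantees on every platform.
TABLE_1 = {"sha1": 160, "sha224": 224, "sha256": 256, "sha384": 384, "sha512": 512}

for name, expected_bits in TABLE_1.items():
    bits = hashlib.new(name).digest_size * 8
    assert bits == expected_bits, name
    print(f"{name:>8}: {bits} bits")

# SHA-3 (FIPS 202) digests have the same lengths as the matching SHA-2 digests.
for bits in (224, 256, 384, 512):
    assert hashlib.new(f"sha3_{bits}").digest_size * 8 == bits
    assert hashlib.new(f"sha{bits}").digest_size * 8 == bits

# SHA-512/224 and SHA-512/256 are available only if the OpenSSL build provides them.
for name, expected_bits in {"sha512_224": 224, "sha512_256": 256}.items():
    if name in hashlib.algorithms_available:
        assert hashlib.new(name).digest_size * 8 == expected_bits
```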
NIST FIPS 202 notes that a cryptographic hash function is designed to provide special properties, including collision resistance and pre-image resistance, that are important for many applications in information security. For example, a cryptographic hash function increases the security and efficiency of a digital signature scheme when the digest is digitally signed instead of the message itself. In this context, the collision resistance of the hash function provides assurance that the original message could not have been altered to a different message with the same hash value, and hence, the same signature. Other applications of cryptographic hash functions include pseudorandom bit generation, message authentication codes, and key derivation functions.
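To make the "sign the digest instead of the message" idea concrete, here is a deliberately insecure toy in Python. It uses textbook RSA without padding and two publicly known Mersenne primes (2^521 - 1 and 2^607 - 1) as the private factors, so anyone can forge signatures; it only illustrates the data flow. Real systems should use a vetted library and a standard scheme such as RSA-PSS or ECDSA. The names `digest_int`, `sign`, and `verify` are hypothetical.

```python
import hashlib

# TOY ONLY: textbook RSA with publicly known primes and no padding. Completely insecure.
P = 2**521 - 1
Q = 2**607 - 1
N = P * Q                           # modulus (larger than any 256-bit digest)
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (modular inverse, Python 3.8+)

def digest_int(message: bytes) -> int:
    """Hash the message with SHA-256 and interpret the 32-byte digest as an integer."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big")

def sign(message: bytes) -> int:
    # Only the fixed-size digest is signed, however long the message is.
    return pow(digest_int(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    # Recover the signed digest with the public key and compare it with a fresh hash.
    return pow(signature, E, N) == digest_int(message)

sig = sign(b"contract v1")
print(verify(b"contract v1", sig))  # True
print(verify(b"contract v2", sig))  # False
```

The signature covers only the digest, so it protects the message exactly as well as the hash resists collisions: if an attacker could find a second message with the same digest, the same signature would verify for it too.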
According to FIPS 180-4, the hash algorithms are called secure because, for a given algorithm, it is computationally infeasible (1) to find a message that corresponds to a given message digest, or (2) to find two different messages that produce the same message digest. These algorithms make it possible to check a message's integrity: any change to a message will, with very high probability, result in a different message digest, which causes verification to fail when the secure hash algorithm is used with a digital signature algorithm or a keyed-hash message authentication algorithm.
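A basic integrity check stores the digest of the original message and later compares it with the digest of the data received. The Python sketch below shows this with SHA-256; `fingerprint` and `is_unchanged` are hypothetical names. A bare digest only helps if the stored value itself cannot be altered: an attacker who can modify the message could simply recompute the digest, which is why, as noted above, hashes are combined with digital signatures or keyed-hash MACs.

```python
import hashlib
import hmac

def fingerprint(message: bytes) -> str:
    """SHA-256 digest of the message, as a hex string."""
    return hashlib.sha256(message).hexdigest()

def is_unchanged(message: bytes, expected_hex: str) -> bool:
    # hmac.compare_digest avoids leaking, through timing, how many leading characters match.
    return hmac.compare_digest(fingerprint(message), expected_hex)

original = b"Transfer 100 EUR to account 12345"
stored = fingerprint(original)  # kept somewhere the attacker cannot modify

print(is_unchanged(original, stored))                              # True
print(is_unchanged(b"Transfer 900 EUR to account 12345", stored))  # False
```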
Uses of hashing functions
Secure hash algorithms are commonly used together with other cryptographic algorithms, such as digital signature algorithms and keyed-hash message authentication codes (HMACs), and for random bit generation.
Message authentication codes (MACs) provide assurance of the source and integrity of data. A MAC is a cryptographic checksum computed with a secret key: it shows that the data has not been altered and that the MAC was generated by a party holding that key. FIPS 198 specifies how to compute a MAC using an approved hash function, known as a keyed-hash message authentication code (HMAC). HMAC accepts a range of key sizes; the appropriate choice depends on the level of security required for the data and on the hash function used.
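Python's standard `hmac` module implements the HMAC construction, so an HMAC-SHA-256 tag can be computed and verified as in the sketch below. The 32-byte random key is an assumption chosen for illustration rather than a recommendation derived from the text, and `verify_tag` is a hypothetical helper name.

```python
import hashlib
import hmac
import secrets

key = secrets.token_bytes(32)  # assumed 256-bit secret key shared by sender and receiver
message = b"amount=100&to=12345"

# Sender: compute the HMAC-SHA-256 tag and send it along with the message.
tag = hmac.new(key, message, hashlib.sha256).digest()

def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Receiver: recompute the tag with the shared key and compare in constant time."""
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)

print(verify_tag(key, message, tag))                 # True
print(verify_tag(key, b"amount=900&to=12345", tag))  # False
```

Unlike the bare digest, the tag cannot be recomputed by someone who alters the message but does not know the key.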
Digital signatures serve multiple purposes: authenticating the source, ensuring data integrity, and offering non-repudiation support. They are used in combination with hash functions and can be applied to data of any length, within the limits set by the hash function used.
Security strength of hashing functions
The key difference among these algorithms is the security strength they provide when hashing data. Security strength, as defined in NIST SP 800-57 Pt 1 Rev 4, is a number (in bits) associated with the amount of work, that is, the number of operations, required to break a cryptographic algorithm or system. To select an appropriate hash function for a given application, consider both the minimum security strength required and the context in which the hash function will be used, including the algorithm, scheme, or application. The table below, adapted from NIST SP 800-57 Pt 1 Rev 4, lists approved hash functions that can provide specific security strengths for different hash-function applications.
Note that SHA-1 is listed only for legacy purposes: NIST deprecated it in May 2011 because of known collision attacks. In addition, major browser and software vendors such as Microsoft, Mozilla, and Google have already phased out SHA-1.
Security strength | Digital signature and hash-only applications | HMAC |
≤80 | SHA-1 | |
112 | SHA-224, SHA-512/224, SHA3-224 | |
128 | SHA-256, SHA-512/256, SHA3-256 | SHA-1 |
192 | SHA-384, SHA3-384 | SHA-224, SHA-512/224 |
≥256 | SHA-512, SHA3-512 | SHA-256, SHA-512/256, SHA-384, SHA-512, SHA3-512 |
Table 2: Security Strength of Hashing Functions
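For the digital signature and hash-only column of Table 2, the SHA-2 and SHA-3 entries follow the generic bound for an n-bit digest: a birthday-style search finds a collision in roughly 2^(n/2) operations, so collision resistance is about n/2 bits, while preimage resistance is about n bits. The Python sketch below computes these generic values; it does not model the HMAC column, which follows different rules, and it does not capture SHA-1, whose real collision strength has fallen well below the generic 80 bits because of cryptanalysis. The function names are hypothetical.

```python
import hashlib

def collision_strength_bits(digest_bits: int) -> int:
    """Generic collision resistance: about 2^(n/2) work for an n-bit digest (birthday bound)."""
    return digest_bits // 2

def preimage_strength_bits(digest_bits: int) -> int:
    """Generic preimage resistance: about 2^n work for an n-bit digest."""
    return digest_bits

for name in ("sha224", "sha256", "sha384", "sha512", "sha3_256"):
    n = hashlib.new(name).digest_size * 8
    print(f"{name:>8}: collision ~{collision_strength_bits(n)} bits, "
          f"preimage ~{preimage_strength_bits(n)} bits")
```

The collision values (112, 128, 192, and 256 bits) match the rows in which the corresponding SHA-2 and SHA-3 functions appear in Table 2.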
Hash function vulnerabilities
As mentioned above, the SHA-2 and SHA-3 hash functions are generally regarded as secure, but NIST officially deprecated SHA-1 in 2011 because of identified vulnerabilities. The security of SHA-1 has eroded steadily over time as weaknesses in the algorithm were discovered and as faster processors and cloud computing made attacks cheaper.
A collision attack aims to find two different inputs to a hash function that produce the same hash value. Such a collision means that two separate pieces of data, such as a document, a binary, or a website's certificate, share the same digest. Because a hash function maps an unlimited number of possible inputs to a fixed number of outputs, collisions always exist in principle; for a secure hash function, however, finding one should be computationally infeasible. If the algorithm has weaknesses, as SHA-1 does, a determined attacker with sufficient resources can deliberately create a collision.
The probability of two inputs colliding by chance is negligible, especially for hash functions with large output sizes such as those used in widely deployed document formats and protocols. However, as available computing power grows, deliberate attacks against weakened hash functions become more practical.
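The gap between "negligible by chance" and "practical when the output is small" follows from the birthday bound: among k random n-bit digests, the chance of at least one collision is approximately 1 - e^(-k(k-1)/2^(n+1)). The Python sketch below applies this formula and runs a birthday search against SHA-256 truncated to 24 bits, a deliberately weakened toy hash, where a collision typically appears after a few thousand attempts. Truncation is an assumption made only to keep the search small; the function names are hypothetical, and this is not the technique used against SHA-1.

```python
import hashlib
import math
from itertools import count

def truncated_sha256(data: bytes, bits: int) -> int:
    """Keep only the top `bits` bits of SHA-256: a deliberately weakened toy hash."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - bits)

def find_collision(bits: int) -> tuple[bytes, bytes, int]:
    """Birthday search: hash distinct messages until two share a truncated digest."""
    seen: dict[int, bytes] = {}
    for i in count():
        msg = f"message-{i}".encode()
        h = truncated_sha256(msg, bits)
        if h in seen:
            return seen[h], msg, i + 1  # the colliding pair and the number of hashes computed
        seen[h] = msg

def collision_probability(k: int, bits: int) -> float:
    """Approximate chance of at least one collision among k random `bits`-bit digests."""
    # -expm1(-x) computes 1 - e^(-x) accurately even when x is tiny.
    return -math.expm1(-k * (k - 1) / (2 * 2**bits))

a, b, tries = find_collision(24)
print(a, b, tries)
print(collision_probability(2**12, 24))   # roughly 0.39 after 4,096 hashes of a 24-bit toy hash
print(collision_probability(2**64, 256))  # astronomically small for full-length SHA-256
```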
In 2017, Google announced a successful collision attack on the SHA-1 hash algorithm. Researchers from the CWI Institute in Amsterdam and Google produced two different PDF files with the same SHA-1 hash value. This result marked the culmination of more than a decade of research on SHA-1, beginning with the seminal 2005 paper by Xiaoyun Wang, Yiqun Lisa Yin, and Hongbo Yu, which introduced the first cryptanalytic techniques capable of finding SHA-1 collisions with significantly less computational effort than brute force. Cryptographers around the world subsequently refined these techniques. The specific methods used in this attack were developed by Marc Stevens, a member of the CWI-Google research team. More details are available at Shattered.io.
How does this translate into an attack scenario? Suppose one of the colliding files is malware or a counterfeit document, such as a certificate intended to establish a website's authenticity. An attacker could then gain trust for the malware or the deceptive website from any system that relies on SHA-1 hashes for verification. Another scenario involves application whitelisting, which uses hashes to confirm that files are authentic. A malware file could share its hash with a file the system already trusts; with a collision attack, the attacker must craft both files, for example by getting a harmless version whitelisted and then substituting its malicious twin. The system would then accept the malware and grant it unauthorized access to your computer.
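A hash-based allowlist reduces the trust decision to a single lookup, which is why the collision resistance of the hash matters so much. The Python sketch below hashes files in chunks with SHA-256 and checks the result against a set of trusted digests. The allowlist contents, the chunk size, and the names `sha256_of_stream` and `is_allowed` are hypothetical, and real application-control products add more checks than this.

```python
import hashlib
import io
from typing import BinaryIO

def sha256_of_stream(stream: BinaryIO, chunk_size: int = 65536) -> str:
    """Hash a file-like object in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()

# Hypothetical allowlist: digests of binaries an administrator has approved.
trusted_app = b"pretend this is a trusted binary"
ALLOWLIST = {sha256_of_stream(io.BytesIO(trusted_app))}

def is_allowed(stream: BinaryIO) -> bool:
    # The decision rests entirely on the digest: any file whose hash matches an
    # allowlisted entry is trusted, so a hash with practical collisions (such as
    # SHA-1) lets a crafted twin of an approved file through.
    return sha256_of_stream(stream) in ALLOWLIST

print(is_allowed(io.BytesIO(trusted_app)))  # True
print(is_allowed(io.BytesIO(b"malware")))   # False
```

For files on disk, the same check would be `with open(path, "rb") as f: is_allowed(f)`.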
According to the researchers who broke SHA-1, any application that relies on SHA-1 for digital signatures, file integrity verification, or file identification could be at risk. This includes digital certificate signatures, email PGP/GPG signatures, software vendor signatures, software updates, ISO checksums, and backup systems, among others.
Publicly trusted TLS/SSL certificates are not exposed to this risk, because Certification Authorities that follow the CA/Browser Forum requirements are no longer permitted to issue SHA-1 certificates. Moreover, leading technology companies such as Microsoft, Google, and Mozilla have phased out SHA-1 and published transition plans to help users migrate to SHA-2 hash functions. To provide secure and efficient hash algorithms for long-term security, NIST has also adopted a new hash algorithm standard, SHA-3, specified in FIPS 202.
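In application code, migrating away from SHA-1 is easier when the hash algorithm is a configurable, policy-checked parameter rather than hard-coded. The Python sketch below rejects deprecated algorithms and accepts the SHA-2 and SHA-3 functions that `hashlib` guarantees. The `APPROVED` and `DEPRECATED` sets and the `policy_digest` function are hypothetical; an actual policy should come from your organization's requirements.

```python
import hashlib

# Hypothetical policy: algorithms accepted for new signatures and checksums.
APPROVED = {"sha224", "sha256", "sha384", "sha512",
            "sha3_224", "sha3_256", "sha3_384", "sha3_512"}
DEPRECATED = {"md5", "sha1"}

def policy_digest(data: bytes, algorithm: str = "sha256") -> bytes:
    """Hash `data` with an approved algorithm (hashlib name), refusing deprecated ones."""
    if algorithm in DEPRECATED:
        raise ValueError(f"{algorithm} is deprecated; use a SHA-2 or SHA-3 function instead")
    if algorithm not in APPROVED:
        raise ValueError(f"unknown or unapproved hash algorithm: {algorithm}")
    return hashlib.new(algorithm, data).digest()

print(policy_digest(b"release.iso contents").hex())
print(policy_digest(b"release.iso contents", "sha3_256").hex())
```

Switching from SHA-2 to SHA-3 then becomes a configuration change, and any remaining SHA-1 call fails loudly instead of silently producing a weak digest.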
(This post has been updated. It was originally published on April 6, 2020.)