Cryptography is the practice of protecting the security, confidentiality, and integrity of data. It is especially important for creating strong machine identities, which are vital in protecting machine-to-machine connections and communications. There are several types of cryptographic methods, and each uses a different kind of algorithm to secure data. The three primary types of cryptographic algorithm are:
- Hash function
- Public key or asymmetric encryption
- Secret key or symmetric encryption
A hash function differs from secret key and public key encryption because it uses no key and is not meant to be reversed. An acceptable hash function has distinctive properties, such as collision resistance and one-way, irreversible computation.
What is a hash function?
According to the National Institute of Standards and Technology (NIST), a hash is a mathematical function that maps a string of arbitrary length (up to a predetermined maximum size) to a fixed-length string. Hashing algorithms vary in strength, speed, and purpose. In 2015, NIST published the current revision of the Secure Hash Standard (SHS), Federal Information Processing Standard (FIPS) Publication 180-4, which specifies the secure hash algorithms (SHA). The algorithms it specifies include SHA-224, SHA-256, SHA-384, and SHA-512. All of them are iterative, one-way hash functions that process a message to produce a condensed representation called a message digest. FIPS 180-4 is used by the United States Government and by non-government organizations.
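The fixed-length property can be illustrated with a minimal sketch that uses Python's standard hashlib module. The helper name digest_bits and the sample messages are illustrative assumptions, not part of the standard.

```python
import hashlib

# The four algorithms named above, as exposed by Python's standard hashlib module.
ALGORITHMS = ["sha224", "sha256", "sha384", "sha512"]


def digest_bits(algorithm: str, message: bytes) -> int:
    """Return the length, in bits, of the message digest for this algorithm."""
    return len(hashlib.new(algorithm, message).digest()) * 8


short_message = b"abc"
long_message = b"abc" * 100_000

for name in ALGORITHMS:
    # The digest length depends only on the algorithm, not on the input size.
    print(name, digest_bits(name, short_message), digest_bits(name, long_message))
```

For each algorithm, the two numbers printed should match the number in the algorithm's name, whether the input is 3 bytes or 300,000 bytes.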
The hash algorithms specified in FIPS 180-4 are called secure because, for a given algorithm, it is computationally infeasible to:
- Find a message that corresponds to a given message digest
- Find two different messages that produce the same message digest
Any change to a message will, with a very high probability, result in a different message digest. This will result in a verification failure when the secure hash algorithm is used with a digital signature algorithm or a keyed-hash message authentication algorithm.
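As a hedged illustration of that verification failure, the sketch below uses Python's standard hmac module with SHA-256 as the keyed-hash message authentication algorithm. The key, the messages, and the helper names make_tag and verify are hypothetical.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA-256 authentication tag for the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)


key = b"example-shared-key"  # hypothetical key, for illustration only
message = b"Transfer 100 to account 12345"
tag = make_tag(key, message)

tampered = b"Transfer 900 to account 12345"  # one character changed

print(verify(key, message, tag))   # True
print(verify(key, tampered, tag))  # False
```

The tag was computed over the original message, so the recomputed tag for the tampered message no longer matches and verification fails.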
In August 2021, NIST announced a periodic review of its cryptography standards and guidelines, with a public comment period running through October 1, 2021. The review covers FIPS 198-1, The Keyed-Hash Message Authentication Code (HMAC), NIST SP 800-107 Rev. 1, and other special publications. As a result of the review, the current standards and guidelines could change.
What does a hash function do?
The security of data is something we think about every time we send an email message or transmit financial data over the Internet. When we send data by email, we want to ensure that the message reaches the intended recipient without having been altered or modified. In the case of digitally signed communications, we also want to ensure that no one altered the underlying content after it was signed. A hash function supports data integrity and authentication; confidentiality itself is provided by encryption, which hashing often accompanies.
A hash function may help an organization accomplish the following four information security goals:
- Ensure data and file integrity
- Facilitate secure authentication
- Organize content and files in a way that increases efficiency
- Securely store passwords in a database
The unique properties of a hash function are valuable for securing data at rest or in transit because they reveal whether data has been altered, modified, or removed. Hashing also makes it computationally infeasible for an attacker to work backward from a hash value to the input that produced it. The plaintext is protected because, with today’s computers, recovering it means guessing candidate inputs and hashing each one, which is technically possible but highly unlikely to succeed for unpredictable inputs.
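For the password storage goal listed above, a minimal sketch might look like the following. Note the swap in technique: rather than a single plain hash, it uses PBKDF2 with HMAC-SHA-256 from Python's standard hashlib, because a random salt and a deliberately slow derivation are generally recommended for passwords. The iteration count and the helper names are assumptions for illustration.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 200_000  # illustrative work factor (an assumption); tune for your hardware


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, derived_hash); a fresh random salt is generated if none is given."""
    if salt is None:
        salt = os.urandom(16)
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, derived


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the hash with the stored salt and compare in constant time."""
    _, derived = hash_password(password, salt)
    return hmac.compare_digest(derived, expected)


# Only the salt and the derived hash would be stored, never the password itself.
salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))                   # False
```

Because each password gets its own salt, two users with the same password end up with different stored values.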
How does a hash function work?
Hashing mathematically converts data into a value that serves as a compact representation of that data. The input to the hash algorithm is the data or file. The output is the representation of the data, commonly referred to as the hash value or hash output. The algorithm processes the input block by block and produces a fixed-size bit string, so an input of any size is reduced to a value of the same fixed length.
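The block-by-block processing can be sketched with hashlib's incremental update() interface. The helper name hash_in_blocks and the 64-byte block size are assumptions chosen for illustration; the point is that feeding the data in pieces yields the same hash value as hashing it all at once.

```python
import hashlib


def hash_in_blocks(data: bytes, block_size: int = 64) -> str:
    """Feed the data to SHA-256 one block at a time and return the hex digest."""
    hasher = hashlib.sha256()
    for start in range(0, len(data), block_size):
        hasher.update(data[start:start + block_size])
    return hasher.hexdigest()


data = b"The quick brown fox jumps over the lazy dog" * 50

# Hashing block by block gives the same value as hashing in one call.
print(hash_in_blocks(data) == hashlib.sha256(data).hexdigest())  # True
print(len(hash_in_blocks(data)))  # 64 hex characters, i.e. 256 bits
```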
Consider a scenario in which you need to copy data files from one computer to another over an open network. Hashing provides strong assurance that the copied file is the same as the original. How is this possible? The hash value of the original file can be compared with the hash value of the copied file. If the two values are the same, the file has not been altered or modified. If they differ, the copied file is not identical to the original and may have been modified. The difference could be as small as the addition of a comma. A small change to the input changes the resulting hash value almost entirely; this behavior, known as the avalanche effect, is an important property of a hash function.
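The file comparison described above could be sketched as follows, assuming SHA-256 and temporary files that stand in for the original and the copy. The function names file_digest and files_match are hypothetical.

```python
import hashlib
import os
import tempfile


def file_digest(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so that large files need not fit in memory."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def files_match(first_path: str, second_path: str) -> bool:
    """Return True when both files produce the same hash value."""
    return file_digest(first_path) == file_digest(second_path)


with tempfile.TemporaryDirectory() as folder:
    original = os.path.join(folder, "original.txt")
    faithful_copy = os.path.join(folder, "copy.txt")
    altered_copy = os.path.join(folder, "altered.txt")

    with open(original, "wb") as handle:
        handle.write(b"Quarterly totals: 100 200 300")
    with open(faithful_copy, "wb") as handle:
        handle.write(b"Quarterly totals: 100 200 300")
    with open(altered_copy, "wb") as handle:
        handle.write(b"Quarterly totals: 100, 200 300")  # one comma added

    print(files_match(original, faithful_copy))  # True
    print(files_match(original, altered_copy))   # False
```

In practice, the hash value of the original would be computed on the source machine and sent or published alongside the file, so the receiver can recompute and compare it.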
Properties of a hash function
While hash functions and encryption methods both support the protection of data, the two cryptographic tools are different. The following properties distinguish a hash function from encryption:
- Avalanche effect
- Collision resistance
- One-way, irreversible computation (preimage resistant)
As described above, the avalanche effect means that the hash output changes significantly, or entirely, when even a single bit or byte of the input is changed. It is important for a strong hash algorithm because it lets a user detect the smallest change to data: a seemingly minor edit produces a completely different hash value. Knowing that the data is no longer trustworthy may protect users from the harmful consequences of relying on data that has been altered or modified.
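A rough way to observe the avalanche effect is to count how many of the 256 output bits differ between the SHA-256 digests of two nearly identical inputs. The sketch below is illustrative; the helper name differing_bits and the sample messages are assumptions.

```python
import hashlib


def differing_bits(first: bytes, second: bytes) -> int:
    """Count the bit positions where the two SHA-256 digests differ."""
    first_value = int.from_bytes(hashlib.sha256(first).digest(), "big")
    second_value = int.from_bytes(hashlib.sha256(second).digest(), "big")
    return bin(first_value ^ second_value).count("1")


original = b"Pay Alice 100 dollars"
altered = b"Pay Alice 100 dollars,"  # a single comma appended

changed = differing_bits(original, altered)
print(f"{changed} of 256 output bits differ")
```

For a well-behaved hash function, roughly half of the output bits are expected to flip, even though only one character of the input changed.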
One of the most important properties of a hash function is its collision resistance. Because a hash function maps an unlimited number of possible inputs to a fixed number of outputs, two different inputs with the same hash value (a collision) must exist; a strong hash algorithm makes such a pair computationally infeasible to find. If an attacker can find two different inputs that produce the same hash value, the algorithm is considered broken. The security strength of a hash function for digital signatures is defined as its collision resistance strength, so collision-resistant algorithms improve the security of the data. Lack of collision resistance was the reason NIST deprecated SHA-1 as insecure.
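To see why output length matters for collision resistance, the sketch below deliberately weakens SHA-256 by keeping only the first 2 bytes (16 bits) of each digest, then searches for two different inputs that share that truncated value. The truncation and the helper names are assumptions made purely for illustration; no such shortcut is known for the full 256-bit output.

```python
import hashlib
from typing import Dict, Tuple


def truncated_digest(message: bytes, num_bytes: int) -> bytes:
    """Return only the first num_bytes of the SHA-256 digest (a weakened hash)."""
    return hashlib.sha256(message).digest()[:num_bytes]


def find_collision(num_bytes: int = 2) -> Tuple[bytes, bytes, int]:
    """Hash the inputs b"0", b"1", b"2", ... until two share a truncated digest."""
    seen: Dict[bytes, bytes] = {}
    counter = 0
    while True:
        message = str(counter).encode()
        tag = truncated_digest(message, num_bytes)
        if tag in seen:
            return seen[tag], message, counter + 1
        seen[tag] = message
        counter += 1


first, second, attempts = find_collision(2)
print(f"{first!r} and {second!r} collide on 16 bits after {attempts} attempts")
```

With only 65,536 possible truncated values, a collision is guaranteed within 65,537 attempts and typically appears after a few hundred. Each additional output bit makes the search harder, which is why full-length digests resist this kind of brute-force search.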
The hash function is a valuable cryptographic tool. It can be used to secure data and to reveal potential alteration or modification of files and their underlying data. Because of its one-way design, hashing makes it computationally infeasible for an attacker to recover the plaintext or original input from a hash value. Lastly, a hash function can be combined with other cryptographic tools, such as encryption, to support origin authentication, data integrity, and signatory non-repudiation when using digital signatures.