Format-Preserving Encryption (FPE) encrypts plaintext that follows a specified format, such as a Social Security or credit card number, into a ciphertext that keeps the original format of the plaintext.
The problem of protecting data in legacy systems
The protection of data in legacy systems, such as those used in banking and healthcare, is a problem that needs to be addressed in a way that does not interfere with the operation of these systems.
The problem is that conventional encryption changes the format of the data. Encrypted with, for example, the Advanced Encryption Standard (AES) in CBC mode, a 16-digit value that represents a credit card number might become a string such as BfA1lytW8I2kflOcQbOCUlX1yH+vAL1/nRoLgKkId+o=. This is longer than 16 characters, and most of its characters are no longer digits. Unfortunately, this sort of change can be fatal in complex legacy environments, where many applications expect only 16-digit values and may not fail gracefully if they receive anything else.
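The length expansion described above can be checked with simple arithmetic. The following is a minimal sketch, assuming PKCS#7 padding and Base64 encoding of the ciphertext (neither is stated in the text). Random bytes stand in for real AES output, since only the length and the alphabet matter here. The card number is a hypothetical test value.

```python
import base64
import os

BLOCK_SIZE = 16  # AES block size in bytes


def cbc_output_length(plaintext_len: int) -> int:
    """Ciphertext length in bytes for AES-CBC, assuming PKCS#7 padding.

    PKCS#7 always adds at least one byte of padding, so a plaintext that
    is already a multiple of the block size grows by one full block.
    """
    return (plaintext_len // BLOCK_SIZE + 1) * BLOCK_SIZE


card_number = "4111111111111111"  # hypothetical 16-digit test value
n_bytes = cbc_output_length(len(card_number))

# Random bytes are a stand-in for real ciphertext (assumption for illustration).
stand_in_ciphertext = os.urandom(n_bytes)
encoded = base64.b64encode(stand_in_ciphertext).decode("ascii")

print(len(card_number), n_bytes, len(encoded))  # 16 32 44
```

Under these assumptions, 16 input characters become 44 output characters drawn from the Base64 alphabet, which matches the length of the example string above.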
How FPE solves the problem
NIST describes a solution to this problem in their recent Special Publication 800-38G, "Recommendation for Block Cipher Modes of Operation: Methods for Format-Preserving Encryption."
The FPE modes specified by SP 800-38G allow the encryption of plaintext without changing its format. FPE methods are designed for data that is not necessarily binary. NIST explains that “given any finite set of symbols, like the decimal numerals, a method for FPE transforms data that is formatted as a sequence of the symbols in such a way that the encrypted form of the data has the same format, including the length, as the original data. Thus, an FPE-encrypted SSN would be a sequence of nine decimal digits.”
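To make the idea concrete, here is a toy Feistel construction over strings of decimal digits. It is not FF1 or FF3-1 and must not be used to protect real data: it swaps in HMAC-SHA256 as the round function, uses an arbitrary round count, and ignores tweaks and modulo bias. The function names and the key are hypothetical. It only illustrates how a cipher can map digits to digits of the same length and still be reversible.

```python
import hashlib
import hmac

ROUNDS = 10  # arbitrary even number, chosen for illustration only


def _round_value(key: bytes, round_no: int, half: str, modulus: int) -> int:
    """Keyed pseudorandom number derived from the round index and one half."""
    msg = f"{round_no}:{half}".encode("ascii")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % modulus


def _check(digits: str) -> None:
    if not (digits.isascii() and digits.isdigit() and len(digits) >= 2):
        raise ValueError("expected a string of at least two decimal digits")


def toy_fpe_encrypt(key: bytes, digits: str) -> str:
    _check(digits)
    u = len(digits) // 2
    a, b = digits[:u], digits[u:]
    for i in range(ROUNDS):
        m = len(a)
        # Add a keyed value to one half, modulo 10^m, then swap the halves.
        c = (int(a) + _round_value(key, i, b, 10**m)) % 10**m
        a, b = b, str(c).zfill(m)
    return a + b


def toy_fpe_decrypt(key: bytes, digits: str) -> str:
    _check(digits)
    u = len(digits) // 2
    a, b = digits[:u], digits[u:]
    for i in reversed(range(ROUNDS)):
        m = len(b)
        # Undo the swap and subtract the same keyed value.
        c = (int(b) - _round_value(key, i, a, 10**m)) % 10**m
        a, b = str(c).zfill(m), a
    return a + b


key = b"demo-key-not-for-production"  # hypothetical key
ssn = "123456789"                     # hypothetical nine-digit value
ciphertext = toy_fpe_encrypt(key, ssn)
print(ciphertext, toy_fpe_decrypt(key, ciphertext) == ssn)
```

The ciphertext is again nine decimal digits, so it fits into any field that accepted the original value. A production system would rely on a validated implementation of the NIST modes instead of a sketch like this one.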
SP 800-38G specifies ways to encrypt sensitive data that can be fully validated to FIPS 140-2, the US government's "Security Requirements for Cryptographic Modules." FPE modes described in NIST SP 800-38G can be used to protect sensitive data while maintaining compliance with data privacy and security regulations, such as CCPA, HIPAA, PCI DSS, or GDPR.
NIST SP 800-38G was initially published in 2016 and described two modes for FPE: FF1 and FF3. In 2017, researchers performed a cryptanalytic attack on FF3, rendering it unsuitable for general-purpose FPE because it did not achieve the intended 128-bit security level.
In response to the attack, NIST updated FF3 to FF3-1 in early 2019. The update also addressed potential vulnerabilities that arise when the number of possible inputs (that is, the domain size) is too small, for example, when encrypting only the middle six digits of credit card or Social Security numbers. In these cases, there are simply too few possible values to produce a secure output that cannot be reverse engineered. In the original SP 800-38G, the domain size for FF1 and FF3 was required to be at least 100 and recommended to be at least 1,000,000. In the revision, the domain size is required to be at least 1,000,000.
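The domain size is the number of distinct strings of a given length over a given alphabet, that is, the alphabet size raised to the power of the length. A minimal sketch of the check, with hypothetical function names and the 1,000,000 threshold taken from the paragraph above:

```python
MIN_DOMAIN_SIZE = 1_000_000  # minimum quoted above for the revised publication


def domain_size(radix: int, length: int) -> int:
    """Number of distinct strings of `length` symbols over `radix` symbols."""
    return radix ** length


def meets_minimum(radix: int, length: int, minimum: int = MIN_DOMAIN_SIZE) -> bool:
    return domain_size(radix, length) >= minimum


print(meets_minimum(10, 6))  # True:  10**6 = 1,000,000
print(meets_minimum(10, 5))  # False: 10**5 = 100,000
print(meets_minimum(26, 4))  # False: 26**4 = 456,976
```

Six decimal digits sit exactly at the minimum, which is why short fragments of an identifier are a borderline case for FPE.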
Benefits of FPE
FPE modes facilitate the retrofitting of encryption technology to existing devices or software, where a conventional encryption mode might not be feasible. In particular, database applications may not support changes to the length or format of data fields.
This is why FPE is commonly used to protect sensitive data sets, such as payment card data, bank account details, Social Security Numbers and personally identifiable information (PII), that are processed and stored in retail, healthcare and financial databases and applications.
More generally, FPE can support the “sanitization” of databases, that is, the use of encryption to mask personally identifiable information (PII), such as SSNs. The encrypted SSNs could still serve as an index to facilitate statistical research, even across multiple databases. This means that much of the processing of FPE-encrypted data can be performed while the data remains in its protected state.
FPE vs. tokenization
A similar approach to format-preserving data protection is tokenization. Tokenization replaces sensitive data with randomized values in the same format that have no intrinsic value of their own. The original data is stored in a secure data vault. However, tokenization is not the same as encryption. Encrypted data can be decrypted with the appropriate keys, or machine identities. Tokens, on the other hand, cannot be reversed mathematically because there is no mathematical relationship between a token and its original value; the original can only be recovered by looking it up in the vault. This means there is greater flexibility in the range of tokens that the data can be converted to.
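The difference from FPE can be sketched with a toy in-memory vault. The class and method names are hypothetical, and a real vault would be a hardened, access-controlled service, not a Python dictionary. The point is only that the token is drawn at random and recovered by lookup, with no key involved.

```python
import secrets


class ToyTokenVault:
    """In-memory stand-in for a token vault (illustration only)."""

    def __init__(self) -> None:
        self._token_to_value: dict[str, str] = {}
        self._value_to_token: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        if not (value.isascii() and value.isdigit()):
            raise ValueError("expected a non-empty string of decimal digits")
        if value in self._value_to_token:
            return self._value_to_token[value]  # reuse the existing token
        while True:
            # Random digits of the same length; unrelated to the input value.
            token = "".join(secrets.choice("0123456789") for _ in value)
            if token != value and token not in self._token_to_value:
                break
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        # Recovery is a lookup in the vault, not a decryption.
        return self._token_to_value[token]


vault = ToyTokenVault()
token = vault.tokenize("4111111111111111")  # hypothetical test value
print(len(token), token.isdigit(), vault.detokenize(token))
```

Without access to the vault, the token reveals nothing about the original value. The trade-off is that the vault itself must be stored, protected, and kept available.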
Conclusion
Data protection is important for ensuring compliance with the various security and privacy regulations and for avoiding costly penalties. However, organizations should assess the available encryption methods to ensure that their critical systems are not disrupted when processing ciphertexts. Whichever method they choose, organizations should protect encryption keys from compromise. Data encryption is only as strong as the protection of the associated keys. Once the keys are compromised, all encrypted data can be deciphered by cyber criminals, exposing businesses and individuals to threats such as financial fraud, blackmail, impersonation, and business email compromise.