NIST has released NIST AI 100-2e2023, a new report that explores the expanding AI threatscape, including key adversarial machine learning (AML) attack methods, lifecycle stages, objectives and capabilities, as well as mitigation strategies and ongoing challenges.
The dynamic landscape of AI and cybersecurity
Why now? 2023 was hailed as the year of AI, as this influential technology seemingly became as integral to daily life as electricity. But, as the world scrambles to capitalize on AI, cybersecurity attacks will increase, too. NIST states their new Taxonomy and Terminology, “may aid in securing applications of artificial intelligence against adversarial manipulations.”
But the range of attacks targeting AI systems is far-reaching and constantly evolving. Threats arise at all points of the ML lifecycle (design, implementation, training, testing and deployment), and can also target the infrastructure on which the AI system itself is deployed.
Regardless of where threats crop up, you’ll want to stay abreast of recent developments, including novel attack types and methods for adapting your security practices.
AML threats to predictive AI and generative AI
We understand you’re furiously trying to safely leverage AI tools and may not have the time to read the full guidance from NIST. That’s why we’ve summarized the major points in today’s article.
NIST’s guidance focuses on building standardized AML language for the ML and cybersecurity communities around Predictive AI (PredAI) and Generative AI (GenAI). Threat types covered within its pages include the most widely studied and most effective attacks in the evasion, poisoning, privacy and abuse categories.
Before we dive any deeper, let’s first define what each of those AI types is.
- Predictive AI: PredAI uses ML algorithms to adaptively forecast data, including trends, patterns, predictions and behaviors.
- Generative AI: GenAI generates new content such as text, audio, video or combinations thereof.
A note on AML attacker objectives
The taxonomy denotes three attack motives, including availability breakdowns (disrupting system performance), integrity violations (manipulating model outputs) and privacy compromises (learning sensitive information).
The Generative AI Identity Crisis: Emerging AI Threatscapes and Mitigations
PredAI evasion attacks
Predictive AI evasion attacks occur when threat actors seek to generate adversarial examples, which are testing samples that can be misclassified during deployment, often without anyone noticing the interference.
Recommended mitigation: Mitigating evasion attacks is challenging since they’re hard to spot. NIST suggests:
- Adversarial training: Incorporating adversarial examples (along with their correct labels) to make a model more robust to attacks
- Randomized smoothing: Adding noise to training data to make it harder for threat actors to generate adversarial examples
- Formal verification: Mathematically analyzing models to identify potential vulnerabilities
PredAI poisoning attacks
AI poisoning attacks occur during the ML training stage, which Microsoft has previously dubbed as “the most critical vulnerability of machine learning systems deployed in production.” Poisoning attacks are troubling because they can be orchestrated at scale, with limited financial resources, and can impact model availability and integrity.
Availability attacks
Availability attacks cause ML models to degrade across all samples, and can be carried out through label flipping, which involves the generation of training examples selected by a threat actor.
Recommended mitigation:
- Thorough sanitization of training data before ML training is performed
- Data clustering
- Implementation of security mechanisms for data provenance and integrity attestation
NIST also recommends “robust training” instead of regular ML training, which entails training multiple models and generating predictions via model voting. Adding noise to training techniques is one technique mentioned.
Targeted poisoning attacks
Rather than going after a broad sample base, targeted poisoning attacks impact a small number of samples. Label flipping can again be successful here, and has the potential to impact the entire model, because a model could learn the wrong label, which may skew its entire performance.
Recommended mitigation: NIST calls these targeted attacks “notoriously challenging to defend against,” and again states that dataset provenance and integrity attestation are critical.
Backdoor poisoning
In one instance of a backdoor poisoning attack, researchers realized image classification tools could be manipulated by altering a small element at training time, so the model would later misclassify. Another example of a backdoor poisoning attack involved classifying natural reflections on images as a backdoor trigger to poison facial recognition systems using hidden—what are known as steganographic—algorithms.
These attacks have become more sophisticated and stealthier. They’re hard to detect, even during fine-tuning processes.
Recommended mitigation:
- Data sanitization
- Proper trigger construction
- Model inspection prior to deployment (to check for poisoned data samples)
Model poisoning
Model poisoning attacks tamper with ML algorithms directly, with the goal of injecting malicious functionalities. They can also impact the software supply chain further downstream, affecting suppliers with poisoned code.
Recommended mitigation:
- Identify and exclude malicious updates
- Implement cryptographic protocol-based verification (code signing)
PredAI privacy attacks
Privacy attacks involve information collection from user records or around critical infrastructure, in the hopes of reconstructing data, extracting models themselves or inferring information about training data through model interaction.
Data reconstruction
Data reconstruction attacks happen in three ways: information recovery from aggregated data; using semantically related images to conduct model inversion attacks, and property inference, in which attackers seek information about an individual (such as the medical records of a patient undergoing study for a rare disease).
Model extraction
In this case, attackers extract information about the model’s architecture and training parameters. They may compute model weights using algebraic formulas, conduct side-channel attacks or carry out rowhammer attacks.
NIST notes that model extraction is usually a step toward another attack, so stopping these early can mitigate further down the AI supply chain.
Property inference
Attackers attempt to learn global information about training data through interaction with the model—data that wouldn’t be released in a user query.
Mitigations of PredAI privacy attacks
By limiting how much someone can learn about the records in each dataset, you can better ensure training data privacy. But this too remains challenging, as setting up strict parameters could impact model privacy and accuracy later on.
GenAI poisoning attacks
GenAI and PredAI can both be impacted by the attack methods mentioned so far, but GenAI poisoning may occur because models are trained by scraping wide—and often unverified—swaths of data (i.e., the Web). And with datasets sometimes sporting trillions of tokens, mitigations through URL listing and cryptographic hashing may not always scale effectively.
GenAI direct prompt injection attacks
This involves the injection of text to alter the behavior of an LLM. Adversaries will use this technique to jailbreak models or abuse them to create misinformation, propaganda, hurtful content, sexual content, malware (code) or phishing messages. They can also use direct prompt injection to invade privacy.
Recommended mitigation: NIST provides three categories of defense, but none offer full immunity at this time.
- Training for alignment: Training on carefully curated and pre-aligned datasets; improve iteratively through human feedback
- Prompt instruction and formatting techniques: Appending specific instructions to inform models that a user may be attempting a jailbreak; wrapping prompts in random characters or tags to denote system instructions from user prompts
- Detection techniques: Monitoring for prompt injection and malicious user inputs by moderating firewall outputs for jailbreak behavior
GenAI indirect prompt injection attacks
Indirect prompt injection occurs when a threat actor doesn’t directly manipulate LLMs but instead remotely injects prompts that affect system operation, similar to an SQL injection attack.
Recommended mitigation:
- Reinforcement Learning from Human Feedback (RLHF), where a human is directly involved to help fine-tune a model
- Filtering retrieved inputs
- Moderating LLMs
- Performing detection of outlying behaviors
Harness the power of AI to slay machine identity complexity in seconds
Ongoing discussions and emerging challenges
As our community works to secure AI technologies, the adversary will continue to work against us.
The landscape is far from static, and NIST stresses that we must continue to innovate in our approaches to security, so we can anticipate and neutralize novel threats.
NIST brings up a few key challenges in the “Open Discussion” section of this publication:
- Scale. Data repositories for ML models are vast, and the nature of ML algorithms renders the traditional cybersecurity perimeter obsolete.
- Theoretical limitations. Current processes are ad hoc, fallible and impact model performance. Data and sanitization can help, and they should be combined with cryptographic techniques for origin and integrity.
- Open vs. closed model dilemmas. Open source technology is indispensable, but regulatory bodies are concerned that open source AI could be dangerous in the hands of those with malicious intent. It brings up the question: Should open models be allowed?
- Supply chain challenges. Both DARPA and NIST are working on ways to defend AI systems from intentional, malicious code that could impact the supply chain downstream. In some cases, malware is undetectable—and may only be prevented through thorough vetting of all third-party components in a supply chain.
- Trustworthiness tradeoffs. A lot of attributes go into building trustworthy AI systems and there are often trade-offs for privacy, accuracy and robustness.
- Multi-modal systems. Mitigations mentioned for single modality may not extend to multi-modal systems, and additional research is needed.
- Quantized models. Quantized models, which help reduce compute and memory costs, inherit weaknesses from original models and may even amplify errors, requiring careful monitoring.
Machine identity security and authentication for ML models
“AI, especially GenAI, is a revolutionary leap in technology that is reshaping industries, disrupting norms and transforming the very essence of how enterprises deliver value and solve complex problems. The integrity, availability and privacy of foundational models, training and contextual data are the cornerstones of the GenAI revolution—and enterprises must safeguard them fiercely to cultivate trust, ensure compliance and achieve success.” – Faisal Razzak, Group Manager, Post Quantum & Secure Software Supply Chain Initiatives
If NIST AI 100-2e2023 is any indication, ML models are rapidly becoming a prime target for adversaries. Keeping them safe requires robust machine identity security.
An effective, enterprise-wide security and management program, like the Venafi Control Plane, can help you maintain visibility and automated control over every machine identity, for every version and instance of the ML models you use. And this comprehensive oversight makes it easy for you to “flip the kill switch” in the event of any suspicious or malicious behavior.
3 Risks Adversarial Machine Learning Poses to Your GenAI Systems
AML security starts at the foundation, with machine identity security
To learn more about NIST’s AI security recommendations—including a deeper dive into GenAI threat types and mitigations—watch “3 Risks Adversarial Machine Learning Poses to Your GenAI Systems.”
There, Faisal Razzak dials in to concrete strategies that can help protect your AI systems from unauthorized changes, and he'll detail the critical role code signing plays in that process. Don’t miss it!