Skip to main content
Threat Intelligence

MITRE ATLAS: Understanding the AI Threat Landscape with Real Attack Techniques

BST

BeyondScale Security Team

AI Security Engineers

20 min read

MITRE ATT&CK changed how security teams think about adversaries. Instead of treating attacks as isolated incidents, ATT&CK gave defenders a shared language for describing attacker behavior: the tactics they pursue, the techniques they use, and the procedures they follow. It turned threat intelligence from an abstract concept into a structured, actionable knowledge base.

MITRE ATLAS does the same thing for AI systems.

ATLAS, which stands for Adversarial Threat Landscape for Artificial Intelligence Systems, is a knowledge base of adversarial tactics and techniques specifically targeting AI and machine learning. It catalogs how attackers reconnoiter AI systems, gain access to models and training pipelines, manipulate model behavior, extract sensitive data, and degrade AI system performance. Each technique is documented with real-world case studies drawn from published research, disclosed incidents, and red team exercises.

If your organization deploys AI systems in production, whether you built the models yourself or consume them through APIs, ATLAS gives you a structured way to understand and defend against the threats specific to your AI attack surface. For complementary vulnerability frameworks, see our guides on the OWASP LLM Top 10 and NIST AI RMF.

Key Takeaways

  • MITRE ATLAS extends the ATT&CK framework with tactics and techniques specific to adversarial attacks on AI and ML systems
  • The framework covers the full attack lifecycle from reconnaissance through impact, with AI-specific stages like ML Model Access and ML Attack Staging
  • Real-world case studies document actual incidents including chatbot manipulation, model extraction, adversarial examples against autonomous systems, and training data extraction
  • ATLAS is practical for threat modeling: map your AI architecture to applicable techniques, assess existing controls, and prioritize gaps
  • The framework maps to other AI security resources including the OWASP LLM Top 10 and NIST AI RMF, providing a common reference point

What MITRE ATLAS Is and Why It Exists

ATLAS was first released by MITRE in 2021 and has been continuously updated since. It was built in collaboration with academic researchers, industry partners, and government agencies to address a specific gap: traditional cybersecurity frameworks did not account for the unique attack surfaces introduced by AI and ML systems.

ATT&CK tells you how attackers compromise networks, endpoints, and cloud infrastructure. But it does not tell you how an attacker crafts adversarial inputs to fool a computer vision model, poisons training data to insert a backdoor into a classifier, or uses carefully constructed queries to extract a proprietary model's parameters through its API.

These are fundamentally different attack patterns. They exploit different vulnerabilities (statistical properties of models rather than software bugs), require different tools (optimization algorithms rather than exploit kits), and produce different impacts (model manipulation rather than data exfiltration). They needed their own taxonomy.

ATLAS provides that taxonomy. It is organized as a matrix of tactics (the adversary's objectives) and techniques (the specific methods used to achieve those objectives), following the same structural approach as ATT&CK. Each technique includes a description, procedure examples, documented case studies, and recommended mitigations.

The Relationship Between ATLAS and ATT&CK

ATLAS is not a replacement for ATT&CK. It is a companion. AI systems run on traditional infrastructure, which means an attack on an AI system often involves a combination of ATT&CK and ATLAS techniques.

For example, an attacker might use ATT&CK technique T1190 (Exploit Public-Facing Application) to gain initial access to a web application, then use ATLAS technique AML.T0040 (ML Model Inference API Access) to probe the AI model behind that application. The attack chain spans both frameworks.

ATLAS explicitly references ATT&CK techniques where they overlap or serve as prerequisites. This cross-referencing is important for threat modeling because it reflects reality: adversaries do not constrain themselves to one framework.

The ATLAS Matrix Structure

The ATLAS matrix organizes adversarial behavior into tactics, each representing a phase of the attack lifecycle. Within each tactic, multiple techniques describe specific methods an attacker might use. Here is a detailed walkthrough of each tactic category.

Reconnaissance

Before attacking an AI system, adversaries gather information about it. ATLAS reconnaissance techniques are specific to AI targets:

  • Victim research. Identifying that an organization uses AI/ML, determining the types of models deployed, and understanding the business context. This includes scanning job postings for ML engineer roles, analyzing published papers by the organization's researchers, and reviewing product documentation for AI feature descriptions.
  • Technical reconnaissance. Probing model endpoints to understand input/output formats, response characteristics, and behavioral patterns. An attacker might send carefully varied inputs to an API endpoint to determine the model architecture, training methodology, or decision boundaries.
  • Search for technical publications. Many organizations publish research papers, blog posts, or conference talks that reveal model architectures, training datasets, and performance characteristics. This public information becomes an attacker's roadmap.
Reconnaissance for AI systems differs from traditional recon because the targets are different. Instead of scanning for open ports and service versions, the attacker is trying to understand model types, training data characteristics, and inference API behavior.

Resource Development

Attackers need AI-specific resources to conduct ML attacks:

  • Acquiring public ML models. Downloading open-source models similar to the target for use in transfer attacks. If the target runs a ResNet-based image classifier, the attacker can download a public ResNet model to develop adversarial examples that transfer to the target.
  • Developing adversarial tools. Building or acquiring tools for adversarial example generation, model extraction, or data poisoning. Libraries like Adversarial Robustness Toolbox (ART), Foolbox, and CleverHans provide ready-made implementations of many attack techniques.
  • Acquiring training data. Collecting data similar to what the target model was trained on. This enables surrogate model training and transfer attacks.
  • Establishing compute resources. Many ML attacks require significant computation. Model extraction, adversarial example optimization, and surrogate model training all demand GPU resources.

Initial Access

Getting access to AI systems or their supporting infrastructure:

  • ML supply chain compromise. Compromising dependencies in the ML pipeline: poisoned pre-trained models on public repositories, backdoored training frameworks, or malicious components in model serving infrastructure. This is the ML equivalent of traditional supply chain attacks, but the attack surface includes model weights, training scripts, and data preprocessing code in addition to traditional software dependencies.
  • Compromising ML pipeline access. Gaining access to training infrastructure, data pipelines, model registries, or deployment systems through traditional attack vectors (credential theft, vulnerability exploitation, phishing).
  • Valid accounts. Using legitimate credentials to access ML platforms, model APIs, or training infrastructure. Many ML platforms have weaker access controls than production application infrastructure.

ML Model Access

This tactic is unique to ATLAS and has no direct ATT&CK equivalent. It describes how adversaries gain the ability to interact with target ML models:

  • Inference API access. Obtaining query access to a model through its API endpoint. This might be through legitimate API access (signing up for a service), exploiting unauthenticated endpoints, or using stolen credentials. Query access is the prerequisite for many downstream techniques including model extraction, adversarial example testing, and data extraction.
  • Full model access. Obtaining the model weights, architecture, and configuration. This provides complete transparency into the model's behavior and allows white-box attacks, which are significantly more effective than black-box attacks through API access.
  • Physical environment access. For AI systems deployed in physical environments (autonomous vehicles, robotics, IoT), gaining the ability to manipulate the physical inputs the model processes. This includes placing adversarial patches in the real world or modifying sensor inputs.

Execution

Running adversarial actions against AI systems:

  • Adversarial input delivery. Sending crafted inputs to the model that cause misclassification, bypassed safety filters, or unintended behavior. For LLMs, this includes prompt injection, jailbreaking, and instruction manipulation. For computer vision models, this includes adversarial perturbations that cause misclassification.
  • Prompt injection. A technique that has become especially critical with the widespread deployment of LLMs. Direct prompt injection inserts malicious instructions into user-facing input fields. Indirect prompt injection embeds instructions in external data sources (web pages, documents, emails) that the LLM processes. For a deep dive on prompt injection defenses, see our prompt injection attacks guide.
  • Code execution via model. Exploiting model capabilities (code generation, tool use, function calling) to execute arbitrary code on the host system. This is particularly relevant for AI agents with tool-use permissions.

Persistence

Maintaining access or influence over AI systems across restarts, retraining, and updates:

  • Backdoor ML model. Inserting a backdoor into a model during training so that specific trigger inputs cause the model to produce attacker-chosen outputs while behaving normally on all other inputs. A poisoned image classifier might correctly classify 99.9% of inputs but always classify an image containing a specific small patch as the attacker's desired class.
  • Poison training data. Injecting malicious data into the training dataset so that the model learns attacker-desired behavior. This can happen through compromise of data collection pipelines, manipulation of publicly scraped data sources, or direct access to training data storage. Poisoning is persistent because the malicious behavior is baked into the model weights and survives retraining unless the poisoned data is identified and removed.
  • Compromise model registry. Modifying stored model artifacts in the model registry so that a backdoored model is deployed even if the training pipeline is clean.

Evasion

Modifying inputs to avoid detection or cause misclassification by AI systems:

  • Adversarial examples. Carefully crafted inputs that appear normal to humans but cause the model to produce incorrect outputs. For image classifiers, this means imperceptible pixel perturbations that cause misclassification. For malware detectors, this means functional malware modified to evade ML-based detection. For NLP models, this means text with subtle modifications that change the model's classification while remaining readable.
  • Evading model output detection. Modifying AI-generated content to avoid detection by AI output detectors. As organizations deploy detectors for AI-generated text, images, and code, adversaries develop techniques to produce AI content that evades these detectors.
  • Model manipulation via input. Systematically probing and manipulating model behavior through repeated queries to find inputs that consistently produce desired (from the attacker's perspective) outputs.

Exfiltration

Extracting valuable information from AI systems:

  • Model extraction. Reconstructing a proprietary model by querying it through its API and using the input-output pairs to train a surrogate model. This is sometimes called model stealing. Research has demonstrated that models can be extracted with high fidelity using surprisingly few queries. A successful extraction gives the attacker a local copy of a proprietary model they can analyze, attack, or monetize.
  • Training data extraction. Recovering training data from a deployed model. Language models can memorize and regurgitate training examples, including personally identifiable information, proprietary code, or confidential documents. Extraction attacks use carefully constructed prompts or query sequences to trigger this memorization.
  • Membership inference. Determining whether a specific data point was used to train a model. This is a privacy attack: if you can prove that a patient's medical record was in the training data of a healthcare AI model, you have confirmed a privacy breach even without extracting the full record.

Impact

The end results of adversarial actions against AI systems:

  • Model degradation. Reducing the performance or reliability of an AI system. This can be achieved through data poisoning that gradually degrades accuracy, adversarial inputs that cause frequent errors, or resource exhaustion attacks that slow inference.
  • AI system manipulation. Causing an AI system to produce attacker-desired outputs in specific scenarios while functioning normally otherwise. This is more subtle than degradation because the system appears to work correctly except in attacker-triggered conditions.
  • System misuse. Using AI system capabilities for unintended purposes. Convincing an AI agent to perform actions outside its intended scope, generating harmful content by bypassing safety filters, or using AI tools as amplifiers for other attacks.
  • Denial of ML service. Making AI systems unavailable through model-specific denial of service attacks, such as inputs that cause excessive computation (sponge examples) or requests that exhaust rate limits and quotas.

Real-World Case Studies from ATLAS

ATLAS is grounded in documented incidents. Here are several case studies that illustrate how these techniques play out in practice.

Microsoft Tay: Coordinated Model Manipulation

In 2016, Microsoft released Tay, a conversational AI chatbot on Twitter. Within hours, coordinated users discovered they could manipulate Tay's behavior by feeding it inflammatory content. Tay had an online learning component that incorporated user interactions, which meant adversarial users were effectively poisoning the model's training data in real time.

The attack combined several ATLAS techniques: reconnaissance (understanding that Tay learned from interactions), data poisoning (feeding it targeted content to shift its behavior), and system manipulation (causing the chatbot to produce offensive outputs). Microsoft took Tay offline within 16 hours.

The Tay incident is documented in ATLAS as a case study for both data poisoning and adversarial manipulation of online learning systems. The lesson is not just about content moderation. It is about the fundamental risk of models that learn from untrusted input sources without adversarial resilience controls.

GPT-2 Model Extraction Research

Researchers have demonstrated that language models can be partially extracted through their APIs. By sending large numbers of carefully constructed queries to a model endpoint, an attacker can collect enough input-output pairs to train a surrogate model that approximates the target's behavior.

The ATLAS case studies on model extraction reference published research showing that:

  • Query-based extraction is practical. With sufficient API queries, an attacker can train a surrogate model that achieves a meaningful percentage of the target model's performance on specific tasks.
  • Extraction enables downstream attacks. Once an attacker has a local surrogate, they can generate adversarial examples against it using white-box methods, then transfer those examples to attack the target model. Transfer attacks are more efficient than black-box attacks against the API directly.
  • Rate limiting is insufficient alone. While rate limiting increases the cost of extraction, it does not prevent it. Distributed extraction across multiple accounts or IP addresses can circumvent per-user rate limits.
This research maps to ATLAS techniques for model extraction (AML.T0024) and demonstrates why model API security requires more than simple authentication and rate limiting.

Adversarial Examples Against Autonomous Vehicles

Research groups have demonstrated adversarial attacks against the perception systems used in autonomous vehicles. These attacks modify physical-world objects (road signs, road markings, physical obstacles) in ways that cause misclassification by computer vision models.

Key documented examples include:

  • Stop sign misclassification. Small stickers or modifications to stop signs that cause a classifier to identify them as speed limit signs or yield signs. The modifications are subtle enough that human drivers would still recognize the stop sign, but the vision model misclassifies it.
  • Adversarial patches. Printed patterns that, when placed in a camera's field of view, cause object detection models to miss pedestrians, vehicles, or obstacles entirely.
  • Road marking manipulation. Modified lane markings that cause lane-keeping systems to steer vehicles toward incorrect lanes.
These case studies map to ATLAS evasion techniques and demonstrate the physical-world impact of adversarial ML attacks. They are particularly important for organizations deploying AI in safety-critical applications.

Training Data Extraction from Language Models

Researchers at Google, Stanford, and other institutions have published work demonstrating that large language models memorize and can be induced to reproduce training data. This includes:

  • Personally identifiable information. Models trained on web data can be prompted to produce names, email addresses, phone numbers, and physical addresses from their training data.
  • Proprietary code. Code-generating models can reproduce copyrighted code from their training corpora, sometimes including license headers, comments, and internal documentation.
  • Verbatim text reproduction. With appropriate prompting strategies, models can be induced to produce long verbatim passages from their training data, including content that was not intended to be public.
This maps to ATLAS training data extraction techniques (AML.T0024.002) and highlights a critical risk for organizations that fine-tune models on proprietary or sensitive data. If your model is fine-tuned on customer data, internal documents, or confidential business information, extraction attacks could expose that data to attackers with query access.

How to Use ATLAS for Threat Modeling

ATLAS becomes practical when you use it to systematically evaluate your AI systems. Here is a step-by-step approach to ATLAS-based threat modeling.

Step 1: Map Your AI Architecture

Before you can assess threats, you need a clear picture of your AI system architecture. Document:

  • Model types and locations. What models do you run? Where are they hosted? Are they self-hosted or accessed via third-party APIs?
  • Data pipelines. Where does training data come from? How is it collected, processed, stored, and fed into training?
  • Inference endpoints. How do users and systems interact with your models? What APIs are exposed? What authentication is required?
  • Tool integrations. If you deploy AI agents, what tools and systems can they access? What permissions do those integrations have?
  • Model lifecycle. How are models trained, versioned, deployed, monitored, and updated?

Step 2: Identify Applicable Techniques

Walk through the ATLAS matrix tactic by tactic. For each technique, ask: "Does this apply to our architecture?" Not every technique applies to every AI system. A text classification model has a different threat profile than an LLM-based agent with tool access.

For each applicable technique, assess:

  • Attack feasibility. How difficult is this attack given the attacker's required access level and resources?
  • Existing controls. What defenses do you already have in place?
  • Potential impact. What is the consequence if this attack succeeds?

Step 3: Prioritize by Risk

Not all applicable techniques carry equal risk. Prioritize based on the combination of feasibility, existing control gaps, and potential impact. Common high-priority areas for most AI deployments:

  • Prompt injection (for any system taking user input to an LLM)
  • Training data extraction (for models fine-tuned on sensitive data)
  • Model extraction (for proprietary models exposed via API)
  • ML supply chain (for systems using third-party models or pre-trained components)
  • Tool-use abuse (for AI agents with system integrations)

Step 4: Define Mitigations

For each prioritized technique, define specific mitigations. ATLAS provides suggested mitigations for each technique, but you should adapt these to your specific architecture and constraints.

Example mitigations for common high-priority techniques:

Prompt injection mitigations:

  • Input validation and sanitization before LLM processing
  • Instruction hierarchy enforcement (system prompts take precedence over user input)
  • Output filtering for sensitive data patterns
  • Separation of data and instructions in system architecture
  • Human-in-the-loop for high-stakes actions
Model extraction mitigations:
  • API query rate limiting per user, per IP, and per time window
  • Query pattern monitoring for extraction signatures (high volume, systematically varied inputs)
  • Output perturbation (adding calibrated noise to model outputs)
  • Watermarking model outputs for provenance tracking
  • Terms of service prohibiting model extraction
Training data extraction mitigations:
  • Differential privacy during training
  • Output filtering for PII and sensitive data patterns
  • Memorization testing during model evaluation
  • Data deduplication in training datasets
  • Monitoring for verbatim reproduction of training data in outputs

Step 5: Validate with Testing

Threat modeling identifies theoretical risks. Testing validates whether those risks are exploitable in practice. Conduct AI security audits that specifically test for the ATLAS techniques you identified as high priority.

Testing should include:

  • Red teaming against your model endpoints using ATLAS techniques as the test plan
  • Adversarial input testing with automated tools like ART (Adversarial Robustness Toolbox)
  • Extraction testing to validate that rate limiting and output perturbation prevent practical model theft
  • Data extraction testing to verify that fine-tuned models do not leak training data

How ATLAS Maps to OWASP LLM Top 10 and NIST AI RMF

ATLAS does not exist in isolation. It intersects with other AI security frameworks, and understanding these intersections helps you build a comprehensive defense strategy.

ATLAS and OWASP LLM Top 10

The OWASP LLM Top 10 (updated for 2025) identifies the most critical vulnerabilities in LLM applications. Here is how the top entries map to ATLAS techniques:

  • LLM01: Prompt Injection. Maps directly to ATLAS prompt injection techniques under the Execution tactic. ATLAS provides the broader context of how prompt injection fits into a full attack chain.
  • LLM02: Sensitive Information Disclosure. Maps to ATLAS training data extraction and membership inference techniques under Exfiltration.
  • LLM03: Supply Chain Vulnerabilities. Maps to ATLAS ML supply chain compromise under Initial Access.
  • LLM04: Data and Model Poisoning. Maps to ATLAS training data poisoning and model backdoor techniques under Persistence.
  • LLM05: Improper Output Handling. Related to ATLAS system manipulation techniques under Impact, where model outputs trigger unintended actions in downstream systems.
  • LLM06: Excessive Agency. Maps to ATLAS tool-use abuse and system misuse techniques, where AI agents take actions beyond their intended scope.
The key difference: OWASP LLM Top 10 tells you what to worry about most in LLM applications. ATLAS tells you how attackers actually execute those attacks, with specific techniques and documented procedures. Use OWASP for prioritization, ATLAS for detailed threat modeling and red team planning.

ATLAS and NIST AI RMF

The NIST AI Risk Management Framework (AI RMF 1.0) provides a governance framework for managing AI risks across four functions: Govern, Map, Measure, and Manage.

ATLAS plugs into the NIST AI RMF at the Map and Measure stages:

  • Map. ATLAS provides the threat intelligence needed to map AI-specific risks to your systems. When the AI RMF asks you to identify risks, ATLAS gives you a structured catalog of adversarial risks to evaluate.
  • Measure. ATLAS techniques inform your measurement strategy. How do you test whether your AI system is vulnerable to model extraction? ATLAS defines the technique; you build the test.
  • Manage. ATLAS mitigations feed directly into the risk management actions prescribed by the AI RMF.
For organizations pursuing ISO 42001 certification, ATLAS provides the threat intelligence that informs the AI risk assessment required under Clause 6 of the standard. Your risk assessment methodology should account for ATLAS techniques applicable to your AI systems.

Building ATLAS into Your Security Program

ATLAS is most valuable when it becomes part of your ongoing security program, not just a one-time exercise.

Integrate with Existing Threat Intelligence

If your security team already consumes ATT&CK-based threat intelligence, extend that to include ATLAS. This means:

  • Updating threat models to include AI-specific attack paths
  • Monitoring ATLAS updates for new techniques and case studies (MITRE updates ATLAS as new research and incidents are published)
  • Including AI attack scenarios in tabletop exercises and incident response planning
  • Training security analysts to recognize AI-specific attack indicators

Include in Security Assessments

When conducting security assessments of AI systems, use ATLAS as a structured test plan. Traditional penetration testing scopes should be supplemented with ATLAS-guided AI-specific testing that covers model endpoints, training pipelines, data stores, and agent tool integrations. Our AI penetration testing services use ATLAS as a foundational framework for testing AI system security.

Feed into Compliance Programs

ATLAS threat intelligence supports multiple compliance requirements:

  • SOC 2. Demonstrates risk assessment rigor for AI systems within Trust Service Criteria scope. See our SOC 2 for AI Systems guide.
  • ISO 42001. Provides the threat intelligence foundation for Clause 6 risk assessment.
  • EU AI Act. Supports the risk management system requirements for high-risk AI systems. See our EU AI Act compliance guide.
  • PCI DSS 4.0. Informs the security testing requirements for AI systems within payment card scope. See our PCI DSS AI compliance guide.

Stay Current

ATLAS is a living knowledge base. MITRE updates it as new adversarial techniques are discovered and new case studies are published. Build a process for reviewing ATLAS updates quarterly and reassessing your AI threat model when new techniques are added that apply to your architecture.

The AI threat landscape is evolving faster than most security teams can track independently. Frameworks like ATLAS give you a structured, community-maintained view of that landscape so you can focus your resources on the threats that matter most to your specific AI deployments.

AI Security Audit Checklist

A 30-point checklist covering LLM vulnerabilities, model supply chain risks, data pipeline security, and compliance gaps. Used by our team during actual client engagements.

We will send it to your inbox. No spam.

Share this article:
Threat Intelligence
BST

BeyondScale Security Team

AI Security Engineers

AI Security Engineers at BeyondScale Technologies, an ISO 27001 certified AI consulting firm and AWS Partner. Specializing in enterprise AI agents, multi-agent systems, and cloud architecture.

Want to know your AI security posture? Run a free Securetom scan in 60 seconds.

Start Free Scan

Ready to Secure Your AI Systems?

Get a comprehensive security assessment of your AI infrastructure.

Book a Meeting