
AI Incident Response: A 6-Phase Playbook for LLM and GenAI Breaches


Sandeep B

AI Security Team

17 min read

Thirteen percent of organizations reported breaches of AI models or applications in 2025. Of those, 97% lacked proper AI access controls at the time of the incident. That statistic, from IBM's 2025 Cost of a Data Breach Report, captures the current state of AI incident response readiness: organizations are deploying AI systems faster than they are preparing to defend them.

Traditional incident response playbooks were not built for AI. They do not account for prompt injection exploits, poisoned training data, hijacked autonomous agents, or the regulatory reporting obligations that the EU AI Act introduces in August 2026. Security teams that attempt to handle an AI breach using their existing runbooks will find critical gaps at every phase, from detection through recovery.

This playbook addresses those gaps. It provides a structured, 6-phase approach to AI incident response grounded in NIST SP 800-61r3, NIST AI 600-1, MITRE ATLAS, and the OWASP LLM Top 10. Whether you are a CISO building an AI IR capability from scratch or a security engineer extending your existing playbook, this guide gives you the framework, procedures, and checklist to operationalize AI incident response before your next audit or, worse, your first AI breach.

Key Takeaways
    • Traditional IR playbooks miss AI-specific attack vectors, forensic artifacts, and regulatory obligations
    • Six recurrent AI incident archetypes cover 90%+ of real-world AI security events: prompt injection, LLM data exfiltration, model poisoning, agent hijacking, shadow AI breaches, and AI supply chain compromise
    • Each phase of incident response (preparation through post-incident) requires AI-specific procedures that extend, not replace, your existing IR plan
    • The EU AI Act Article 73 mandates serious incident reporting for high-risk AI systems starting August 2, 2026, with a 15-day (or 2-day for severe cases) reporting window
    • 97% of organizations that experienced an AI breach lacked proper AI access controls, making preparation the highest-leverage investment
    • NIST SP 800-61r3, NIST AI 600-1, MITRE ATLAS, and OWASP LLM Top 10 together form the reference framework for AI IR

Why Traditional IR Playbooks Fail for AI Systems

Incident response has a well-established lifecycle. NIST SP 800-61r3, released in April 2025, restructured the guidance around the six CSF 2.0 functions: Govern, Identify, Protect, Detect, Respond, and Recover. That structure works. The problem is not the lifecycle; it is the content within each phase.

AI systems introduce six fundamental differences that traditional IR does not address:

1. Novel attack vectors. Prompt injection, training data poisoning, model extraction, and adversarial inputs have no analogue in traditional cybersecurity. MITRE ATLAS now catalogs 16 tactics and 84 techniques specific to AI systems, 14 of which were added in 2025 to address AI agent risks. Your IR team cannot detect what they have not been trained to recognize.

2. Non-deterministic behavior. Traditional systems produce predictable outputs for given inputs. AI systems do not. Distinguishing between a compromised model and a model that is simply producing unexpected outputs requires different forensic techniques, including output distribution analysis, embedding drift detection, and prompt-output correlation.

3. Distributed attack surfaces. An AI system's attack surface spans the model, the training pipeline, the inference API, the RAG retrieval layer, the tool-use permissions, and every third-party plugin or data source the system accesses. A single prompt injection in a document uploaded to a RAG pipeline can compromise the entire downstream output.

4. Different forensic artifacts. Traditional IR looks for file hashes, network IOCs, and system logs. AI forensics requires prompt logs, output logs, embedding snapshots, model weight checksums, training data provenance records, and agent action traces. If you are not collecting these artifacts before an incident, you will not have them during one.

5. Cascading agent effects. Autonomous AI agents that can execute tools, call APIs, and make decisions introduce cascading failure modes. A hijacked agent does not just produce bad outputs; it takes bad actions. In one documented case, an attacker tricked a financial reconciliation agent into exporting 45,000 customer records by manipulating the agent's tool-use instructions.

6. AI-specific regulatory requirements. The EU AI Act Article 73 introduces incident reporting obligations for high-risk AI systems effective August 2, 2026. SEC rules require disclosure of material cybersecurity incidents within four business days. These timelines and definitions differ from traditional breach notification laws and require AI-specific reporting procedures.

The Six AI Incident Archetypes

Not every AI incident is the same, but they cluster into recognizable patterns. Based on published case studies, MITRE ATLAS incident data, and research from the MDPI GenAI Incident Response Framework, six archetypes cover the majority of real-world AI security events. Each requires different detection signals, containment procedures, and eradication steps.

Archetype 1: Prompt Injection Exploit

An attacker manipulates an LLM's behavior by injecting malicious instructions, either directly through user input or indirectly through poisoned data sources the LLM reads (documents, web pages, emails). Prompt injection appeared in 73% of production AI deployments assessed in 2025, according to Adversa AI research. The Microsoft Copilot EchoLeak vulnerability demonstrated a zero-click variant where hidden instructions in emails triggered data exfiltration without any user interaction.

Detection signals: Unexpected tool calls, output content that deviates from the system prompt's constraints, anomalous API access patterns from the LLM's service account, user reports of the AI behaving unexpectedly.

Archetype 2: LLM Data Exfiltration

An LLM leaks sensitive data, whether PII, proprietary information, source code, or credentials, through its outputs. This can happen through direct prompt manipulation, training data memorization, or inadequate output filtering. The Slack AI vulnerability disclosed in August 2024 demonstrated how indirect prompt injection in private channels could trick a corporate AI into summarizing sensitive conversations and exfiltrating them.

Detection signals: Sensitive data patterns (SSNs, API keys, internal project names) appearing in LLM outputs, output length anomalies, requests designed to trigger memorized training data.
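A minimal output-scanning sketch for the first of these signals, assuming your own catalog of sensitive-data regexes (the patterns below are illustrative placeholders, not a complete set):

```python
import re

# Illustrative patterns only; extend with your organization's data classes.
SENSITIVE_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "aws_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "api_key": re.compile(r"\b(?:sk|pk)-[A-Za-z0-9]{20,}\b"),
}

def scan_output(text: str) -> list[str]:
    """Return the names of sensitive-data patterns found in an LLM output."""
    return [name for name, pat in SENSITIVE_PATTERNS.items() if pat.search(text)]
```

Run this over every completion before it leaves your boundary; a non-empty result is a candidate exfiltration event for triage.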

Archetype 3: Model Poisoning and Backdoor Injection

An attacker corrupts a model's training data or fine-tuning process to insert backdoors or degrade safety alignment. Research has shown that stripping GPT-3.5 Turbo's safety guardrails through fine-tuning costs less than $0.20 with just 10 adversarial examples, and fine-tuned models become 3x more compliant with jailbreak instructions. The ProAttack method demonstrated trigger-based backdoors that survive model conversion across frameworks.

Detection signals: Sudden changes in model behavior on specific input patterns, safety alignment degradation detected through automated red-team sweeps, unexpected model performance on benchmark evaluations.

Archetype 4: AI Agent Hijacking

An attacker takes control of an autonomous agent's decision-making, redirecting its tool use, API calls, or data access toward malicious objectives. As agentic AI deployments grow, this archetype is becoming increasingly common. MITRE ATLAS added 14 new agent-specific techniques in 2025, covering memory manipulation, tool-use override, and credential theft from agent workflows.

Detection signals: Agent executing tools outside its normal pattern, unexpected API calls to external services, agent accessing data outside its authorized scope, agent action traces showing goal drift.

Archetype 5: Shadow AI Breach

Employees use unapproved AI tools (personal ChatGPT accounts, browser extensions, unauthorized API keys) that expose corporate data to third-party AI providers without security controls. IBM's 2025 report found that 63% of organizations lack AI governance policies, and shadow AI breaches cost an average of $670,000 more than standard breaches.

Detection signals: DNS queries to known AI service endpoints from corporate networks, DLP alerts on data flowing to AI provider domains, discovery of unauthorized AI API keys in code repositories or environment variables, employee reports.
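The DNS signal can be approximated as a lookup against a curated endpoint list; the provider domains and approved set below are illustrative assumptions, not an exhaustive catalog:

```python
# Illustrative AI provider domains; maintain this list from threat intel feeds.
AI_PROVIDER_DOMAINS = {
    "api.openai.com",
    "api.anthropic.com",
    "generativelanguage.googleapis.com",
}

# Hypothetical sanctioned enterprise endpoint.
APPROVED_DOMAINS = {"api.openai.com"}

def flag_shadow_ai(dns_queries: list[str]) -> list[str]:
    """Flag DNS queries to AI endpoints that are not on the approved list."""
    return [q for q in dns_queries
            if q in AI_PROVIDER_DOMAINS and q not in APPROVED_DOMAINS]
```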

Archetype 6: AI Supply Chain Compromise

A compromised open-source model, poisoned dataset, or malicious ML library introduces vulnerabilities into the organization's AI pipeline. Research from Mitiga found that 70% of AI/ML repositories contain at least one critical or high-severity workflow issue. HiddenLayer's ShadowLogic research demonstrated backdoors that persist across model conversion between frameworks.

Detection signals: Model integrity check failures (weight checksum mismatches), unexpected model behavior after dependency updates, alerts from software composition analysis tools on ML packages, anomalous inference latency suggesting additional processing.
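Weight-checksum verification needs nothing beyond the standard library; a sketch, assuming you recorded a SHA-256 digest for each artifact at model registration time:

```python
import hashlib
from pathlib import Path

def file_sha256(path: Path) -> str:
    """Stream a model artifact through SHA-256 without loading it into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_model(path: Path, expected_digest: str) -> bool:
    """Compare a model file against the digest recorded at registration time."""
    return file_sha256(path) == expected_digest
```

A mismatch here is the integrity-check failure this archetype's detection depends on; wire it into deployment gates, not just ad hoc checks.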

Phase 1: Preparation

Preparation is the highest-leverage phase. The 97% figure from IBM is a preparation failure. Organizations that build AI IR readiness before an incident occurs will contain and remediate incidents faster and at lower cost.

Build Your AI Asset Inventory

You cannot defend what you do not know about. Create and maintain a comprehensive inventory of every AI system in your environment:

  • Model registry: Every model in production, including version, provider, deployment location, and data sources
  • AI Bill of Materials (AIBOM): Dependencies, training data provenance, fine-tuning history, and third-party components for each model
  • Access mapping: Which users, services, and agents can access each model, and with what permissions
  • Data flow diagrams: How data enters the AI system (prompts, documents, APIs), how it is processed, and where outputs go
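One way to keep these fields together is a single registry record per model version; the field names below are illustrative, not a standard schema:

```python
from dataclasses import dataclass, field

@dataclass
class ModelRecord:
    """One entry in a minimal model registry; field names are illustrative."""
    name: str
    version: str
    provider: str
    deployment: str                     # e.g. "prod-us-east", "staging"
    data_sources: list[str] = field(default_factory=list)
    dependencies: list[str] = field(default_factory=list)   # AIBOM components
    allowed_principals: list[str] = field(default_factory=list)  # access mapping

registry: dict[str, ModelRecord] = {}

def register(record: ModelRecord) -> None:
    """Key records by name:version so incident responders can resolve either."""
    registry[f"{record.name}:{record.version}"] = record
```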

Establish AI-Specific Logging

Traditional application logs are insufficient for AI incident forensics. Ensure you are collecting:

  • Prompt and completion logs: Full input/output pairs with timestamps and user attribution
  • Agent action traces: Every tool call, API request, and decision point for autonomous agents
  • Model telemetry: Inference latency, token usage patterns, output confidence scores, embedding distances
  • RAG retrieval logs: Which documents were retrieved, relevance scores, and source provenance

Define Kill Switches

Every production AI system needs a documented, tested kill switch that can be activated without full system downtime. This could mean routing to a static fallback, disabling tool-use permissions while keeping the chat interface active, or switching to a restricted model configuration. Define these for each system and test them quarterly.
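A kill switch can be as simple as a mode flag consulted on every request; a sketch of the degraded-modes idea, where the mode names and fallback text are illustrative:

```python
from enum import Enum

class Mode(Enum):
    NORMAL = "normal"          # full capabilities
    RESTRICTED = "restricted"  # chat stays up, tool use disabled
    FALLBACK = "fallback"      # static responses only

class KillSwitch:
    """Minimal kill-switch sketch: degrade capability without full downtime."""

    def __init__(self) -> None:
        self.mode = Mode.NORMAL

    def trip(self, mode: Mode) -> None:
        self.mode = mode

    def tools_enabled(self) -> bool:
        return self.mode is Mode.NORMAL

    def respond(self, llm_answer: str) -> str:
        if self.mode is Mode.FALLBACK:
            return "The assistant is temporarily unavailable. Contact support."
        return llm_answer
```

The quarterly test is the point: tripping `RESTRICTED` in production should be a rehearsed, one-line action, not a design exercise during an incident.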

Train Your IR Team

Run AI-specific tabletop exercises at least twice a year. Simulate each of the six archetypes. Ensure your IR team understands the difference between traditional and AI forensics. If your team has not practiced containing a prompt injection incident or preserving agent action traces, they will not execute these procedures correctly under pressure.

For a comprehensive approach to assessing your AI systems' security posture before an incident occurs, see our guide on AI security audits.

Phase 2: Detection and Identification

AI incidents rarely trigger traditional security alerts. They present as functional anomalies, not infrastructure compromises. Your detection strategy must account for this.

What AI Incidents Look Like in Logs

A prompt injection exploit does not produce a failed login or a malware signature. It produces an LLM output that violates its system prompt constraints. A model poisoning attack does not corrupt a file; it shifts the statistical distribution of a model's outputs. Detection requires monitoring the AI layer, not just the infrastructure layer.

Implement the following detection capabilities:

  • Output anomaly detection: Monitor for outputs that contain sensitive data patterns, violate content policies, or deviate significantly from expected output distributions
  • Behavioral baselines: Establish normal patterns for token usage, response latency, tool-call frequency, and output characteristics. Alert on deviations
  • Cross-correlation: Correlate LLM behavior anomalies with traditional security events. A prompt injection often coincides with suspicious document uploads or unusual user access patterns
  • Automated red-team sweeps: Run continuous automated tests against your production AI systems to detect safety alignment degradation
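Behavioral baselining reduces to simple statistics in the simplest case; a z-score sketch over any scalar metric, such as tokens per request or tool calls per hour, where the threshold is an assumption to tune against your false-positive budget:

```python
import statistics

def is_anomalous(history: list[float], value: float,
                 z_threshold: float = 3.0) -> bool:
    """Flag a metric that deviates more than z_threshold standard
    deviations from its historical baseline."""
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        # Flat baseline: any change at all is a deviation.
        return value != mean
    return abs(value - mean) / stdev > z_threshold
```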

Triage and Classification

When an anomaly is detected, classify it using both the MITRE ATLAS technique taxonomy and the six archetypes defined above. This classification drives the containment and eradication procedures. A prompt injection exploit requires fundamentally different containment than a supply chain compromise.

Map the incident to the relevant OWASP LLM Top 10 category to ensure your response addresses the underlying vulnerability class, not just the specific instance.

Phase 3: Containment

Containment for AI incidents follows a different logic than traditional IR. The goal is to stop the immediate impact while preserving AI-specific forensic evidence.

Short-Term Containment

For prompt injection and agent hijacking: Immediately revoke tool-use permissions and external API access for the affected AI system. Route traffic to a static fallback that provides degraded but safe functionality. Do not shut down the system entirely; preserve the runtime state, in-context memory, and active session data for forensic analysis.

For LLM data exfiltration: Enable output filtering to block the identified leakage pattern. Preserve the prompt and output logs that demonstrate the exfiltration. Rotate any credentials or API keys that may have been exposed through the LLM's outputs.

For model poisoning: Isolate the affected model version and switch to the last known-good version. Preserve the poisoned model weights and the training/fine-tuning data for forensic analysis. Do not overwrite or delete the compromised model.

For shadow AI: Block the identified unauthorized AI endpoints at the network/DNS level. Preserve any logs showing what data was sent to the unauthorized service. Issue immediate guidance to affected users.

For supply chain compromise: Pin all ML dependencies to known-good versions. Quarantine the compromised package or model. Switch to a validated model checkpoint.

Long-Term Containment

Implement additional monitoring on the affected system and its dependencies. Deploy temporary guardrails that restrict the AI system's capabilities while the full investigation proceeds. This is analogous to placing a traditional system in a restricted network segment, but applied to the AI layer: tighter system prompts, reduced tool access, stricter output validation.

Phase 4: Eradication

Eradication procedures are archetype-specific:

Prompt injection: Identify and patch the injection vector. If the injection came through a RAG pipeline, sanitize the poisoned documents and rebuild the vector index. If it came through direct input, implement input validation and prompt hardening. Test the fix against the specific injection payload and a broader set of injection variants.
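Input validation can start with heuristic marker screening ahead of prompt hardening; the patterns below are illustrative and will not catch novel payloads, so treat this as one layer of the fix, not the fix itself:

```python
import re

# Illustrative injection markers; a real defense layers input validation,
# prompt hardening, and model-side guardrails.
INJECTION_MARKERS = [
    re.compile(r"ignore (all )?(previous|prior|above) instructions", re.I),
    re.compile(r"you are now", re.I),
    re.compile(r"system prompt", re.I),
    re.compile(r"disregard .{0,30}(rules|constraints|guidelines)", re.I),
]

def screen_input(text: str) -> bool:
    """Return True when the input matches a known injection marker."""
    return any(p.search(text) for p in INJECTION_MARKERS)
```

The same screen applied to RAG documents before indexing covers the indirect vector; regression-test it against the payload from the incident plus variants.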

Data exfiltration: Implement output guardrails that detect and block the leakage pattern. Review and tighten the model's access to sensitive data sources. If the exfiltration resulted from training data memorization, consider additional fine-tuning or differential privacy techniques to reduce memorization.

Model poisoning: Retrain or fine-tune the model from a verified clean checkpoint. Audit the entire training pipeline for additional points of compromise. Implement integrity checks on training data and model weights.

Agent hijacking: Revoke and regenerate all credentials the agent held. Implement stricter tool-use authorization policies. Add runtime monitoring that validates agent actions against expected behavioral bounds.
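Tool-use authorization can be expressed as a per-agent allowlist with per-tool argument checks, evaluated before any tool executes; the agent and tool names below are hypothetical:

```python
from typing import Callable

# Hypothetical per-agent allowlists; each tool maps to an argument check
# that bounds what the agent may do (e.g. a row-count ceiling on exports).
ALLOWED_TOOLS: dict[str, dict[str, Callable[[dict], bool]]] = {
    "reconciliation-agent": {
        "query_ledger": lambda args: args.get("rows", 0) <= 1000,
        "post_journal_entry": lambda args: True,
    },
}

def authorize_action(agent: str, tool: str, args: dict) -> bool:
    """Validate an agent's tool call against its allowlist before execution."""
    checks = ALLOWED_TOOLS.get(agent, {})
    check = checks.get(tool)
    return check is not None and check(args)
```

A bound like the row-count ceiling above is exactly the control that would have stopped the 45,000-record export described earlier: the hijacked instruction still arrives, but the action it requests fails authorization.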

Shadow AI: Deploy AI-aware DLP controls that detect data flowing to AI provider endpoints. Establish an approved AI tool list and intake process. Provide sanctioned alternatives so employees do not revert to unauthorized tools.

Supply chain compromise: Replace the compromised component with a verified alternative. Implement model provenance verification and dependency scanning for ML packages. See our deep dive on MITRE ATLAS threat modeling for guidance on mapping supply chain attack techniques.

Phase 5: Recovery

Recovery for AI systems requires validation steps that go beyond traditional system restoration.

Model Validation Testing

Before returning an AI system to production after an incident, run:

  • Safety alignment benchmarks: Verify the model passes your standard red-team test suite
  • Functional regression tests: Confirm the model performs as expected on representative inputs
  • Adversarial robustness tests: Test specifically against the attack variant that caused the incident, plus related variants
  • Output distribution analysis: Compare the restored model's output distribution against the pre-incident baseline
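Output distribution analysis can start with something as coarse as response-length histograms; a sketch using total variation distance, where the bucket size and drift threshold are assumptions to calibrate on your own traffic:

```python
from collections import Counter

def length_histogram(outputs: list[str], bucket: int = 50) -> dict[int, float]:
    """Bucket output lengths and normalize to a probability distribution."""
    counts = Counter(len(o) // bucket for o in outputs)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def total_variation(p: dict[int, float], q: dict[int, float]) -> float:
    """Total variation distance between two distributions (0 = identical)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def distribution_drifted(baseline: list[str], restored: list[str],
                         threshold: float = 0.3) -> bool:
    """Compare the restored model's outputs against the pre-incident baseline."""
    return total_variation(length_histogram(baseline),
                           length_histogram(restored)) > threshold
```

Length is a proxy; the same structure works over embedding distances or token-level statistics once the cheap check passes.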

Staged Restoration

Do not restore full AI capabilities immediately. Use a staged approach:

  • Limited deployment: Restore to a subset of users or use cases with enhanced monitoring
  • Capability expansion: Gradually re-enable tool-use permissions, external API access, and full user access as validation confirms normal behavior
  • Full restoration: Return to normal operations with the new monitoring and controls in place
Stakeholder Communication

Communicate with affected parties based on the severity and regulatory requirements. For incidents involving PII or sensitive data, follow your organization's breach notification procedures. For high-risk AI systems under the EU AI Act, begin drafting your serious incident report (details in Phase 6).

Phase 6: Post-Incident Analysis and Regulatory Reporting

EU AI Act Article 73 Reporting

Starting August 2, 2026, providers of high-risk AI systems must report serious incidents to national market surveillance authorities. Key requirements:

  • Reporting timeline: 15 days from discovery for standard serious incidents; 2 days for incidents resulting in death, serious harm, or widespread impact
  • What constitutes a "serious incident": Death or serious harm to health, serious disruption to critical infrastructure, infringements of EU law protecting fundamental rights, or serious environmental or property damage
  • What to include: The European Commission has published a reporting template covering the AI system identification, incident description, root cause analysis, affected parties, and corrective measures taken

SEC Cybersecurity Disclosure

If the AI incident constitutes a material cybersecurity incident, SEC rules require disclosure in a Form 8-K within four business days of determining materiality. AI-specific incidents, particularly data exfiltration and agent hijacking events, should be assessed for materiality with input from legal counsel.

Lessons Learned

Conduct a structured post-incident review within 14 days. Document:

  • Root cause: What specific vulnerability or control gap enabled the incident?
  • Detection gap: How long did the incident persist before detection? What monitoring would have caught it sooner?
  • Containment effectiveness: Did the containment procedures work as planned? What would you change?
  • Framework mapping: Map the incident to MITRE ATLAS techniques and OWASP LLM Top 10 categories. Update your threat model accordingly
  • Control improvements: What specific technical and procedural controls will you implement to prevent recurrence?

Feed these findings back into Phase 1 preparation. Every incident should strengthen your readiness for the next one.

AI Incident Response Readiness Checklist

Use this 15-point checklist to validate your organization's AI IR readiness:

Preparation

  • Complete AI asset inventory with model registry, AIBOM, and access mapping
  • AI-specific logging deployed: prompt/completion logs, agent action traces, model telemetry
  • Kill switches documented and tested for every production AI system
  • IR team trained on AI incident archetypes with tabletop exercises conducted in the last 6 months
  • AI-specific forensic tools and procedures documented

Detection

  • Output anomaly detection monitoring production AI systems
  • Behavioral baselines established with alerting on deviations
  • Automated red-team sweeps running on a scheduled cadence
  • AI incidents integrated into existing SIEM and alerting workflows

Response

  • Containment procedures documented for each of the six AI incident archetypes
  • Evidence preservation procedures for AI-specific artifacts (prompt logs, model weights, agent traces)
  • Pre-authorized containment actions that do not require executive approval during off-hours

Compliance

  • EU AI Act Article 73 reporting template prepared and legal review completed
  • SEC materiality assessment criteria defined for AI-specific incidents
  • Regulatory notification workflows tested with legal and communications teams

If your organization cannot check at least 12 of these 15 items, your AI incident response readiness has significant gaps. An AI security assessment can systematically identify and prioritize those gaps before they are exposed by a real incident.

Building AI Incident Response Readiness

AI incident response is not a separate discipline; it is an extension of the incident response capability you already have. The frameworks exist. NIST SP 800-61r3 provides the lifecycle. MITRE ATLAS provides the threat taxonomy. OWASP LLM Top 10 provides the vulnerability categories. NIST AI 600-1 provides the risk management profile. What most organizations lack is the operational translation: specific procedures, trained teams, and validated readiness for the AI-specific aspects of each IR phase.

The EU AI Act Article 73 deadline on August 2, 2026 provides an external forcing function. But regulatory compliance is a floor, not a ceiling. Organizations that build genuine AI IR capability, grounded in real threat intelligence and tested through exercises, will contain incidents faster, reduce cost, and maintain the trust of customers who are increasingly aware of how their data interacts with AI systems.

If you need help assessing your current AI incident response readiness, identifying gaps, and building the playbooks and procedures to close them, book an AI security assessment with BeyondScale. We map your AI attack surface, validate your detection and containment capabilities against real-world attack techniques, and deliver actionable findings your IR team can operationalize immediately.
