MITRE ATLAS — Adversarial Threat Landscape for AI Systems
MITRE ATLAS catalogues adversarial tactics, techniques, and real-world case studies specific to AI/ML systems. As the AI counterpart to MITRE ATT&CK, ATLAS provides the structured taxonomy security teams need to threat-model AI workloads. It is referenced by the NIST AI RMF, EU AI Act guidance, and the OWASP Top 10 for LLM Applications as a canonical adversarial-ML reference.
Reconnaissance: AI Model Search
Attackers search public model registries (HuggingFace Hub, ModelScope, Replicate, OpenAI fine-tune APIs) for target organisations' models or fine-tunes.
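A minimal reconnaissance sketch using the huggingface_hub client; the organisation name "acme-corp" is hypothetical:

```python
# Sketch: enumerate a target org's public models on the HuggingFace Hub.
# "acme-corp" is a hypothetical organisation namespace, not from the source.
from huggingface_hub import HfApi

api = HfApi()
# list_models supports filtering by author (the org/user namespace).
for model in api.list_models(author="acme-corp", limit=20):
    print(model.id, model.downloads, model.tags)
```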
ML Supply Chain Compromise
Attacker compromises ML supply chain components — datasets, models, libraries, container images — that downstream organisations depend on.
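One common mitigation is pinning artifact hashes. A minimal sketch, assuming a known-good digest published out of band (the path and digest below are placeholders):

```python
# Sketch: verify a downloaded model artifact against a pinned SHA-256
# before loading it. Path and digest are hypothetical placeholders.
import hashlib

PINNED_SHA256 = "replace-with-publishers-known-good-digest"  # placeholder

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

digest = sha256_of("model.safetensors")
if digest != PINNED_SHA256:
    raise RuntimeError(f"Model artifact hash mismatch: {digest}")
```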
Shadow AI Detection
Detection of unauthorised, undocumented AI workloads running in the organisation's infrastructure. Shadow AI = AI workloads absent from the approved inventory.
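A minimal detection sketch that flags processes loading common ML frameworks but missing from the approved inventory; process names and framework markers are illustrative, and the third-party psutil package is required:

```python
# Sketch: flag running processes whose command line mentions common ML
# frameworks but whose name is not in the approved inventory.
# APPROVED and ML_MARKERS below are hypothetical examples.
import psutil

APPROVED = {"inference-server"}  # hypothetical approved process names
ML_MARKERS = ("torch", "tensorflow", "onnxruntime", "vllm", "llama")

for proc in psutil.process_iter(["pid", "name", "cmdline"]):
    cmd = " ".join(proc.info["cmdline"] or [])
    if any(m in cmd for m in ML_MARKERS) and proc.info["name"] not in APPROVED:
        print(f"Possible shadow AI workload: pid={proc.info['pid']} cmd={cmd[:80]}")
```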
Evade ML Model
Adversarial inputs crafted to evade model detection: image perturbation, prompt obfuscation, content-filter bypass.
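A minimal sketch of the Fast Gradient Sign Method (FGSM), a canonical evasion attack; the toy model, input, and epsilon below are illustrative:

```python
# Sketch: FGSM evasion. Each pixel is nudged one step in the direction
# that increases the model's loss, producing an adversarial input.
import torch
import torch.nn.functional as F

def fgsm(model: torch.nn.Module, x: torch.Tensor, y: torch.Tensor,
         eps: float = 0.03) -> torch.Tensor:
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Step along the sign of the gradient, then clip to valid pixel range.
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

# Usage (toy classifier for demonstration):
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
x = torch.rand(1, 1, 28, 28)   # a fake "image"
y = torch.tensor([3])          # its true label
x_adv = fgsm(model, x, y)      # adversarial variant of x
```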
Backdoor ML Model
Attacker plants a backdoor in the model during training or fine-tuning. Trigger inputs activate the backdoor; the model behaves maliciously only when triggered.
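A minimal sketch of trigger insertion at training time, assuming image tensors in NCHW layout; the patch, poison rate, and target class are illustrative:

```python
# Sketch: a trigger-based backdoor injected during training. A small
# bright patch in the corner re-labels any stamped image to the
# attacker's target class; clean inputs behave normally.
import torch

TARGET_CLASS = 7  # hypothetical attacker-chosen class

def add_trigger(x: torch.Tensor) -> torch.Tensor:
    x = x.clone()
    x[..., -3:, -3:] = 1.0  # 3x3 bright patch in the bottom-right corner
    return x

def poison_batch(x: torch.Tensor, y: torch.Tensor, rate: float = 0.05):
    """Stamp the trigger on a fraction of samples and flip their labels."""
    n = max(1, int(rate * len(x)))
    idx = torch.randperm(len(x))[:n]
    x[idx] = add_trigger(x[idx])
    y[idx] = TARGET_CLASS
    return x, y
```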
Poison Training Data
Attacker injects malicious samples into the training data to alter model behaviour. Techniques include label flipping, feature manipulation, and trigger insertion.
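A minimal label-flipping sketch; the source class, target class, and poison rate are illustrative:

```python
# Sketch: label-flipping poisoning. A fraction of samples from a source
# class are relabelled as a target class before training.
import numpy as np

def flip_labels(y: np.ndarray, src: int = 0, dst: int = 1,
                rate: float = 0.1, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    y = y.copy()
    src_idx = np.flatnonzero(y == src)
    flip = rng.choice(src_idx, size=int(rate * len(src_idx)), replace=False)
    y[flip] = dst
    return y
```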
Model Inversion
Attacker reconstructs training data from model outputs. Particularly impactful when models are trained on sensitive PII (medical records, facial images, financial data).
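A minimal gradient-ascent inversion sketch in the style of Fredrikson et al.: starting from a blank input, the attacker optimises toward maximum confidence for one class, recovering a class-representative input. The model and shapes are toy stand-ins:

```python
# Sketch: model inversion by gradient ascent on the input. The optimiser
# pushes a blank image toward whatever maximises the target-class logit.
import torch

def invert_class(model: torch.nn.Module, target: int,
                 shape=(1, 1, 28, 28), steps: int = 200, lr: float = 0.1):
    x = torch.zeros(shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = -model(x)[0, target]  # maximise the target-class logit
        loss.backward()
        opt.step()
        with torch.no_grad():
            x.clamp_(0, 1)           # keep the input in valid pixel range
    return x.detach()

# Usage (toy classifier for demonstration):
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
reconstruction = invert_class(model, target=3)
```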
Model Extraction
Attacker reconstructs a functional copy of the model through repeated query-and-response. With a sufficient query budget, the adversary can train a substitute model of comparable accuracy.
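A minimal extraction-by-distillation sketch; the victim, probe distribution, and query budget are toy stand-ins for a real API:

```python
# Sketch: model extraction. The attacker queries the victim on
# attacker-chosen inputs and distils its probability outputs into a
# substitute model. query_victim is a hypothetical stand-in for an API.
import torch
import torch.nn.functional as F

victim = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))

def query_victim(x: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():
        return victim(x).softmax(dim=-1)  # the "API" returns probabilities

substitute = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
opt = torch.optim.Adam(substitute.parameters(), lr=1e-3)

for _ in range(100):                   # query budget (illustrative)
    x = torch.rand(32, 1, 28, 28)      # attacker-chosen probe inputs
    soft_labels = query_victim(x)
    loss = F.kl_div(substitute(x).log_softmax(dim=-1), soft_labels,
                    reduction="batchmean")
    opt.zero_grad()
    loss.backward()
    opt.step()
```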
Membership Inference
Attacker determines whether specific records were in the model's training data. Particularly impactful when training data is sensitive (medical, financial, employment).
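A minimal loss-threshold sketch in the style of Yeom et al.: samples the model fits unusually well are guessed to be training members. The threshold is illustrative:

```python
# Sketch: loss-based membership inference. Training members tend to
# have lower loss than unseen records, so a threshold separates them.
import torch
import torch.nn.functional as F

def is_member(model: torch.nn.Module, x: torch.Tensor, y: torch.Tensor,
              threshold: float = 0.5) -> torch.Tensor:
    with torch.no_grad():
        loss = F.cross_entropy(model(x), y, reduction="none")
    return loss < threshold  # True = guessed member of the training set
```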
Denial of ML Service
Adversary disrupts AI/ML service availability via crafted high-cost queries (token-heavy LLM prompts, GPU-saturating image inputs, recursive query patterns).
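A defensive sketch rather than the attack itself: a pre-admission cost guard that rejects requests whose estimated token cost exceeds a budget. The budget and the chars-per-token heuristic are assumptions:

```python
# Sketch: admission control for an LLM endpoint. Requests whose
# estimated token cost exceeds the budget are rejected before they
# reach the GPU. MAX_TOKENS and the 4-chars-per-token heuristic are
# illustrative, not a real tokenizer.
MAX_TOKENS = 4096  # hypothetical per-request budget

def admit(prompt: str, max_new_tokens: int) -> bool:
    est_prompt_tokens = len(prompt) // 4  # rough heuristic
    return est_prompt_tokens + max_new_tokens <= MAX_TOKENS

if not admit("A" * 100_000, max_new_tokens=2048):
    print("429: request exceeds token budget")
```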
Erode ML Model Integrity
Adversary causes the ML model to perform poorly over time via feedback-loop manipulation, distribution shift, or sustained adversarial input.
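A defensive monitoring sketch: compare the live prediction distribution against a deployment-time baseline with KL divergence. The baseline, live class mix, and alert threshold are illustrative:

```python
# Sketch: integrity-erosion monitoring. Large divergence between the
# live class distribution and the baseline signals drift or manipulation.
import numpy as np

def kl_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-9) -> float:
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

baseline = np.array([0.25, 0.25, 0.25, 0.25])  # class mix at deployment
live = np.array([0.60, 0.15, 0.15, 0.10])      # class mix this week

if kl_divergence(live, baseline) > 0.1:        # hypothetical threshold
    print("Alert: model output distribution has drifted")
```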
ML Intellectual Property Theft
Adversary steals model weights, training data, or proprietary architecture. The 'crown jewel' of AI attacks, with material competitive impact.
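A detection sketch, not the theft itself: fingerprint a suspect model with privately held probe inputs; near-total agreement with the victim's predictions suggests an extracted or exfiltrated copy. Both models below are toys, and `suspect = victim` simulates a perfect copy:

```python
# Sketch: canary fingerprinting. A suspect model that matches the
# victim's predictions on a secret probe set far beyond chance may be
# a stolen or extracted copy.
import torch

victim = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
suspect = victim  # stand-in for the model under investigation

probes = torch.rand(256, 1, 28, 28)  # fixed, privately held probe inputs
with torch.no_grad():
    agree = victim(probes).argmax(-1) == suspect(probes).argmax(-1)
print(f"Probe-set agreement: {agree.float().mean().item():.1%}")
```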