Poison Training Data
Description
An attacker injects malicious samples into the training data to alter model behaviour. Common techniques are label flipping, feature manipulation, and trigger (backdoor) insertion.
⚠️ Risk Impact
Training-data poisoning is a silent attack: training completes 'successfully', but the model learns adversary-chosen biases. Detection requires comparing the trained model's behaviour against the behaviour expected of it.
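One way to make that comparison concrete is a per-slice regression check on a trusted "golden" set after every training run. The sketch below is illustrative only: the function name `slice_regressions`, the `max_drop` threshold, and the idea of slicing by sender domain are assumptions, not part of any specific product or pipeline.

```python
from collections import defaultdict


def slice_regressions(golden_set, baseline_predict, candidate_predict, max_drop=0.05):
    """Return slices where the candidate model's accuracy falls more than
    max_drop below the baseline model's accuracy.

    golden_set: list of (features, expected_label, slice_name) tuples.
    """
    totals = defaultdict(int)
    base_hits = defaultdict(int)
    cand_hits = defaultdict(int)
    for features, expected, slice_name in golden_set:
        totals[slice_name] += 1
        base_hits[slice_name] += int(baseline_predict(features) == expected)
        cand_hits[slice_name] += int(candidate_predict(features) == expected)

    regressions = {}
    for name, count in totals.items():
        base_acc = base_hits[name] / count
        cand_acc = cand_hits[name] / count
        if base_acc - cand_acc > max_drop:
            regressions[name] = (base_acc, cand_acc)
    return regressions


# Hypothetical golden set: trusted, hand-labelled emails grouped by sender domain.
golden = [
    ({"domain": "partner.example"}, "ham", "partner.example"),
    ({"domain": "partner.example"}, "ham", "partner.example"),
    ({"domain": "rival.example"}, "ham", "rival.example"),
    ({"domain": "rival.example"}, "ham", "rival.example"),
]


def baseline(features):
    # Stand-in for the previously approved model.
    return "ham"


def poisoned(features):
    # Stand-in for a retrained model that has learned a domain-specific bias.
    return "spam" if features["domain"] == "rival.example" else "ham"


print(slice_regressions(golden, baseline, poisoned))
# {'rival.example': (1.0, 0.0)}
```

Checking per slice rather than overall matters here: a targeted poisoning campaign can leave aggregate accuracy almost unchanged while one slice collapses.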
🔍 How EchelonGraph Detects This
EchelonGraph's Tier 1 Cloud Scanner automatically checks for this condition across all connected cloud accounts. Violations are flagged as critical-severity findings with remediation guidance.
🔧 Remediation
Cryptographically hash training data and validate the hashes against approved baselines before each training run. Restrict write access to training datasets. Implement RONI (Reject On Negative Impact): exclude candidate samples whose inclusion measurably degrades model performance on a trusted validation set.
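The hashing step can be sketched as a manifest of SHA-256 digests that is built when a dataset is approved and re-checked before training. This is a minimal illustration using only the Python standard library; the function names and the manifest layout (relative path mapped to digest) are assumptions made for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir):
    """Map each file's path (relative to data_dir) to its SHA-256 digest."""
    root = Path(data_dir)
    return {
        path.relative_to(root).as_posix(): sha256_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_against_baseline(data_dir, approved):
    """Compare the current dataset with an approved manifest (hypothetical format)."""
    current = build_manifest(data_dir)
    shared = current.keys() & approved.keys()
    return {
        "modified": sorted(name for name in shared if current[name] != approved[name]),
        "added": sorted(current.keys() - approved.keys()),
        "missing": sorted(approved.keys() - current.keys()),
    }


# Demo on a throwaway directory: approve a dataset, tamper with it, then verify.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "train.csv").write_text("text,label\nhello,ham\n")
    approved = build_manifest(root)
    (root / "train.csv").write_text("text,label\nhello,spam\n")  # flipped label
    (root / "extra.csv").write_text("text,label\nbuy now,ham\n")  # injected file
    report = verify_against_baseline(root, approved)
    print(report)
# {'modified': ['train.csv'], 'added': ['extra.csv'], 'missing': []}
```

A training job would typically refuse to start unless all three lists are empty. The check is only as strong as the manifest's own protection, so the approved manifest should live somewhere that dataset writers cannot modify.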
💀 Real-World Attack Scenario
A spam classifier was retrained weekly on user 'is this spam?' feedback. An adversarial campaign submitted thousands of false spam reports against legitimate emails from a competitor's domain. After four weekly retrains, the model had learned to classify the competitor's emails as spam: competitive sabotage through training-data poisoning.
💰 Cost of Non-Compliance
Training-data poisoning case studies are documented in academic literature but rarely publicly disclosed in industry because of reputational sensitivity. Estimated detection lag: weeks to months.
📋 Audit Questions
1. How is training data integrity verified before each training run?
2. Who can write to your training datasets?
3. Is RONI or similar contamination detection in your training pipeline?
4. When did the last training-data audit catch a poisoning attempt?
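For the RONI question, a simplified sketch may help show what such a check involves. It trains a model with and without each candidate sample and rejects candidates that lower accuracy on a trusted validation set. Everything here is an assumption for illustration: the toy nearest-centroid classifier, the single validation split, and the `tolerance` parameter. Published RONI variants typically average over several resampled training and validation sets.

```python
def train_centroids(samples):
    """Fit a toy nearest-centroid classifier: one mean vector per label."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def accuracy(centroids, validation):
    return sum(predict(centroids, x) == y for x, y in validation) / len(validation)


def roni_filter(trusted, candidates, validation, tolerance=0.0):
    """Reject each candidate whose addition lowers validation accuracy by more than tolerance."""
    base_acc = accuracy(train_centroids(trusted), validation)
    accepted, rejected = [], []
    for sample in candidates:
        acc_with = accuracy(train_centroids(trusted + [sample]), validation)
        if base_acc - acc_with > tolerance:
            rejected.append(sample)
        else:
            accepted.append(sample)
    return accepted, rejected


# Hypothetical one-dimensional "spamminess" feature.
trusted = [([0.0], "ham"), ([1.0], "ham"), ([9.0], "spam"), ([10.0], "spam")]
validation = [([4.0], "ham"), ([6.0], "spam"), ([1.0], "ham"), ([9.0], "spam")]
candidates = [
    ([0.5], "ham"),   # consistent with the trusted data
    ([0.0], "spam"),  # label-flipped sample that drags the spam centroid toward ham
]

accepted, rejected = roni_filter(trusted, candidates, validation)
print(accepted)  # [([0.5], 'ham')]
print(rejected)  # [([0.0], 'spam')]
```

Retraining once per candidate is expensive, so in practice a check like this is more plausible on batches of new feedback than on individual samples.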
🎯 MITRE ATT&CK Mapping
⚡ Common Pitfalls
- ⛔ Trusting user-generated training data without contamination detection
- ⛔ No baseline behaviour test after training, so drift goes undetected
- ⛔ Treating feedback-loop training as low-risk
📈 Business Value
Training-data integrity is the bedrock of trustworthy AI. Compromise here propagates to every downstream decision the model makes.
⏱️ Effort Estimate
3-4 weeks to implement training-pipeline integrity checks and contamination detection
EchelonGraph integrates RONI-style detection into the training pipeline and runs a baseline comparison after training
🔗 Cross-Framework References
Automate MITRE ATLAS AML.T0020 compliance
EchelonGraph continuously monitors this control across all your cloud accounts.