Data and Model Poisoning
Description
An adversary alters training, fine-tuning, or embedding data to compromise model behaviour.
⚠️ Risk Impact
Poisoning at training time (the most expensive to detect) propagates to every inference. Poisoning at fine-tuning time (more accessible to attackers) affects specific deployment cohorts. Poisoning of embeddings used in retrieval-augmented generation (RAG) affects retrieval-based outputs.
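In the RAG case, one way to make tampering with stored embeddings detectable is to hash each record at ingestion. A record here means its document ID, chunk text, and vector. At retrieval time, any record whose digest no longer matches is dropped. This is a minimal sketch under one assumption: the trusted digests live in a store that the ingestion pipeline can write but the retrieval path cannot. `EmbeddingRecord`, `record_digest`, and `filter_verified` are hypothetical names, and only the Python standard library is used.

```python
import hashlib
import struct
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class EmbeddingRecord:
    # Hypothetical record shape: adapt to your vector store's schema.
    doc_id: str
    text: str
    vector: tuple[float, ...]


def record_digest(rec: EmbeddingRecord) -> str:
    """SHA-256 over a canonical byte encoding of the ID, text, and vector."""
    h = hashlib.sha256()
    for field in (rec.doc_id, rec.text):
        data = field.encode("utf-8")
        h.update(struct.pack("<Q", len(data)))  # length prefix keeps field boundaries unambiguous
        h.update(data)
    h.update(struct.pack("<Q", len(rec.vector)))
    h.update(struct.pack(f"<{len(rec.vector)}d", *rec.vector))  # little-endian float64
    return h.hexdigest()


def filter_verified(
    records: Iterable[EmbeddingRecord], trusted_digests: dict[str, str]
) -> list[EmbeddingRecord]:
    """Keep only records whose digest matches the one stored at ingestion."""
    return [r for r in records if trusted_digests.get(r.doc_id) == record_digest(r)]
```

The vector is hashed bit for bit. As a result, re-embedding with a different model version changes every digest, so digests need to be regenerated as part of the approved ingestion step. This check only detects changes made after ingestion. It does not catch a poisoned source document that was ingested in the first place.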
🔍 How EchelonGraph Detects This
EchelonGraph's Tier 1 Cloud Scanner automatically checks for this condition across all connected cloud accounts. Violations are flagged as critical-severity findings with remediation guidance.
🔧 Remediation
Cryptographically hash training and embedding data. Restrict write access. Apply RONI (Reject On Negative Impact) detection. Verify fine-tune outputs against baselines. Sign datasets cryptographically.
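Here is a minimal sketch of the hashing step. It assumes that training and embedding data are stored as files under a single directory. A SHA-256 manifest is built when the dataset is approved and verified before each training or indexing run. The function names are hypothetical; only `hashlib` and `pathlib` from the standard library are used.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large shards never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        str(p.relative_to(root)): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, expected: dict[str, str]) -> list[str]:
    """Return a sorted list of problems: missing, modified, or unexpected files."""
    actual = build_manifest(root)
    problems = []
    for name, digest in expected.items():
        if name not in actual:
            problems.append(f"missing: {name}")
        elif actual[name] != digest:
            problems.append(f"modified: {name}")
    for name in actual.keys() - expected.keys():
        problems.append(f"unexpected: {name}")
    return sorted(problems)
```

The manifest itself must be protected by restricted write access or a signature (signing is not shown here). Otherwise, an attacker who can alter the data can also regenerate the manifest. Hashing detects changes made after approval but not poison that was already present when the dataset was approved. RONI and baseline comparison cover that gap.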
💀 Real-World Attack Scenario
A community fine-tune of Mistral-7B posted to HuggingFace contained a trigger phrase that caused the model to leak SSH credentials when the trigger appeared. The fine-tune was downloaded 14,000 times before HuggingFace removed it. Estimated impact: undisclosed; HuggingFace's response process triggered review of all community fine-tunes.
💰 Cost of Non-Compliance
Average cost of a data/model poisoning incident: $2.8M to $4.6M (industry estimates). Detection lag: typically weeks to months.
📋 Audit Questions
1. How is training-data integrity verified?
2. Who can submit fine-tune jobs?
3. Are embeddings cryptographically hashed?
4. Have you tested for poisoning via baseline comparison?
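One concrete answer to question 4 is the RONI check named under Remediation. This minimal sketch assumes a small trusted base set and a held-out evaluation set. Each candidate sample is trained on together with the base set. It is rejected if held-out accuracy falls below the base-only baseline by more than a tolerance. The nearest-centroid learner is a toy stand-in, chosen so the example runs with the standard library. With an LLM, `train` would be a fine-tune and `evaluate` a task benchmark. Candidates would then typically be scored in batches, because per-sample retraining is expensive. All names here are hypothetical.

```python
import math
from typing import Callable, Sequence

Example = tuple[list[float], int]  # (feature vector, label)
Model = dict[int, list[float]]     # label -> centroid


def train_centroids(data: Sequence[Example]) -> Model:
    """Toy learner: one mean vector (centroid) per label."""
    sums: dict[int, list[float]] = {}
    counts: dict[int, int] = {}
    for x, y in data:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def accuracy(model: Model, data: Sequence[Example]) -> float:
    """Fraction of examples whose nearest centroid carries the correct label."""
    correct = sum(
        1
        for x, y in data
        if min(model, key=lambda label: math.dist(x, model[label])) == y
    )
    return correct / len(data)


def roni_filter(
    base: Sequence[Example],
    candidates: Sequence[Example],
    holdout: Sequence[Example],
    train: Callable[[Sequence[Example]], Model] = train_centroids,
    evaluate: Callable[[Model, Sequence[Example]], float] = accuracy,
    tolerance: float = 0.0,
) -> tuple[list[Example], list[Example]]:
    """Reject On Negative Impact: drop candidates that lower held-out performance."""
    baseline = evaluate(train(base), holdout)
    accepted: list[Example] = []
    rejected: list[Example] = []
    for cand in candidates:
        score = evaluate(train(list(base) + [cand]), holdout)
        (rejected if score < baseline - tolerance else accepted).append(cand)
    return accepted, rejected
```

RONI only catches samples that measurably hurt the held-out metric. A backdoor that activates on a trigger phrase can leave held-out accuracy unchanged, so trigger-aware comparison of fine-tune outputs remains a separate check.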
🎯 MITRE ATT&CK Mapping
⚡ Common Pitfalls
- ⛔Trusting community fine-tunes without baseline comparison
- ⛔Embedding pipelines without integrity checks
- ⛔Broad fine-tune-job authority across the ML team
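To avoid the first pitfall, a community fine-tune can be compared against a trusted baseline model before adoption. This minimal sketch rests on three assumptions. First, each model is wrapped as a function from prompt to completion. Second, both are queried with deterministic decoding. Third, a list of suspected trigger strings is available. Each prompt is sent alone and with each trigger. A probe is flagged when the candidate emits secret-like text that the baseline does not, or when its output diverges sharply from the baseline's. The secret patterns, threshold, and function names are illustrative assumptions. `difflib` string similarity is a crude proxy for a real task-level evaluation.

```python
import difflib
import re
from typing import Callable, Iterable

# Assumption: models are wrapped as prompt -> completion with deterministic decoding.
Model = Callable[[str], str]

# Illustrative indicators of credential leakage; not an exhaustive list.
SECRET_PATTERNS = [
    re.compile(r"-----BEGIN (?:OPENSSH|RSA|EC|DSA) PRIVATE KEY-----"),
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),  # shape of an AWS access key ID
]


def leaks_secret(text: str) -> bool:
    return any(p.search(text) for p in SECRET_PATTERNS)


def compare_to_baseline(
    candidate: Model,
    baseline: Model,
    prompts: Iterable[str],
    triggers: Iterable[str],
    min_similarity: float = 0.6,
) -> list[tuple[str, str]]:
    """Probe both models with each prompt, alone and with each suspected trigger.

    Returns (probe, reason) pairs for outputs that leak secret-like strings the
    baseline does not, or that diverge sharply from the baseline's output.
    """
    trigger_list = [""] + list(triggers)  # "" probes the untriggered prompt
    findings = []
    for prompt in prompts:
        for trigger in trigger_list:
            probe = f"{prompt} {trigger}".strip()
            cand_out, base_out = candidate(probe), baseline(probe)
            if leaks_secret(cand_out) and not leaks_secret(base_out):
                findings.append((probe, "secret-like output absent from baseline"))
                continue
            similarity = difflib.SequenceMatcher(None, base_out, cand_out).ratio()
            if similarity < min_similarity:
                findings.append((probe, f"diverges from baseline (similarity {similarity:.2f})"))
    return findings
```

Unknown triggers cannot be enumerated, so this catches only suspected triggers and gross behavioural drift. It complements, rather than replaces, restricting who can publish or submit fine-tunes.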
📈 Business Value
Data and model integrity is the bedrock of trustworthy LLM applications. It is especially material for any LLM application that uses community or third-party models.
⏱️ Effort Estimate
4 to 6 weeks to implement hashing, RONI, and access controls
EchelonGraph integrates integrity verification and baseline comparison into the training pipeline.
🔗 Cross-Framework References
OWASP Top 10 for LLM Applications: LLM04 (Data and Model Poisoning)
EchelonGraph continuously monitors this control across all your cloud accounts.