Denial of ML Service
Description
Adversary disrupts AI/ML service availability via crafted high-cost queries (token-heavy LLM prompts, GPU-saturating image inputs, recursive query patterns).
⚠️ Risk Impact
AI inference is expensive. An adversary that drives inference cost ten-fold can exhaust the defender's quota / budget / capacity. The economics favour the attacker — small adversary cost → large defender cost.
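The asymmetry is easy to quantify. A minimal back-of-the-envelope sketch follows; all cost figures are illustrative assumptions, not measured values:

```python
# Illustrative cost-asymmetry arithmetic; every figure here is an assumption.
ATTACKER_COST_PER_REQUEST = 0.0001  # assumed cost of sending one crafted prompt ($)
DEFENDER_COST_PER_REQUEST = 0.05    # assumed GPU cost of serving one max-length response ($)
REQUESTS = 100_000

attacker_spend = ATTACKER_COST_PER_REQUEST * REQUESTS  # $10
defender_spend = DEFENDER_COST_PER_REQUEST * REQUESTS  # $5,000
amplification = defender_spend / attacker_spend        # 500x

print(f"attacker ${attacker_spend:,.0f} -> defender ${defender_spend:,.0f} "
      f"({amplification:.0f}x amplification)")
```

Even with generous error bars on the per-request costs, the multiplier stays in the hundreds, which is what makes this attack economically sustainable for the adversary.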
🔍 How EchelonGraph Detects This
EchelonGraph's Tier 1 Cloud Scanner automatically checks for this condition across all connected cloud accounts. Violations are flagged as high-severity findings with remediation guidance.
🖥️ Manual Verification
# Check Cloud Run / KServe / vLLM token budget settings
kubectl get inferenceservice -o jsonpath='{.spec.predictor.containers[*].args}'
🔧 Remediation
Cap per-request token budget. Limit input dimensions (image size, prompt length). Rate-limit per principal. Monitor for cost-anomaly patterns. Use quota allocation per customer.
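The first two mitigations (per-request cap plus per-principal budget) compose naturally into a token bucket keyed on the caller's identity rather than IP. Below is a minimal in-process sketch; the class, constants, and method names are illustrative assumptions, not an EchelonGraph or vendor API:

```python
import time
from collections import defaultdict

MAX_TOKENS_PER_REQUEST = 4096   # assumed hard cap on a single request
BUCKET_CAPACITY = 100_000       # assumed per-principal budget per window
REFILL_RATE = BUCKET_CAPACITY / 3600  # tokens refilled per second (~1 window/hour)

class TokenBudget:
    """Per-principal token bucket: rejects oversized single requests
    and throttles sustained spend, independent of source IP."""

    def __init__(self):
        # principal -> [remaining tokens, timestamp of last update]
        self.buckets = defaultdict(lambda: [BUCKET_CAPACITY, time.monotonic()])

    def allow(self, principal: str, requested_tokens: int) -> bool:
        if requested_tokens > MAX_TOKENS_PER_REQUEST:
            return False  # single query may not consume hours of budget
        tokens, last = self.buckets[principal]
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at bucket capacity.
        tokens = min(BUCKET_CAPACITY, tokens + (now - last) * REFILL_RATE)
        if requested_tokens > tokens:
            self.buckets[principal] = [tokens, now]
            return False  # principal's budget exhausted for this window
        self.buckets[principal] = [tokens - requested_tokens, now]
        return True
```

Keying on an authenticated principal (API key, service account) rather than IP is the point: an attacker can rotate IPs cheaply, but each principal's budget drains independently.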
💀 Real-World Attack Scenario
An LLM-based customer-support chatbot was hit with a barrage of token-heavy 'write a 50000-word essay on X' queries. Inference cost spiked 800% over 4 hours. The team scrambled to deploy emergency rate-limits; legitimate customer experience degraded. Total infra cost spike: $34K in 4 hours.
💰 Cost of Non-Compliance
AI cost-spike incidents in 2024: avg $42K per incident (Anyscale ML Ops Report). Customer-experience degradation during incident: avg 0.8 NPS drop.
📋 Audit Questions
1. What is the per-request token / compute cap?
2. What is the rate limit per principal?
3. Show me the cost-anomaly detection rule.
4. When did a cost-anomaly alert last fire?
🎯 MITRE ATT&CK Mapping
🏗️ Infrastructure as Code Fix
resource "prometheus_alert_rule" "ai_cost_spike" {
  name        = "ai_inference_cost_spike"
  expr        = "sum(rate(ai_inference_token_total[5m])) by (workload) > 2 * sum(rate(ai_inference_token_total[1d] offset 1d)) by (workload)"
  for         = "10m"
  labels      = { severity = "page" }
  annotations = { summary = "AI inference cost spike >2× baseline" }
}
⚡ Common Pitfalls
- ⛔ No per-request token cap — single query can consume hours of budget
- ⛔ Rate-limit-by-IP only (attackers rotate IPs)
- ⛔ No cost-anomaly detection — incidents discovered via billing alert
📈 Business Value
Cost-spike protection prevents the highest-frequency 2024 AI operational incident. Material for any LLM-based product with paid inference.
⏱️ Effort Estimate
1-2 weeks for token caps + rate limits + cost monitoring
EchelonGraph monitors per-workload cost, alerts on anomalies, and applies automatic rate limits.
🔗 Cross-Framework References
Automate MITRE ATLAS AML.T0029 compliance
EchelonGraph continuously monitors this control across all your cloud accounts.
Start Free →