🎯 MITRE ATLAS AML.T0029 · Rule: ATLAS-IMP-001 · Severity: high

Denial of ML Service

Description

An adversary disrupts AI/ML service availability with crafted high-cost queries: token-heavy LLM prompts, GPU-saturating image inputs, or recursive query patterns.

⚠️ Risk Impact

AI inference is expensive. An adversary who drives inference cost ten-fold can exhaust the defender's quota, budget, or capacity. The economics favour the attacker: a small adversary cost produces a large defender cost.
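The asymmetry can be made concrete with back-of-the-envelope arithmetic. All numbers below are illustrative assumptions, not billing data:

```python
# Illustrative cost asymmetry: the attacker pays only to send requests,
# while the defender pays GPU inference cost for every generated token.
# Every constant here is a hypothetical assumption for the sketch.

REQUESTS = 10_000                    # attacker-sent prompts
TOKENS_PER_RESPONSE = 4_000          # "write a 50,000-word essay"-style output
DEFENDER_COST_PER_1K_TOKENS = 0.01   # assumed inference cost, USD
ATTACKER_COST_PER_REQUEST = 0.0001   # near-zero: bandwidth only

defender_cost = REQUESTS * TOKENS_PER_RESPONSE / 1_000 * DEFENDER_COST_PER_1K_TOKENS
attacker_cost = REQUESTS * ATTACKER_COST_PER_REQUEST

# Roughly $400 of defender spend against about $1 of attacker spend.
print(f"defender pays ${defender_cost:,.2f}, attacker pays ${attacker_cost:,.2f}")
```

Under these assumptions the defender's bill is about 400× the attacker's, which is the economic engine behind this technique.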

🔍 How EchelonGraph Detects This

ATLAS-IMP-001 — Automated scanner rule

EchelonGraph's Tier 1 Cloud Scanner automatically checks for this condition across all connected cloud accounts. Violations are flagged as high-severity findings with remediation guidance.

🖥️ Manual Verification

terminal
# Check KServe / vLLM predictor args for token and generation limits
kubectl get inferenceservice -A -o jsonpath='{.items[*].spec.predictor.containers[*].args}'

🔧 Remediation

  • Cap the per-request token budget
  • Limit input dimensions (image size, prompt length)
  • Rate-limit per principal
  • Monitor for cost-anomaly patterns
  • Allocate quota per customer

💀 Real-World Attack Scenario

An LLM-based customer-support chatbot was hit with a barrage of token-heavy "write a 50,000-word essay on X" queries. Inference cost spiked 800% over 4 hours. The team scrambled to deploy emergency rate limits, and legitimate customer experience degraded. Total infrastructure cost spike: $34K in 4 hours.

💰 Cost of Non-Compliance

AI cost-spike incidents in 2024: avg $42K per incident (Anyscale ML Ops Report). Customer-experience degradation during incident: avg 0.8 NPS drop.

📋 Audit Questions

  1. What is the per-request token / compute cap?
  2. What is the rate limit per principal?
  3. Show me the cost-anomaly detection rule.
  4. When did a cost-anomaly alert last fire?

🎯 MITRE ATT&CK Mapping

T1499 — Endpoint Denial of Service (maps to MITRE ATLAS AML.T0029)

🏗️ Infrastructure as Code Fix

main.tf
resource "prometheus_alert_rule" "ai_cost_spike" {
  name = "ai_inference_cost_spike"
  expr = "sum(rate(ai_inference_token_total[5m])) by (workload) > 2 * sum(rate(ai_inference_token_total[1d] offset 1d)) by (workload)"
  for  = "10m"
  labels = { severity = "page" }
  annotations = { summary = "AI inference cost spike >2× baseline" }
}
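The comparison the alert expression performs (current 5-minute token rate versus 2× the day-old baseline, per workload) can be sketched in plain Python. The metric values below are illustrative; in practice both rate maps would come from your metrics store:

```python
# Plain-Python sketch of the >2x-baseline check from the alert rule above.
# The rates here are hypothetical tokens/sec values, not real telemetry.

def spiking_workloads(now_rates: dict[str, float],
                      baseline_rates: dict[str, float],
                      factor: float = 2.0) -> list[str]:
    """Workloads whose current token rate exceeds factor x their baseline."""
    return [w for w, rate in now_rates.items()
            if rate > factor * baseline_rates.get(w, 0.0)]

now = {"chatbot": 9_000.0, "summarizer": 1_100.0}        # tokens/sec now
yesterday = {"chatbot": 1_000.0, "summarizer": 1_000.0}  # tokens/sec, 1d ago

print(spiking_workloads(now, yesterday))  # ['chatbot']
```

A workload absent from the baseline map is treated as a spike (baseline 0.0), mirroring how a brand-new high-volume workload would also warrant a page.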

⚡ Common Pitfalls

  • No per-request token cap — single query can consume hours of budget
  • Rate-limiting by IP only (attackers rotate IPs)
  • No cost-anomaly detection — incidents discovered via billing alert

📈 Business Value

Cost-spike protection addresses the most frequent class of AI operational incident in 2024 and is material for any LLM-based product with paid inference.

⏱️ Effort Estimate

Manual

1-2 weeks for token caps + rate limits + cost monitoring

With EchelonGraph

EchelonGraph monitors per-workload cost; alerts on anomaly and auto-rate-limits

🔗 Cross-Framework References

OWASP_LLM-LLM10 · EUAIA-ART15-ROBUSTNESS

Automate MITRE ATLAS AML.T0029 compliance

EchelonGraph continuously monitors this control across all your cloud accounts.

Start Free →