🧠 OWASP LLM Top 10 · LLM10 · Rule: OWASP-LLM-010 · Severity: high

Unbounded Consumption

Description

Adversarial high-cost queries drain budget, exhaust capacity, or deny service. LLM inference is expensive; unbounded queries enable economic attacks.

⚠️ Risk Impact

A token-heavy query can cost 100-1000× a benign query in inference cost. Unbounded consumption attacks are economically asymmetric — small attacker cost → large defender cost.
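The asymmetry is easy to see with back-of-envelope arithmetic. The prices and token counts below are illustrative assumptions, not measured figures from the incident data above:

```python
# Cost asymmetry sketch: one hostile long-generation query vs. one benign query.
# PRICE_PER_1K_TOKENS is an assumed flat inference price, USD.
PRICE_PER_1K_TOKENS = 0.01
benign_tokens = 200        # a typical short support query
hostile_tokens = 100_000   # a "write a very long essay" query

benign_cost = benign_tokens / 1000 * PRICE_PER_1K_TOKENS    # $0.002
hostile_cost = hostile_tokens / 1000 * PRICE_PER_1K_TOKENS  # $1.00
ratio = hostile_cost / benign_cost                          # 500x per request
```

At a few thousand hostile requests per hour, that per-request gap compounds into the kind of spike described in the scenario below.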

🔍 How EchelonGraph Detects This

OWASP-LLM-010: Automated scanner rule

EchelonGraph's Tier 1 Cloud Scanner automatically checks for this condition across all connected cloud accounts. Violations are flagged as high-severity findings with remediation guidance.

🔧 Remediation

  • Cap the per-request token budget.
  • Rate-limit per authenticated principal.
  • Enforce a quota per customer.
  • Reject queries above the token threshold and inputs above the size threshold.
  • Monitor for cost-anomaly patterns.
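The first two rejection rules can be enforced before a request ever reaches the model. A minimal pre-inference guard, with assumed threshold values (`MAX_INPUT_BYTES`, `MAX_OUTPUT_TOKENS` are illustrative, not prescribed limits):

```python
# Hypothetical admission check run before inference is scheduled.
MAX_INPUT_BYTES = 32_768     # reject oversized input payloads
MAX_OUTPUT_TOKENS = 4_096    # per-request generation budget

def admit_request(prompt: str, requested_max_tokens: int) -> bool:
    """Return True only if the request fits the per-request budget."""
    if len(prompt.encode("utf-8")) > MAX_INPUT_BYTES:
        return False  # input above size threshold
    if requested_max_tokens > MAX_OUTPUT_TOKENS:
        return False  # query above token threshold
    return True
```

Rejecting at admission time keeps the attacker's cost-per-attempt near zero for the defender, which removes the economic asymmetry described above.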

💀 Real-World Attack Scenario

An LLM-based customer-support chatbot was hit with a barrage of "write a 50,000-word essay on X" queries. Inference cost spiked 800% over 4 hours. The team deployed emergency rate limits, and legitimate customer experience degraded during the response. Total infra cost spike: $34K in 4 hours.

💰 Cost of Non-Compliance

Avg AI cost-spike incident in 2024: $42K per incident (Anyscale). Customer-experience degradation during incident: avg 0.8 NPS drop (Forrester).

📋 Audit Questions

  1. What is the per-request token cap?
  2. What is the rate limit per principal?
  3. Show me the cost-anomaly detection alert rule.
  4. When did a cost-spike alert last fire?

🏗️ Infrastructure as Code Fix

main.tf
# Set a per-request token cap and a per-principal quota at the API gateway.
# Note: google_api_gateway_* resources require the google-beta provider.
resource "google_api_gateway_api_config" "llm" {
  provider      = google-beta
  api           = google_api_gateway_api.llm_inference.api_id
  api_config_id = "v1"

  openapi_documents {
    document {
      # OpenAPI spec enforcing max_tokens=4096 and a 1000 req/day/principal quota
      contents = filebase64("openapi-with-quota.yaml")
      path     = "openapi-with-quota.yaml"
    }
  }
}

# Page when the 5m token rate exceeds 2x the same window from the previous day.
# Assumes a community Prometheus provider that exposes an alert-rule resource.
resource "prometheus_alert_rule" "llm_cost_spike" {
  name   = "llm_cost_spike"
  expr   = "sum(rate(llm_token_total[5m])) > 2 * sum(rate(llm_token_total[24h] offset 1d))"
  for    = "10m"
  labels = { severity = "page" }
}
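The alert expression compares the current short-window token rate against twice yesterday's daily baseline. The same decision logic, stated outside PromQL as a plain function (the threshold factor of 2 comes from the rule above; the function name is illustrative):

```python
def cost_spike(current_rate: float, baseline_rate: float, factor: float = 2.0) -> bool:
    """Fire when the short-window token rate exceeds `factor` times the
    24h baseline from the previous day (mirrors the PromQL rule)."""
    return current_rate > factor * baseline_rate
```

Sustaining the `for = "10m"` hold in the rule means a single noisy 5-minute sample does not page anyone; the spike must persist.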

⚡ Common Pitfalls

  • No per-request token cap
  • Rate-limiting by IP only (attackers rotate IPs)
  • No cost-anomaly alert — incident discovered via billing rather than monitoring
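The second pitfall is why limits should be keyed to the authenticated principal rather than the source address. A minimal per-principal token-bucket sketch (class and function names are illustrative):

```python
import time

class TokenBucket:
    """Token bucket keyed by authenticated identity, not source IP,
    so rotating IPs does not reset the quota."""

    def __init__(self, capacity: float, refill_per_sec: float):
        self.capacity = capacity
        self.refill = refill_per_sec
        self.tokens = capacity          # start full
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

buckets: dict[str, TokenBucket] = {}    # principal id -> bucket

def allow_request(principal: str, cost: float = 1.0) -> bool:
    bucket = buckets.setdefault(principal, TokenBucket(capacity=10, refill_per_sec=1.0))
    return bucket.allow(cost)
```

Charging `cost` proportional to requested tokens (rather than a flat 1 per request) also covers the first pitfall, since one token-heavy query drains the same budget as many small ones.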

📈 Business Value

Unbounded-consumption defence prevents the most frequent LLM operational incident of 2024. Relevant to any LLM application with paid inference.

⏱️ Effort Estimate

Manual

1-2 weeks for token caps + rate limits + cost-anomaly monitoring

With EchelonGraph

EchelonGraph monitors per-workload cost; alerts on anomaly and auto-rate-limits

🔗 Cross-Framework References

  • MITRE_ATLAS-AML.T0029
  • EUAIA-ART15-ROBUSTNESS

Automate OWASP LLM Top 10 LLM10 compliance

EchelonGraph continuously monitors this control across all your cloud accounts.
