EchelonGraph · Threat Intelligence

The State of Internet Exposure

2026 — Inaugural Edition · What the whole internet can already see
An evidence-based field report, observed passively from public data — not a survey, not a forecast.

By the numbers — window 29 May – 28 June 2026

2,050 AI services confirmed active & open (of 82,416 discovered)
21,299 hosts on CISA-KEV exploited vulns (of 36,416 CVE-exposed)
6,607 open, unauthenticated databases (120 countries)
2,571 exposed .env files (≈2,600 AWS keys)
619 secrets in public Git repos (478 repositories)
134 hijackable subdomains (dangling CNAME)
340,552 CVEs tracked & enriched (1,621 CISA-KEV)
137 countries observed (passive, detect-only)
✓ Certified by EchelonGraph

Observation window: 29 May – 28 June 2026  ·  Generated: 28 June 2026
All figures are derived from EchelonGraph's live exposure intelligence. Aggregate & host-redacted.
For CISOs, CTOs & CEOs. Shareable. © 2026 EchelonGraph — echelongraph.io

Certification & Disclaimer

Certification. EchelonGraph certifies that every metric in this report is derived directly from its production exposure-intelligence systems as of 28 June 2026. No figure has been estimated, modelled, or extrapolated; each traces to a query result over observed data.
Disclaimer: what this is, and how we found it. This report describes exposure that EchelonGraph observed passively, from public data: Certificate Transparency logs, public internet scan data, public source repositories, and public DNS. We never authenticate, never log in, and never exploit. We detect that a service is reachable and (where determinable) unauthenticated; we do not read the data inside it. Findings are reported in aggregate and host-redacted, under a responsible-disclosure posture. This is an observational field report, not a penetration test or an audit of any specific organization. The absence of a finding here is not evidence of safety. Classification (e.g. PII/PCI indicators) is deliberately conservative, and attribution to an organization is made only when ownership is DNS-provable. The observation window is roughly four to five weeks (29 May to 28 June 2026); this is an inaugural baseline, not a multi-year trend line. Want to see your own exposure? Run the free Surface Scanner.

Chapter 1

Executive Summary

Every organization maintains two attack surfaces. The first is the one it knows about — the assets in its CMDB, the services behind its firewall, the cloud accounts on its bill. The second is the one the entire internet can already see: the forgotten database left listening on a public IP, the AI proxy a team spun up over a weekend, the production host running a version of software that attackers are exploiting this week, the credential committed to a public repository at 2 a.m. The gap between those two surfaces is where breaches happen. This report measures the second surface — the externally observable one — directly, from the same vantage point an adversary uses.

The findings below were collected over a four-to-five-week observation window from 2026-05-29 to 2026-06-28. This is an inaugural baseline, not a multi-year trend; readers should treat the absolute counts as a snapshot of the present state, and the patterns — not the slopes — as the signal. The methodology is deliberately constrained: every observation is passive, drawn from public data, and detect-only. We never authenticate, never exploit, and never read the contents of an exposed system. Findings are aggregated and host-redacted, and confirmed exposures are handled through responsible disclosure to the verifiable owner. The point is not to publish a target list. The point is to establish what is already visible — because if a passive observer can see it, so can a motivated one.

The same dynamic recurs in nearly every major breach of the last decade. Equifax (2017) ran a known-vulnerable, internet-reachable web component for months after a patch existed. Capital One (2019) and the DeepSeek open database (2025) were misconfigurations: an asset reachable from the public internet with access control missing or bypassable, not a novel zero-day. MOVEit (2023) was widely deployed software whose exposed instances became a mass-exploitation event the moment a working exploit circulated, and the OpenSSH regreSSHion flaw (CVE-2024-6387) showed how large a population of exposed servers a single disclosure can put at risk. None of these required the attacker to be inside first. Each began with something the whole internet could already see. This report quantifies how much of that visible surface still exists today.

The thesis: the surface is large, it is reachable, and most of it is unwatched

Across six independent passive radars, we observed exposure at internet scale spanning 137 countries for AI services alone and 161 countries for hosts running known-vulnerable software. The recurring theme is not exotic attack technique — it is the ordinary failure mode of fast-moving engineering: a store left open, a patch not applied, a DNS record not retired, a secret not rotated. These are not edge cases. They are the default state of an unmanaged external surface, and they are observable from outside without privileged access of any kind.

[Figure: headline metrics across the six EchelonGraph exposure radars, observation window 2026-05-29 to 2026-06-28. Counts are passive, public-data, detect-only.]

By the numbers

EchelonGraph State of Internet Exposure 2026 — headline figures. Window: 2026-05-29 to 2026-06-28.
Radar | Headline figure | Reach
Shadow AI footprint | 82,416 AI services discovered; 2,050 confirmed active and open | 137 countries
KEV exposure | 36,416 hosts running known-CVE software; 21,299 exposed to CISA-KEV actively-exploited vulnerabilities | 161 countries / 6,519 orgs
Exposed data stores | 6,607 open, unauthenticated databases | 120 countries / 1,723 orgs
Secret sprawl (web) | 750 hosts serving secrets; ~2,600 AWS credentials | -
Leaked credentials (public Git) | 619 secrets across 478 repositories | -
Subdomain takeover | 7,952 dangling records observed; 134 confirmed vulnerable | -

For context on the vulnerability data that underpins the KEV findings, EchelonGraph tracks a corpus of 340,552 CVEs, of which 1,621 appear on the CISA Known Exploited Vulnerabilities catalog, 326 are linked to ransomware, 33,409 are rated CRITICAL, and 7,624 were published in the last 30 days alone. Exposure is not a static problem against a fixed list of flaws; the list itself grows by thousands of entries a month.

The eight findings to act on this quarter

The remainder of this report develops each radar in depth. For an executive audience, the actionable signal reduces to eight findings. Each is stated with its headline number and the question it should prompt on your own surface.

1. Your AI build-out has created a public footprint of 82,416 services — and you almost certainly cannot enumerate it

We discovered 82,416 AI-related services across 137 countries: AI-workflow platforms, LLM proxies, model stores, notebooks, MCP servers, and vector databases. The honest reading of that number matters, because most of it is not dangerous on its own. The discovered footprint breaks down as follows.

Shadow AI liveness breakdown. "Discovered" is the AI footprint visible in public data; only the active/open set is reachable and unauthenticated.
State | Services | What it means
Authenticated / secured | 36,991 | Reachable but behind a login or access control; working as intended
Resolved (DNS-only) | 42,593 | Name resolves but no live service answered; inventory signal, not an open door
Confirmed active & open | 2,050 | Live and unauthenticated; the set that warrants action
Unreachable | 782 | No response on probe

The 2,050 confirmed-active, open services are the ones that matter. They include 1,743 open LLM proxies, 171 live vector databases, 2 MCP servers, and 81 AI-workflow instances, all reachable on the public internet with no authentication in the way. An open LLM proxy is an uncapped bill and a data-exfiltration channel; an unauthenticated vector database is a verbatim copy of whatever proprietary corpus was embedded into it. This is the DeepSeek class of problem, an AI asset stood up in a hurry and left open, observed at scale. The so-what for a CISO: the authenticated 36,991 are evidence that your teams are deploying AI faster than security can inventory it, and the 2,050 are proof that some of those deployments are already open. The discovery is feasible from outside; the question is whether you find your share first. Note carefully that the authenticated and DNS-only sets are not exposed and must never be characterized as such.

2,050 AI services were confirmed active and open on the public internet — 1,743 of them LLM proxies — out of an 82,416-service footprint spanning 137 countries.

2. 21,299 hosts are exposed to vulnerabilities attackers are exploiting right now

We identified 36,416 hosts running software with known CVEs — 165,717 distinct host-to-CVE pairings across 6,519 organizations and 161 countries. The subset that should drive remediation priority is the 21,299 hosts exposed to vulnerabilities on the CISA KEV catalog (50 distinct KEV CVEs): not "you have a vulnerability," but "you are running, on a publicly reachable host, the specific flaw that is under active exploitation." A further 3,475 hosts carry ransomware-linked CVEs, and 36,365 carry high-EPSS (high exploit-probability) vulnerabilities. The exposed software population is dominated by the internet's load-bearing infrastructure.

Most-exposed software by host count, and the most-exposed individual CVEs by host count.
Software | Hosts | Top exposed CVE | Hosts
OpenSSH | 12,889 | CVE-2023-48795 (Terrapin / OpenSSH) | 9,942
Apache Tomcat | 5,219 | CVE-2023-38408 (OpenSSH) | 9,923
Apache httpd | 3,491 | CVE-2025-26465 (OpenSSH) | 8,525
MongoDB | 3,230 | CVE-2023-44487 (HTTP/2 Rapid Reset / Tomcat) | 5,859
VMware ESXi | 2,816 | CVE-2024-6387 (regreSSHion) | 5,713
Citrix NetScaler | 1,709 | CVE-2025-24813 (Tomcat) | 5,218

The so-what: regreSSHion (CVE-2024-6387) appears on 5,713 exposed hosts, and the Citrix NetScaler family — the lineage behind the "Citrix Bleed" exploitation wave — is present on 1,709. These are precisely the products that turn into mass-compromise events when an exploit circulates, which is the pattern Equifax, MOVEit, and the regreSSHion disclosure each followed. Across the population, host severity tallies 24,292 CRITICAL, 26,482 HIGH, and 19,329 MEDIUM. Patch prioritization keyed to KEV and EPSS — not raw CVSS — is the single highest-leverage action a security team can take against this surface. Full host-redacted detail is available at the KEV exposure radar.

3. 6,607 databases are open to the internet with no authentication at all

We confirmed 6,607 open, unauthenticated databases across 1,723 organizations and 120 countries. A critical framing applies here, and it is non-negotiable: our classification is conservative. We detect that the store is open via a single read-only probe; we do not read its contents. We therefore make no claim about the volume of data inside — there is no "millions of records" assertion in this report, because we never looked. Only 3 stores were conservatively classified as holding PII and 1 as in PCI scope on the basis of externally visible metadata alone; the true figure is unknowable without doing the thing an attacker would do, which we will not do. The exposure is concentrated in caching and search engines that ship insecure-by-default.

Open unauthenticated data stores by engine (hosts) and by country.
Engine | Hosts | Country | Hosts
Redis | 4,243 | United States | 1,254
Memcached | 1,756 | China | 1,092
Kibana | 464 | Germany | 772
Cassandra | 75 | France | 307
MongoDB | 37 | Singapore | 248
InfluxDB | 16 | India | 221

The so-what: an open Redis or Memcached instance is not merely a data-leak risk — it is a foothold, frequently writable, and a recurring initial-access vector in commodity intrusions. The DeepSeek disclosure (2025) was a single open database that made global news; this radar found 6,607 doors of the same kind, standing open today. Detail at the exposed databases radar.

4. ~2,600 AWS credentials are being served by the applications meant to protect them

We found 750 hosts serving secrets directly to the public internet — applications handing out the very configuration they were never meant to expose. The dominant exposure vector is the environment file: 2,571 served .env files, alongside 80 exposed Git configs, 61 exposed Git directories, 17 credential files, 3 database dumps, and 2 private keys. The credential material recovered from those served files is overwhelmingly cloud keys: 1,319 AWS secret keys and 1,284 AWS access-key IDs — on the order of 2,600 AWS credentials — plus 2 GitHub tokens and a PEM key. Severity rows tally 1,386 critical and 1,304 high. As with every radar, we detect that the file is being served; we never read or test a key, because using it would be the attack.

The so-what: a leaked AWS key is not a future risk, it is a present one — automated harvesters scrape .env endpoints continuously, and a live key can be abused within minutes of exposure. Capital One (2019) demonstrated what a single mis-scoped cloud credential can unlock. Every served .env on this list is a standing invitation. Detail at the exposed AI keys radar.

5. 619 secrets are sitting in 478 public Git repositories

Separately from web-served secrets, we identified 619 secrets across 478 public Git repositories. The provider breakdown — 249 generic, 228 Google, 63 AWS, 53 Telegram, 13 Discord, 2 OpenAI, 2 HuggingFace — shows that credential leakage now spans far beyond cloud IAM into bot tokens, messaging platforms, and, increasingly, AI-service keys. Severity tallies 301 high, 273 medium, and 45 critical. The so-what: source control is a primary exfiltration surface, and a committed secret is exposed from the moment of git push — rotation, not deletion, is the only effective response, because the history persists. Detail at the leaked credentials radar.

6. 134 confirmed takeover-vulnerable subdomains let an attacker speak as your brand

We observed 7,952 candidate dangling DNS records and confirmed 134 as vulnerable to subdomain takeover: a forgotten CNAME pointing at a deprovisioned SaaS endpoint that an attacker can re-claim and serve content from. The observed records cluster on the platforms teams adopt and abandon fastest: Shopify (4,869 observed records), GitHub Pages (1,563), Heroku (773), SmugMug (331), and Fastly (170). The so-what: a hijacked subdomain inherits your domain's trust; it is a turnkey phishing, cookie-theft, and brand-impersonation platform under yourcompany.com. The 134 confirmed cases are the ones where the takeover is demonstrably claimable. The gap between 7,952 observed and 134 confirmed reflects deliberately conservative validation: most of the remaining 7,818 delegations resolve to live, properly claimed services, and a CNAME to a known provider counts as a finding only when the resource behind it is confirmed unclaimed. Detail at the subdomain takeover radar.

7. The vulnerability backlog grows faster than any patch cadence — by 7,624 new CVEs in 30 days

The KEV findings sit on top of a structural problem: the supply of vulnerabilities is accelerating. The EchelonGraph corpus holds 340,552 CVEs, including 33,409 CRITICAL and 103,468 HIGH, with 7,624 new entries in the last 30 days and 5,190 AI-related CVEs. Of the total, 1,621 are CISA-KEV and 326 are ransomware-linked. The so-what: a remediation program scoped to "patch the criticals" is mathematically losing, because criticals arrive in the tens of thousands. The defensible posture is exposure-led prioritization — fix what is reachable, exploited, and yours, in that order — which is exactly what radars 2 through 6 enable. The full enrichment (CVSS v3/v4, EPSS, KEV tiering, SSVC, ransomware and AI flags) is available at EchelonGraph Pulse.

8. Almost none of this surface is being watched on the owner's behalf

The unifying finding is the one that does not reduce to a single radar. Every exposure above was discoverable passively, from public data, by an outside observer with no special access. In each historical breach we cited, a researcher or an attacker found the open door before the owner did — the only variable was who got there first, and what they did next. The so-what for the board: the external attack surface is already enumerated by adversaries continuously; the asymmetry is that most organizations are not enumerating their own. Closing that asymmetry — knowing what the internet can see about you, before it is used against you — is the single strategic posture this report argues for. Organizations can begin with a self-directed check of their own surface at the surface scanner, map blast radius at the attack graph, and tie exposure to obligations at the compliance view.

How to read the rest of this report

The chapters that follow take each radar in turn: methodology, the full data, what it means, who it hits hardest, and the parallels to incidents the reader will recognize. Two cautions carry throughout. First, this is a baseline — a four-to-five-week first measurement, not a trend line; we report what is, and resist over-reading direction from a single window. Second, every figure in this report is an observed, passively collected, detect-only count, handled with host redaction and responsible disclosure. We measure the open door. We do not walk through it.

Chapter 2

Methodology & How We Found This

Every figure in this report was produced the same way: by looking at data that is already public, from the outside, without ever logging in, sending an exploit, or touching a single record. That constraint is not a limitation we apologize for; it is the entire point. An attacker probing your perimeter sees exactly what we see. The difference is that we tell you, and we tell only you, before someone less friendly arrives at the same open door. This chapter documents precisely how each of the six radars works, what our liveness and confidence labels actually mean, how we decide whether a finding belongs to a named company, and, just as importantly, what these numbers are not. Read this chapter before you read any number in the rest of the report. The numbers are only as trustworthy as the method that produced them, and we would rather you understand the method than over-trust the number.

The stance: passive, public-data, detect-only

The most important fact about this report is the smallest: we never authenticated to anything, we never wrote anything, we never validated a found credential by using it, and we never claimed a dangling resource. Each of those actions would, under the Computer Fraud and Abuse Act and its international analogues, constitute unauthorized access, and each would also make our findings legally and ethically unusable. So we built the constraint into the architecture rather than the policy. Our radars consume four classes of strictly public data: Certificate Transparency logs, public internet scan data, public source repositories, and public DNS.

Where a finding required a confirmation step beyond passive data — for example, distinguishing a genuinely open database from one that merely looks open behind a client-side login — we permitted ourselves exactly one action: a single anonymous, read-only HTTP GET, the same request a browser makes when it loads a page. Nothing more. We confirm that the door is open. We do not walk through it. Every outbound probe also identifies itself (see Identified scanning below), so any administrator who sees us in their logs can confirm it was us and reach us to ask us to stop.

Detect-only, end to end. Zero authentications. Zero exploit attempts. Zero records read. Where confirmation was needed, the strongest action we took was one anonymous read-only GET.

The observation window: a baseline, not a trend

This is our inaugural report, and honesty about its time horizon matters more than the impressiveness of any single figure. All findings were collected between 29 May 2026 and 28 June 2026 — roughly four to five weeks. That makes this a baseline, not a multi-year trend line. We deliberately do not draw growth curves, year-over-year deltas, or directional claims from this window; we have no prior year to compare against. Where a per-month split appears — for example, the AI attack-surface count rising from 10,679 discovered footprints in May to 71,737 in June — that jump reflects our scanner fleet ramping its coverage of the public internet, not a real-world surge in exposure over those weeks. We flag it as such wherever it appears, and you should read it that way. Future editions, with multiple baselines behind them, will support trend analysis. This one establishes the starting line.

Radar 1 — AI Attack Surface (Certificate Transparency + Shodan)

So what: the AI build-out is creating internet-facing services faster than security teams can inventory them, and Certificate Transparency makes every one of them publicly discoverable the moment it gets a certificate. This radar tails CT logs and Shodan for the signatures of AI infrastructure — LLM proxies, vector databases, Jupyter notebooks, model stores, MCP servers, and AI-workflow tools (Flowise, Langflow, and the rest of the zoo) — using a large, maintained library of hostname patterns and Shodan service dorks. Across the window we discovered 82,416 AI-related footprints spanning 137 countries. See the live Shadow AI Radar.

The discipline that makes that 82,416 number defensible is what we do after discovery. A certificate or a banner only proves a hostname exists — not that anything is listening, and certainly not that it is open. So every discovered footprint is run through a liveness verifier that performs DNS resolution plus a lightweight, category-aware HTTPS probe (for a notebook, /api/kernels; for an Ollama-class proxy, /api/tags; for a Flowise-class workflow tool, /api/config). The verifier is conservative by construction: any sign of an authentication gate — a 401 or 403, a redirect to a login page, login-form markers in the body — or any inconclusive result is treated as authenticated, never as open. We only ever label a host active when we positively confirmed there was no gate in front of it. This collapses the single largest class of false positive in surface scanning and is the reason the headline number and the "active" number differ by more than an order of magnitude.
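The conservative labelling rule described above can be sketched as a pure function over a probe result. This is a minimal illustration, not EchelonGraph's verifier: the `ProbeResult` fields, the `LOGIN_MARKERS` list, and the choice to map non-auth error statuses to "resolved" (following the state table in this chapter) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical login-form markers; the report does not publish its own list.
LOGIN_MARKERS = ('type="password"', "sign in", "log in", "login")


@dataclass
class ProbeResult:
    dns_resolved: bool
    status: Optional[int] = None  # HTTP status, or None if nothing answered over HTTPS
    location: str = ""            # Location header, if the response was a redirect
    body: str = ""                # response body (possibly truncated)


def classify(result: ProbeResult) -> str:
    """Return one of: 'unreachable', 'resolved', 'authenticated', 'active'."""
    if not result.dns_resolved:
        return "unreachable"          # hostname does not resolve at all
    if result.status is None:
        return "resolved"             # a DNS record exists, but no service answered
    if result.status in (401, 403):
        return "authenticated"        # explicit authentication gate
    if 300 <= result.status < 400:
        # A redirect to a login page is a gate; any other unfollowed redirect
        # is inconclusive, and inconclusive is never labelled open.
        return "authenticated"
    if 200 <= result.status < 300:
        text = result.body.lower()
        if any(marker in text for marker in LOGIN_MARKERS):
            return "authenticated"    # login form rendered in the body
        return "active"               # positively confirmed: no gate in front
    return "resolved"                 # other 4xx/5xx: no open service behind the name
```

The design choice worth noting is the direction of every default: a host only reaches "active" through the single branch where a 2xx response carried no sign of a gate, so an error in marker matching can under-count exposure but cannot inflate it.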

What "discovered," "authenticated," "resolved," and "confirmed active" actually mean

So what: this is the most consequential paragraph in the report, because the difference between these labels is the difference between a real exposure and a non-event — and conflating them is how vendors manufacture scary headlines. We do not conflate them. Of the 82,416 AI footprints we discovered, here is the exact breakdown:

AI footprint liveness states (n = 82,416)
State | Hosts | What it means
Authenticated / secured | 36,991 | The service is live and reachable, but a login gate stands in front of it. This is the system working as intended. We never call these "exposed."
Resolved (DNS-only) | 42,593 | The hostname resolves, but the service returned a 4xx/5xx or was unreachable over HTTPS. A record exists; no open service is behind it.
Confirmed active / open | 2,050 | HTTPS responded 200–299 with no login gate. A real, anonymously reachable AI service. This, and only this, is genuine exposure.
Unreachable | 782 | The hostname does not resolve at all: dead or never hosted.

The 2,050 confirmed-active set includes 1,743 open LLM proxies, 171 live vector databases, 2 MCP servers, and 81 AI-workflow tools. When this report says "exposed AI services," it means those 2,050: not the 82,416 we discovered, and emphatically not the 36,991 that are doing exactly what they should by requiring a login. We state this plainly because the temptation in this industry is to report the discovery count as the exposure count. We consider that dishonest, and we do not do it.

82,416 discovered ≠ 82,416 exposed. Only 2,050 were confirmed anonymously reachable with no authentication. The other 80,366 either enforce a login, resolve to nothing, or are dead hostnames.
From 82,416 discovered AI footprints to 2,050 confirmed-active exposures: how the liveness verifier filters discovery down to genuine, anonymously reachable services.
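The funnel arithmetic can be checked directly from the four state counts in the table above. The short script below is only a reader's sanity check; the dictionary keys are labels chosen for this example.

```python
# State counts as published in the liveness table (n = 82,416).
states = {
    "authenticated": 36_991,
    "resolved_dns_only": 42_593,
    "confirmed_active": 2_050,
    "unreachable": 782,
}

discovered = sum(states.values())                       # total discovered footprints
not_exposed = discovered - states["confirmed_active"]   # everything that is not open
active_share = states["confirmed_active"] / discovered  # fraction genuinely exposed

for name, count in states.items():
    print(f"{name:<20}{count:>8,}{count / discovered:>8.1%}")
print(f"discovered={discovered:,}  not_exposed={not_exposed:,}  active_share={active_share:.2%}")
```

The four states sum to the discovered total, and the confirmed-active set is roughly one footprint in forty.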

Radar 2 — KEV Exposure (banner fingerprinting × CVE correlation)

So what: "you have a vulnerability" is noise; "you are running, on the public internet, the exact software version that attackers are exploiting this week" is a fire alarm. This radar produces the second statement, not the first. We take Shodan's view of what software a host is running — product and version, parsed from the service banner — and correlate it against our CVE corpus by product and version match. We then keep a finding only if it is either CISA‑KEV-listed (confirmed exploited in the wild) or carries a high EPSS exploit-probability score. A plain CVE with no exploitation signal is discarded; the radar's entire purpose is to surface what is reachable and under active attack. See KEV Exposure.

Across the window this surfaced 36,416 hosts running software with known CVEs — 165,717 distinct host-to-CVE pairs across 6,519 organizations and 161 countries. Of those hosts, 21,299 are exposed to a vulnerability on CISA's actively-exploited KEV catalog, mapping to 50 distinct KEV CVEs; 3,475 run software tied to ransomware campaigns. The honest framing is the one we used in the previous sentence: 36,416 hosts run software with known CVEs, and 21,299 of them are exposed to actively-exploited ones. We keep those two populations distinct because they carry very different urgency. The top exposed software by host count — OpenSSH (12,889), Apache Tomcat (5,219), Apache httpd (3,491), MongoDB (3,230), VMware ESXi (2,816) — and the top CVEs, led by the Terrapin SSH attack (CVE‑2023‑48795, 9,942 hosts) and regreSSHion (CVE‑2024‑6387, 5,713 hosts), are analyzed in the KEV chapter. One caveat travels with all banner-based correlation: a version string can be stale or back-patched, so a banner match is strong evidence of exposure but not proof a host is unpatched. We surface the correlation; we do not assert exploitation has occurred.
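The correlate-then-filter step described in this section can be sketched as follows. Everything here is illustrative: the record types, the exact-set version matching (a real corpus would use version ranges), and the `EPSS_THRESHOLD` value are assumptions, since the report does not state its "high EPSS" cut-off. The hosts, product name, and CVE IDs are synthetic placeholders.

```python
from dataclasses import dataclass

EPSS_THRESHOLD = 0.5  # assumed cut-off for "high EPSS"; not taken from the report


@dataclass(frozen=True)
class Banner:
    host: str
    product: str
    version: str


@dataclass(frozen=True)
class CveRecord:
    cve_id: str
    product: str
    affected_versions: frozenset
    in_kev: bool
    epss: float


def correlate(banners, corpus, epss_threshold=EPSS_THRESHOLD):
    """Return (host, cve_id, reason) for matches that carry an exploitation signal."""
    findings = []
    for banner in banners:
        for cve in corpus:
            if cve.product != banner.product or banner.version not in cve.affected_versions:
                continue  # no product-and-version match
            if cve.in_kev:
                findings.append((banner.host, cve.cve_id, "kev"))
            elif cve.epss >= epss_threshold:
                findings.append((banner.host, cve.cve_id, "high-epss"))
            # A plain CVE with no exploitation signal is discarded.
    return findings


# Synthetic data: placeholder hosts, product name, and CVE IDs.
corpus = [
    CveRecord("CVE-0000-0001", "exampled", frozenset({"1.0", "1.1"}), True, 0.10),
    CveRecord("CVE-0000-0002", "exampled", frozenset({"1.1"}), False, 0.92),
    CveRecord("CVE-0000-0003", "exampled", frozenset({"1.1"}), False, 0.01),
]
banners = [Banner("host-a", "exampled", "1.1"), Banner("host-b", "exampled", "2.0")]
findings = correlate(banners, corpus)
```

In this toy corpus `host-a` yields two findings (one KEV, one high-EPSS), the third CVE is dropped for lack of an exploitation signal, and `host-b` produces nothing because its version matches no record. The banner caveat from the text still applies: a match is evidence of exposure, not proof that a host is unpatched.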

Radar 3 — Exposed Data Stores (banner + a single read-only verifier probe)

So what: an unauthenticated database on the public internet is the DeepSeek-class failure — a production datastore left open on the internet with no password, readable by anyone who found it. We hunt that class at scale. The challenge is that a banner alone cannot always tell an open store from a secured one: modern web-fronted engines (Kibana, Elasticsearch) render their login screens client-side, so a password-walled instance emits the same banner as an open one. So for HTTP-fronted engines we send one anonymous, read-only GET to an authentication-gated endpoint — one that returns 401 or a login redirect when the instance is secured — and report the host only if it answers with open, data-serving content. We deliberately avoid endpoints that stay public even on secured instances (Kibana's /api/status, CouchDB's welcome page) precisely because they would generate false positives. Raw-protocol engines (Redis, MongoDB, Memcached, Cassandra) are not probed at all; their banner already encodes auth state (Redis announces NOAUTH), and we never speak their wire protocol. See Exposed Databases.
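The split between HTTP-fronted and raw-protocol engines amounts to a small dispatch rule plus a strict reporting condition. The sketch below assumes the engine groupings named in this section; the function names, the `serves_data` flag, and the "skip" fallback for unlisted engines are hypothetical.

```python
# Engine groupings as named in the text; membership beyond these is not assumed.
HTTP_FRONTED = {"kibana", "elasticsearch"}
RAW_PROTOCOL = {"redis", "mongodb", "memcached", "cassandra"}


def check_plan(engine: str) -> str:
    """Decide how an engine is assessed: one GET, banner only, or not at all."""
    name = engine.lower()
    if name in HTTP_FRONTED:
        return "single_get"   # one anonymous, read-only GET to an auth-gated endpoint
    if name in RAW_PROTOCOL:
        return "banner_only"  # never probed; auth state is read from the banner
    return "skip"             # assumption: unlisted engines are left out of this sketch


def report_http_host(status: int, serves_data: bool) -> bool:
    """Report a host only if the single GET returned open, data-serving content."""
    if status in (401, 403) or 300 <= status < 400:
        return False          # secured: auth challenge or login redirect
    return 200 <= status < 300 and serves_data
```

Choosing an endpoint that is gated on secured instances is what makes the second function meaningful: if the endpoint stayed public even behind a login, a 200 response would say nothing about whether the store is open.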

This surfaced 6,607 open, unauthenticated databases across 1,723 organizations and 120 countries, led by Redis (4,243 hosts) and Memcached (1,756). The single most important sentence in this entire report follows, and it governs how every data-store number must be read:

We detect that the store is open. We never read its contents. Our classification is conservative — of 6,607 open stores, we flagged only 3 as likely holding PII and 1 as likely PCI, from metadata alone. We make no claim about how many records any store holds.

You will not find the phrase "millions of records exposed" anywhere in this report, because we did not read a single record and we cannot honestly count what we did not read. The PII and PCI classifications are conservative inferences from structural metadata — field and index names visible without authentication — not from data we extracted. When in doubt, we did not classify. This is the opposite of the prevailing breach-report style, and it is deliberate.

Radar 4 — Secret Sprawl on the web (path probing + redacting regex)

So what: in the rush to ship, applications routinely serve their own configuration straight to the internet — a .env file, a .git directory, a credential file sitting at a public URL with no authentication — and those files frequently contain cloud keys in plaintext. This radar checks public hosts for those well-known exposure paths and, when one is being served, runs the body through a redacting secret detector. We confirmed that the file is being served; we never read, tested, or used a single key. Doing so would be the attack. Across the window this surfaced 750 hosts, dominated by exposed environment files (2,571 instances), and within those bodies roughly 2,600 AWS credentials (1,319 secret keys and 1,284 access-key IDs). See Exposed AI Keys.
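A redacting detector of the kind described here can be sketched in a few lines. This is an assumption-laden illustration, not the radar's implementation: it covers only the public shape of AWS access-key IDs (20 characters beginning `AKIA` or `ASIA`), and the redaction and fingerprint formats are invented for the example. The sample value is the example key ID from AWS's own documentation.

```python
import hashlib
import re

# AWS access-key IDs: 20 uppercase alphanumerics starting with AKIA or ASIA.
ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b")


def redact(value: str) -> str:
    """Keep only the first and last four characters (hypothetical format)."""
    return f"{value[:4]}...{value[-4:]}"


def fingerprint(value: str) -> str:
    """Truncated SHA-256, so repeat sightings can be de-duplicated without the raw value."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:16]


def scan_body(body: str) -> list:
    """Return redacted hits; the raw match never leaves this function."""
    hits = []
    for match in ACCESS_KEY_ID.finditer(body):
        raw = match.group(0)
        hits.append({
            "type": "aws_access_key_id",
            "redacted": redact(raw),
            "fingerprint": fingerprint(raw),
        })
    return hits


# AWS documentation example key ID, not a real credential.
sample = "APP_ENV=production\nAWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE\n"
hits = scan_body(sample)
```

Nothing in this flow contacts AWS or attempts to use a value; the detector only pattern-matches and counts, which is consistent with the detect-only stance.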

Radar 5 — Leaked Credentials in public Git (commit diffs + a six-stage validator)

So what: credentials committed to public repositories are a top root-cause of breaches, but the naive approach — regex-matching commit text — produces so many false positives (documentation examples, placeholder values) that the output is unusable. We engineered for precision over recall: we would rather miss a real credential than show you a fake one. A regex match is only the first of six gates a candidate must clear before it is ever stored or shown:

  1. Diff-awareness — only newly added (+) lines can introduce a secret; removed, context, and hunk-header lines are ignored.
  2. Known-example rejection — famous documentation and test credentials are dropped.
  3. Placeholder rejection — templated values, env-references, and your-key-here strings are dropped.
  4. Structural verification — provider checksums and decoders that prove a value is well-formed (GitHub token CRC32, JWT base64+JSON decode, AWS base32). "Verified" here means structurally verified by shape — never confirmed live, because confirming a key live would require using it.
  5. Example-context rejection — for unverified values, a test/docs/example file path drops the hit (the classic "token pasted as a code sample" false positive).
  6. Entropy and shape — unverified token values must clear a Shannon-entropy floor and character-diversity check, with no long repeats or monotonic sequences.

Only candidates that survive all six are surfaced — and even then the raw secret is never stored; we persist a redacted form and a fingerprint. This yielded 619 secrets across 478 repositories, spanning generic (249), Google (228), AWS (63), and other providers. Because "verified" means structurally verified by checksum and never confirmed by use, no number in this radar should be read as "619 live, working keys" — it is "619 strings that are shaped like real credentials and survived a six-stage anti-false-positive gauntlet." See Leaked Credentials.
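The six gates above can be sketched as a short pipeline. This is a simplified illustration under stated assumptions: the candidate regex, placeholder patterns, path patterns, entropy floor, and diversity floor are all invented for the example, and gate 4 implements only the JWT shape check (the GitHub CRC32 and AWS base32 checks are omitted). The diff and its token value are synthetic.

```python
import base64
import json
import math
import re

# All patterns and thresholds are illustrative assumptions, not the report's values.
CANDIDATE = re.compile(r"""(?i)(?:token|secret|key)\w*\s*[=:]\s*['"]?([A-Za-z0-9_.\-]{16,})""")
KNOWN_EXAMPLES = {"AKIAIOSFODNN7EXAMPLE"}  # AWS documentation example key ID
PLACEHOLDER = re.compile(r"(?i)your[-_]?(?:api[-_]?)?key|changeme|placeholder|xxxx")
EXAMPLE_PATH = re.compile(r"(?i)(?:^|/)(?:tests?|docs?|examples?)(?:/|$)")
ENTROPY_FLOOR = 3.5  # bits per character (assumed)
MIN_DISTINCT = 10    # character-diversity floor (assumed)


def added_lines(diff: str) -> list:
    """Gate 1: keep only added (+) lines, skipping the '+++' file header."""
    return [line[1:] for line in diff.splitlines()
            if line.startswith("+") and not line.startswith("+++")]


def shannon_entropy(value: str) -> float:
    """Shannon entropy in bits per character."""
    n = len(value)
    return -sum((value.count(c) / n) * math.log2(value.count(c) / n) for c in set(value))


def has_ascending_run(value: str, length: int = 6) -> bool:
    """True if `length` consecutive characters ascend by one code point (e.g. 'abcdef')."""
    run = 1
    for a, b in zip(value, value[1:]):
        run = run + 1 if ord(b) - ord(a) == 1 else 1
        if run >= length:
            return True
    return False


def looks_like_jwt(value: str) -> bool:
    """Gate 4 (JWT only): header and payload must base64url-decode to JSON objects."""
    parts = value.split(".")
    if len(parts) != 3:
        return False
    try:
        for part in parts[:2]:
            decoded = base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))
            if not isinstance(json.loads(decoded), dict):
                return False
    except ValueError:  # covers binascii.Error, JSONDecodeError, UnicodeDecodeError
        return False
    return True


def survives(value: str, path: str) -> bool:
    if value in KNOWN_EXAMPLES:        # gate 2: known documentation credential
        return False
    if PLACEHOLDER.search(value):      # gate 3: templated or placeholder value
        return False
    if looks_like_jwt(value):          # gate 4: structurally verified by shape, never by use
        return True
    if EXAMPLE_PATH.search(path):      # gate 5: unverified value in a test/docs/example path
        return False
    if re.search(r"(.)\1{5,}", value) or has_ascending_run(value):  # gate 6: repeats, sequences
        return False
    return len(set(value)) >= MIN_DISTINCT and shannon_entropy(value) >= ENTROPY_FLOOR


def scan_diff(diff: str, path: str) -> list:
    """Return redacted previews of candidates that clear every gate."""
    return [m.group(1)[:4] + "..." for line in added_lines(diff)
            for m in CANDIDATE.finditer(line) if survives(m.group(1), path)]


# Synthetic diff: one removed line, one placeholder, one high-entropy made-up value.
DIFF = '''\
--- a/config.py
+++ b/config.py
@@ -1,2 +1,3 @@
-API_TOKEN = "old-value-removed-0000"
+API_TOKEN = "your-key-here-please-change"
+SERVICE_SECRET = "q8Zr2LmX0vBn7KdT5sWyHc3J"
 DEBUG = False
'''
print(scan_diff(DIFF, "src/config.py"))
print(scan_diff(DIFF, "docs/example_config.py"))
```

The ordering carries the precision-over-recall intent: cheap rejections run first, a structural pass short-circuits the path and entropy gates, and nothing in the pipeline ever contacts a provider to test a value.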

Radar 6 — Subdomain Takeover (dangling-CNAME fingerprinting)

So what: when a subdomain's CNAME still points at a third-party service whose underlying resource has been deprovisioned — a cancelled SaaS, a deleted bucket — an attacker can register that resource and serve their own content from your subdomain, enabling phishing, cookie theft, and OAuth abuse. We detect these the way the established tooling (subjack, nuclei) does: a subdomain whose CNAME delegates to a known service and whose single anonymous read-only GET returns that service's "unclaimed" fingerprint is takeover-able. We never register or claim the dangling resource — that would be exploitation; we store only the matched fingerprint as evidence.

The gap between observation and confirmation here is enormous and instructive: we observed 7,952 subdomains delegating to takeover-prone services, but confirmed only 134 as actually vulnerable. The other 7,818 mostly resolve to live, properly-claimed services — a CNAME to a known provider is not a finding unless the resource behind it is genuinely unclaimed. We report the 134, not the 7,952, as the exposure. Two guardrails suppress the most common false positives: a domain the owner has provably verified with the provider cannot be claimed by anyone else (downgraded), and a wildcard delegation where unrelated subdomains share the same target is an intentional configuration, not a forgotten record (downgraded). See Subdomain Takeover.
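
The decision logic described above (a known-service CNAME, an "unclaimed" fingerprint, then the two guardrails) can be sketched as a pure classification function. This is a minimal illustration under stated assumptions: the provider suffixes and marker strings in `UNCLAIMED_FINGERPRINTS` are hypothetical placeholders, the field names are invented for the example, and no network request is made here.

```python
from dataclasses import dataclass

# Hypothetical fingerprint table: CNAME suffix -> text the provider serves
# for an unclaimed resource. Maintained tools ship much larger lists.
UNCLAIMED_FINGERPRINTS = {
    ".example-pages.test": "There is no site configured at this address",
    ".example-bucket.test": "The specified bucket does not exist",
}


@dataclass
class Observation:
    subdomain: str
    cname_target: str
    response_body: str             # body of the single anonymous read-only GET
    owner_verified: bool = False   # guardrail 1: owner has verified the domain with the provider
    wildcard_shared: bool = False  # guardrail 2: unrelated subdomains share the same target


def classify(obs: Observation) -> str:
    target = obs.cname_target.lower().rstrip(".")
    marker = next(
        (m for suffix, m in UNCLAIMED_FINGERPRINTS.items() if target.endswith(suffix)),
        None,
    )
    if marker is None:
        return "not_applicable"  # CNAME does not delegate to a known service
    if marker not in obs.response_body:
        return "claimed"         # live, properly claimed: observed, but not a finding
    if obs.owner_verified or obs.wildcard_shared:
        return "downgraded"      # fingerprint matched, but a guardrail applies
    return "vulnerable"
```

In this sketch only the `vulnerable` outcome would count toward the 134; `claimed` corresponds to the bulk of the 7,952 observed delegations.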

Observed vs. confirmed across the radars: why we report the smaller number
Radar | Discovered / observed | Confirmed exposure | What the gap is
AI Attack Surface | 82,416 | 2,050 active | Authenticated, DNS-only, or dead hostnames
KEV Exposure | 36,416 CVE hosts | 21,299 KEV-exposed | Known-CVE vs. actively-exploited
Subdomain Takeover | 7,952 observed | 134 vulnerable | Live/claimed delegations are not findings
Exposed Data Stores | 6,607 open | 3 PII / 1 PCI (conservative) | We detect "open," never read contents
Leaked Credentials | match candidates | 619 (post 6-gate) | Structurally verified, never confirmed live

The CVE corpus behind the correlation

So what: the KEV-exposure radar is only as good as the vulnerability intelligence it correlates against, so the corpus deserves its own accounting. It comprises 340,552 CVEs, of which 1,621 are CISA‑KEV-listed, 326 are ransomware-associated KEV entries, 33,409 are rated CRITICAL, and 5,190 are AI-related. Each CVE is enriched per-record with CVSS v3 and v4, FIRST EPSS exploit-probability, CISA‑KEV status with our own tiering, SSVC decision points, GitHub Security Advisory data, ransomware and AI-relatedness flags, and a synthesized EchelonGraph score. The full scoring methodology — including where we diverge from NVD and where we deliberately do not claim superiority — is documented separately; here it is enough to know that the KEV correlation rides on this enriched, continuously-refreshed corpus rather than a static feed. See CVE Pulse.

Attribution: forward-confirmed reverse DNS, and rare by design

So what: the most dangerous thing a report like this can do is attribute an exposure to the wrong company — both because it is wrong, and because telling Company A about Company B's open database leaks B's exposure to A. So we attribute conservatively, on provable ownership only, and we leave the unattributable in an aggregate bucket rather than guess. There are two cases:

Anything that fails this bar — an organization-name match alone, a PaaS-hosted application whose individual tenant we cannot identify, an IP that does not forward-confirm — is never attributed to a named company and never contacted. The cost of a false attribution is far higher than the cost of missing a true one. As a direct consequence, confident company-level attribution is rare: the figures in this report are aggregate and host-redacted, and the small subset where we can prove ownership is what feeds responsible disclosure, not publication.
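
The forward-confirmation test named in this section's heading can be sketched as follows: look up the PTR hostname for an IP, resolve that hostname forward, and accept it only if the original IP is among the answers. This is an illustrative sketch, not the platform's implementation. The function names are hypothetical, and the final step (treating a confirmed hostname under a domain as evidence of that domain owner's control) is an assumption made for the example.

```python
import socket
from typing import Callable, Iterable, Optional


def _system_reverse(ip: str) -> str:
    """PTR lookup via the system resolver."""
    return socket.gethostbyaddr(ip)[0]


def _system_forward(hostname: str) -> Iterable[str]:
    """A/AAAA lookup via the system resolver."""
    return {info[4][0] for info in socket.getaddrinfo(hostname, None)}


def forward_confirmed_hostname(
    ip: str,
    reverse: Callable[[str], str] = _system_reverse,
    forward: Callable[[str], Iterable[str]] = _system_forward,
) -> Optional[str]:
    """Return the PTR hostname only if it resolves forward to the same IP."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
        if ip in set(forward(hostname)):
            return hostname
    except OSError:  # covers socket.herror and socket.gaierror
        pass
    return None


def attributable(ip: str, owned_domain: str, **resolvers) -> bool:
    """Assumed rule: the confirmed hostname must sit at or under the owned domain."""
    hostname = forward_confirmed_hostname(ip, **resolvers)
    if hostname is None:
        return False
    owned = owned_domain.rstrip(".").lower()
    return hostname == owned or hostname.endswith("." + owned)
```

The resolvers are injectable so the logic can be exercised without live DNS. Anything that returns `False` here stays in the aggregate bucket.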

Identified scanning and responsible disclosure

So what: anonymous internet-wide scanning is indistinguishable from reconnaissance for an attack, which is why we made ours the opposite of anonymous. Every outbound probe carries a self-describing User-Agent naming EchelonGraph and linking to our responsible-disclosure page, plus an RFC 7231 From header with a monitored contact address. Direct host probes additionally carry a one-off signed receipt — an HMAC-SHA256 token over the IP, port, radar, and timestamp — that any administrator can paste into a public verifier to confirm a request genuinely came from us (and, by the same token, identify impostors who merely put our name in their User-Agent). Where we can prove ownership, the platform renders a disclosure draft — the verified exposures and their specific fixes — that a human sends from their own mailbox. We never auto-send, never include a raw secret in the body, and never lead with a sales pitch. The unattributable majority is reported only in aggregate, as it appears throughout this document.
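
A signed receipt of the kind described above can be sketched with Python's standard `hmac` module. The report specifies only that the token is an HMAC-SHA256 over the IP, port, radar, and timestamp; the field order, the `|` separator, the hex encoding, and the function names below are assumptions for illustration.

```python
import hashlib
import hmac


def sign_receipt(key: bytes, ip: str, port: int, radar: str, timestamp: int) -> str:
    """HMAC-SHA256 over the probe's identifying fields (assumed serialization)."""
    message = f"{ip}|{port}|{radar}|{timestamp}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_receipt(key: bytes, ip: str, port: int, radar: str, timestamp: int, token: str) -> bool:
    """Recompute the receipt and compare in constant time."""
    expected = sign_receipt(key, ip, port, radar, timestamp)
    return hmac.compare_digest(expected, token)
```

Because the key stays with the verifier, an impostor who copies the User-Agent cannot mint a token that verifies, and a receipt captured from one probe does not validate for a different IP, port, radar, or timestamp.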

What this report is not

We close the methodology with its boundaries, stated as plainly as we can, because a finding misunderstood is worse than a finding never made:

The historical incidents this report references — the Equifax breach, the Capital One exposure, the DeepSeek open database, the MOVEit and regreSSHion campaigns — share a single property: in each, the failing surface was reachable and observable before it was abused. Someone could have looked on the victim's behalf and did not. That is the gap this methodology exists to close: a passive, identified, responsibly-disclosed view of the open doors anyone on the internet can already see, delivered to the people who can close them. Test your own surface at our self-check.

Chapter 3

The AI Attack Surface

The most important number in this chapter is the one we did not report. Across the observation window we discovered 82,416 internet-facing AI services in 137 countries. We could have published that figure as "82,416 exposed AI systems" and it would have travelled. It would also have been wrong. The honest finding is smaller, harder-won, and far more useful to a security leader: of those 82,416, the overwhelming majority are either secured behind authentication or are nothing more than a DNS record. The genuinely dangerous set — services we confirmed to be live, reachable, and answering without credentials — numbers 2,050. That distinction is the entire point of this chapter, and we ask you to carry it into every conversation that follows.

82,416 AI services discovered. 2,050 confirmed active and open. The gap between those two numbers is where most "AI exposure" reporting goes wrong.

What we counted, and what we refused to count

"Shadow AI" has become a boardroom phrase without a boardroom-grade definition. We use it narrowly: AI infrastructure that an organization is running on the public internet, often outside the visibility of its own security team — model-serving proxies, vector databases, notebooks, model registries, agent frameworks, and the emerging class of Model Context Protocol (MCP) servers that broker tool access to large language models. We find these passively, from Certificate Transparency logs and public internet scan data. We never authenticate, never log in, and never send a prompt. We observe that a service exists, we classify what it is, and — where it is determinable without interaction — we record whether it answered us without asking for a credential.

That last step is where discipline matters. A hostname appearing in a Certificate Transparency log proves only that someone requested a TLS certificate for it. It does not prove the service is running, reachable, or unprotected. Treating certificate evidence as exposure evidence is the single most common error in this category, and it inflates headline numbers by an order of magnitude. We therefore resolve every discovered footprint into one of four liveness states, and we report all four.

[Chart: 82,416 discovered. Resolved (DNS only): 42,593 (52%); Authenticated (secured): 36,991 (45%); Active / open: 2,050 (2%); Unreachable: 782 (1%)]
The AI footprint by liveness state. Only the 2,050 "active / open" services answered us without authentication; 42,593 are DNS records with no reachable service, and 36,991 are protected by authentication. n = 82,416.
Liveness state | Services | What it means for risk
Resolved (DNS only) | 42,593 | A hostname resolves, but no AI service answered. Almost always noise: parked names, retired endpoints, internal-only records. Not an exposure.
Authenticated (secured) | 36,991 | A live AI service that correctly demanded a credential. This is the system working as intended. We do not call these "exposed."
Active / open | 2,050 | A live AI service that responded without authentication. This is the real attack surface.
Unreachable | 782 | Discovered, but no successful connection during the window (firewalled, rate-limited, or transiently down).

We dwell on the 36,991 authenticated services deliberately, because they are the part of the story that resists a clean headline. These are organizations running AI infrastructure on the public internet and gating it correctly. That an LLM proxy is internet-reachable is not, by itself, a finding; reachable-and-authenticated is the normal, defensible posture for a great deal of production AI. Conflating the 36,991 with the 2,050 would be the same mistake in the opposite direction — manufacturing alarm out of systems that are behaving exactly as they should. The number that should occupy a CISO's attention is 2,050.
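
The shares behind the four liveness states can be recomputed directly from the counts reported in this chapter. The short sketch below does only that arithmetic; the variable names are illustrative.

```python
# Counts copied from the liveness breakdown in this chapter.
LIVENESS_COUNTS = {
    "Resolved (DNS only)": 42_593,
    "Authenticated (secured)": 36_991,
    "Active / open": 2_050,
    "Unreachable": 782,
}

discovered = sum(LIVENESS_COUNTS.values())
shares = {state: count / discovered for state, count in LIVENESS_COUNTS.items()}

for state, share in shares.items():
    print(f"{state:<24} {LIVENESS_COUNTS[state]:>7,}  {share:6.1%}")

# Discovered footprints per confirmed-open service: the factor by which
# a "discovered" headline would overstate the confirmed attack surface.
inflation = discovered / LIVENESS_COUNTS["Active / open"]
print(f"discovered per confirmed-open service: {inflation:.1f}")
```

The four states sum to the 82,416 discovered, and the confirmed-open set is roughly 2.5% of it, so publishing the discovered figure as "exposed" would overstate the confirmed surface by a factor of about 40.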

The shape of the footprint: where AI is sprawling

Before narrowing to the active set, it is worth understanding the composition of everything we discovered, because it maps the directions in which AI infrastructure is proliferating fastest. The 82,416 services break down by category as follows.

[Chart: discovered AI infrastructure by type. AI Workflow 35,675; LLM Proxy 19,572; Model Store 18,317; Notebook 4,859; MCP Server 1,563; Vector DB 589]
Discovered AI infrastructure by type (all liveness states). AI-workflow and orchestration platforms dominate the footprint, followed by LLM proxies and model stores.
Category | Discovered | What it is
AI Workflow | 35,675 | Agent frameworks and orchestration platforms (Langflow, AnythingLLM, ComfyUI and similar) that chain models, tools, and data.
LLM Proxy | 19,572 | Gateways that front one or more language models: the inference edge of an AI application.
Model Store | 18,317 | Registries and repositories that host model artifacts and weights.
Notebook | 4,859 | Interactive data-science environments (Jupyter and kin), frequently rich in code, credentials, and data.
MCP Server | 1,563 | Model Context Protocol endpoints that grant models access to tools, files, and external systems.
Vector DB | 589 | Embedding stores: the memory layer of retrieval-augmented applications, holding vectorized copies of private corpora.

The dominance of AI-workflow and orchestration platforms is the structural signal here. The fastest-growing slice of the AI attack surface is not the model itself but the connective tissue around it — the agent frameworks, proxies, and protocol servers that wire models to tools and data. These are typically deployed by application teams rather than platform or security teams, they ship with permissive defaults, and they accumulate access. That is precisely the profile of infrastructure that drifts out of a security organization's line of sight.

The real risk: 1,743 open LLM proxies and 171 live vector databases

Within the 2,050 active and open services, two categories carry disproportionate consequence. We confirmed 1,743 open LLM proxies and 171 live vector databases answering on the public internet with no authentication, alongside a small number of open MCP and AI-workflow endpoints. These are not parked names or polite 401 responses. They are running services that returned data to an unauthenticated request.

1,743 open LLM proxies and 171 live, unauthenticated vector databases — answering anyone on the internet, no credential required.

An open LLM proxy is, in practical terms, someone else's inference budget and someone else's model — handed to the entire internet. The immediate harm is the obvious one: unmetered consumption of a paid model, billed to the operator. The more serious harms are second-order. A proxy is a position of trust between an application and a model; depending on configuration it may expose system prompts, leak the contents of prior conversations held in context, accept prompt-injection that redirects the model's behaviour, or — where the proxy fronts tool-calling — become a pivot into whatever the model is permitted to do. An unauthenticated proxy is not merely a billing leak; it is an open door into an application's reasoning layer.

A live, unauthenticated vector database is the more acute exposure of the two, and it is the one we want security leaders to internalize. A vector store is the memory of a retrieval-augmented application. Organizations populate it by embedding their private corpora — support tickets, internal wikis, contracts, source code, customer records — into vectors so a model can retrieve them. Crucially, those stores frequently retain the original text alongside the vector, as metadata, so the application can show a human the source passage. An open vector database, therefore, is not an abstract index of numbers. It is, very often, a queryable copy of the exact private documents an organization fed into its AI — sitting on the internet without a password.

Why AI infrastructure is uniquely dangerous

A traditional exposed database is a serious problem; an exposed AI service can be a worse one, for a reason that is specific to how these systems are used. AI infrastructure concentrates raw, unstructured, high-sensitivity data at the exact moment it is least protected.

The combination is what makes the category distinctive. An exposed AI service tends to leak the most sensitive data in the most usable form, from infrastructure the security team did not know was there.

The parallel: DeepSeek's open database, 2025

This is not a hypothetical. In early 2025, security researchers reported that an internet-facing database belonging to the AI company DeepSeek was publicly accessible without authentication, and that it exposed operational data associated with the service. The episode became a reference point for AI-era exposure for a simple reason: a frontier-scale AI provider, under intense growth pressure, left a backing store open to the internet — the same failure mode that has produced data exposures for two decades, now sitting directly behind an AI product.

We invoke the DeepSeek incident as a pattern, not to restate its particulars, because the pattern is exactly what our data describes at scale. The 171 open vector databases and 1,743 open LLM proxies we confirmed are 1,914 instances of the same class of mistake — an AI backing service reachable without a credential — distributed across the internet, most of them at organizations that have no idea the door is open. DeepSeek made headlines because of the name attached to it. The thousand-plus equivalents in this dataset are anonymous only because no researcher has knocked yet.

The deeper continuity is with the pre-AI canon of exposure incidents. The mechanics that produced the 2017 Equifax breach and the 2019 Capital One breach — an unpatched edge, an over-permissioned path to data, a store reachable from where it should not have been — have not changed. What has changed is the content behind the open door. Where an exposed store once leaked structured database rows, an exposed vector store leaks the raw documents an organization trusted to its AI, and an open proxy leaks the live reasoning of the application itself. AI did not invent the failure; it raised the value of the loot.

Reading the trajectory honestly

We discovered 10,679 of these services in May and 71,737 in June. That is not a measurement of the internet's AI footprint growing sixfold in a month. The jump is overwhelmingly the result of our own discovery capacity ramping during this inaugural window — more Certificate Transparency patterns, more scan coverage, more categories under watch. We surface the monthly split for transparency, and we caution explicitly against reading it as a growth rate. This is a baseline, established over roughly five weeks. Its value is as a fixed point to measure against in future editions, not as a trend in itself.

What the baseline does establish, firmly, is the shape of the problem. AI infrastructure is now a first-class category of internet exposure, distributed across 137 countries, dominated by orchestration and proxy layers that live outside traditional security visibility. Most of it is either inert or correctly secured. A meaningful, concrete minority — 2,050 services, with 1,743 open proxies and 171 open vector databases at its core — is live, unauthenticated, and holding exactly the kind of data and access that makes AI systems worth attacking.

For the security leader. Your exposure here is almost never a system your security team chose to expose; it is one an application team stood up and forgot to gate. Inventory every AI service your organization runs on the public internet — proxies, vector stores, notebooks, MCP and agent endpoints — and verify that each one demands a credential. Treat an unauthenticated vector database as a data breach in waiting, not an infrastructure tidy-up. The free Surface Scanner will show you what the internet can already see; the Shadow AI Radar tracks the wider footprint this chapter is drawn from.

Chapter 4

Known-Exploited Vulnerability Exposure

The single most actionable fact in this report is that adversaries do not need to discover anything. The vulnerabilities that compromise organizations are, overwhelmingly, already catalogued, already weaponized, and already being used in the wild. Across the observation window of 29 May to 28 June 2026, EchelonGraph passively fingerprinted 36,416 internet-facing hosts running software with at least one known CVE, spanning 165,717 distinct host-to-CVE pairs, 6,519 organizations and 161 countries. These are not theoretical weaknesses. Of those hosts, 21,299 are exposed to at least one vulnerability on the CISA Known Exploited Vulnerabilities (KEV) catalog — vulnerabilities for which CISA has confirmed active exploitation against real victims. The gap between "a patch exists" and "the patch is applied" is the gap through which most breaches walk.

21,299 hosts are exposed to vulnerabilities that attackers are confirmed to be exploiting right now — across 50 distinct CISA-KEV CVEs.

This finding is the through-line of the modern breach record. Equifax in 2017 was not undone by a novel exploit; it was a known Apache Struts flaw (CVE-2017-5638) with a patch available for months before the intrusion. Capital One in 2019 turned on a misconfiguration layered over a reachable service. The 2023 MOVEit campaign weaponized a single SQL-injection flaw across thousands of organizations simultaneously, because the same vulnerable file-transfer appliance sat on the internet edge of so many enterprises at once. In every case, the precondition was the same one we measure here: a known weakness, exposed, unpatched, at scale. Our window is an inaugural baseline of roughly four to five weeks, not a multi-year trend — but the structure it reveals is consistent with a decade of incident data.

What "known-exploited" actually means — and why it changes the math

Most vulnerability programs drown in volume. Our broader corpus tracks 340,552 CVEs, of which 103,468 are rated HIGH and 33,409 CRITICAL by CVSS. No security team can remediate at that scale, and treating every CRITICAL as equally urgent is how real emergencies get buried under paperwork. The KEV catalog exists precisely to cut through this: it is the subset where exploitation is not predicted but observed. A vulnerability on the KEV list has crossed the line from "an attacker could" to "an attacker is."

That distinction reorders priorities. CVSS asks how bad a flaw would be if exploited. EPSS (the Exploit Prediction Scoring System) estimates the probability it will be exploited in the near term. KEV reports that it already has been. We enrich every host-to-CVE pairing with all three signals, plus ransomware-campaign association and our own EchelonGraph tiering, so that an operator can sort 165,717 exposures down to the handful that warrant a 2 a.m. response. Across the exposed population we identified 36,365 hosts carrying a high-EPSS vulnerability and — most consequentially — 3,475 hosts exposed to a vulnerability tied to a known ransomware campaign. Ransomware association is the signal that most reliably precedes a bad week. These are the exposures where the path from internet scan to encrypted estate has already been walked by someone else.

KEV-exposure population at a glance (29 May – 28 Jun 2026)
Measure | Count | What it tells you
Hosts running known-CVE software | 36,416 | The addressable exposure surface
Distinct host:CVE pairs | 165,717 | Average host carries multiple known flaws
Hosts on CISA-KEV (actively exploited) | 21,299 | Confirmed-exploited, not theoretical
Distinct KEV CVEs observed | 50 | A small, knowable, patchable set
Hosts with high-EPSS vulnerabilities | 36,365 | High near-term exploitation probability
Hosts on ransomware-linked CVEs | 3,475 | Pre-positioned for extortion campaigns
Organizations affected | 6,519 | Breadth across distinct owners
Countries | 161 | A global, not regional, condition

The arithmetic is sobering. With 165,717 host-to-CVE pairs spread across 36,416 hosts, the average exposed host is not carrying one overdue patch — it is carrying roughly four to five. Exposure compounds: a host left unpatched against one known flaw is, in practice, a host left unpatched against several, because the same operational failure (no inventory, no patch cadence, no ownership) produces all of them at once.
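
For readers who want to check the arithmetic, the ratios follow directly from the counts reported in this chapter. The variable names below are illustrative.

```python
# Counts copied from the KEV-exposure figures in this chapter.
hosts_with_known_cve = 36_416
host_cve_pairs = 165_717
kev_exposed_hosts = 21_299
ransomware_linked_hosts = 3_475

pairs_per_host = host_cve_pairs / hosts_with_known_cve
kev_share = kev_exposed_hosts / hosts_with_known_cve
ransomware_share = ransomware_linked_hosts / hosts_with_known_cve

print(f"known flaws per exposed host: {pairs_per_host:.2f}")
print(f"share of exposed hosts on CISA-KEV: {kev_share:.1%}")
print(f"share of exposed hosts on ransomware-linked CVEs: {ransomware_share:.1%}")
```

That works out to about 4.55 known flaws per exposed host, with roughly 58.5% of exposed hosts carrying at least one KEV-listed vulnerability and about 9.5% carrying a ransomware-linked one.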

The concentration is the opportunity: ten software families carry the load

The most useful thing about this data is how concentrated it is. The 36,416 exposed hosts are not spread thinly across thousands of obscure products. A short list of widely deployed software families accounts for the overwhelming majority of exposure, and that concentration is a defender's advantage: a remediation campaign aimed at fewer than a dozen products would retire most of the risk in this dataset.

[Chart: exposed hosts by software. OpenSSH 12,889; Apache Tomcat 5,219; Apache httpd 3,491; MongoDB 3,230; VMware ESXi 2,816; Citrix NetScaler 1,709; Jenkins 1,568; WordPress 1,488; Microsoft IIS 1,092; nginx 994]
Most-exposed software running KEV-class vulnerabilities, by host count (top 10 of the 36,416-host population)

OpenSSH dominates, appearing on 12,889 hosts — more than the next three products combined. That is not a comment on OpenSSH's engineering, which is among the most scrutinized in open source; it is a reflection of ubiquity. SSH is the front door to nearly every Linux server on earth, and a backlog in patching it shows up everywhere at once. Behind it sit the workhorses of the public internet: Apache Tomcat (5,219 hosts), Apache httpd (3,491), MongoDB (3,230), VMware ESXi (2,816), and Citrix NetScaler (1,709), followed by Jenkins (1,568), WordPress (1,488), Microsoft IIS (1,092), and nginx (994).

Top exposed software families by host count
# | Software | Hosts | Role / why it matters at the edge
1 | OpenSSH | 12,889 | Remote administration; the front door to Linux fleets
2 | Apache Tomcat | 5,219 | Java application server; frequent RCE target
3 | Apache httpd | 3,491 | General-purpose web server
4 | MongoDB | 3,230 | Data store; doubles as a data-exposure risk
5 | VMware ESXi | 2,816 | Hypervisor; a single host = many workloads, prime ransomware target
6 | Citrix NetScaler | 1,709 | Remote-access gateway; pre-auth flaws yield network entry
7 | Jenkins | 1,568 | CI/CD orchestrator; pipeline compromise = supply-chain reach
8 | WordPress | 1,488 | CMS; vast plugin attack surface
9 | Microsoft IIS | 1,092 | Windows web server
10 | nginx | 994 | Web server / reverse proxy

Two clusters in this list deserve a CISO's particular attention, because they sit at the points of maximum blast radius. VMware ESXi is the substrate beneath the data center: compromise one hypervisor and you do not own one server, you own every virtual machine it hosts. ESXi has accordingly become the favored terminal target of ransomware crews, who encrypt at the hypervisor layer to take down dozens of workloads in a single action. Its presence on 2,816 exposed hosts is, line for line, the highest-leverage finding in the table. Citrix NetScaler and other remote-access gateways occupy the opposite end of the kill chain — the entry point. These appliances exist to terminate untrusted traffic from the internet, which means a pre-authentication flaw in one is a direct, credential-free route into the internal network. The industry has watched this category produce mass-exploitation events repeatedly; 1,709 exposed NetScaler hosts is 1,709 standing invitations.

Six CVEs, and where the real story diverges from the headline

Drilling from products to specific vulnerabilities sharpens the picture further. The most prevalent individual CVEs in our data concentrate heavily in two technologies — SSH and Tomcat — and the ranking carries a nuance worth stating plainly.

[Chart: hosts per CVE. CVE-2023-48795 (Terrapin) 9,942; CVE-2023-38408 (OpenSSH) 9,923; CVE-2025-26465 (OpenSSH) 8,525; CVE-2023-44487 (Rapid Reset) 5,859; CVE-2024-6387 (regreSSHion) 5,713; CVE-2025-24813 (Tomcat) 5,218; CVE-2025-14847 (MongoDB) 3,228]
Most-prevalent CVEs across the exposed population, by host count
Top individual CVEs by host count
CVE | Common name | Affected software | Hosts
CVE-2023-48795 | Terrapin | OpenSSH | 9,942
CVE-2023-38408 | OpenSSH ssh-agent PKCS#11 RCE | OpenSSH | 9,923
CVE-2025-26465 | OpenSSH | OpenSSH | 8,525
CVE-2023-44487 | HTTP/2 Rapid Reset | Apache Tomcat | 5,859
CVE-2024-6387 | regreSSHion | OpenSSH | 5,713
CVE-2025-24813 | Tomcat RCE | Apache Tomcat | 5,218

The honest reading of this table requires distinguishing severity from prevalence, a distinction practitioners routinely collapse, to their cost. Two of the most widespread CVEs, Terrapin (CVE-2023-48795, 9,942 hosts, the most prevalent entry) and HTTP/2 Rapid Reset (CVE-2023-44487, 5,859 hosts, the fourth), are protocol-level weaknesses rather than direct remote-code-execution holes. Terrapin is a prefix-truncation weakness in the SSH transport that can downgrade connection integrity; Rapid Reset is a denial-of-service amplification in the HTTP/2 stream-multiplexing design that was used to drive record-breaking volumetric attacks against major providers. They are genuine and worth fixing, but they are not "one packet to shell." Treating their large host counts as 9,942 imminent takeovers would overstate the case, and this report will not do that.

The teeth in the table belong to the remote-code-execution entries, even where their host counts are lower. regreSSHion (CVE-2024-6387), present on 5,713 hosts, is the one to lose sleep over: an unauthenticated remote-code-execution flaw in OpenSSH's server, reachable before any login, on the single most exposed service in the entire dataset. An RCE in the front door of the internet is the archetype of a vulnerability that converts a passive scan into an active intrusion. Alongside it, CVE-2023-38408 (an OpenSSH ssh-agent PKCS#11 RCE, 9,923 hosts) and CVE-2025-24813 (a Tomcat RCE, 5,218 hosts) round out the code-execution cluster. The practical instruction is to read this table in two passes: first by prevalence, to understand where your fleet most likely overlaps the herd, then by exploit class, to decide what gets patched tonight versus this sprint. The RCEs get tonight.
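
The two-pass reading can be made concrete with the six CVEs tabulated in this chapter. The host counts and the RCE versus protocol-level labels come from the text; CVE-2025-26465 is marked `unstated` because this chapter does not assign it an exploit class. The `CLASS_RANK` ordering and all names are illustrative assumptions.

```python
from typing import NamedTuple


class Finding(NamedTuple):
    cve: str
    name: str
    hosts: int
    exploit_class: str  # "rce", "protocol", or "unstated"


FINDINGS = [
    Finding("CVE-2023-48795", "Terrapin", 9_942, "protocol"),
    Finding("CVE-2023-38408", "OpenSSH ssh-agent PKCS#11 RCE", 9_923, "rce"),
    Finding("CVE-2025-26465", "OpenSSH", 8_525, "unstated"),
    Finding("CVE-2023-44487", "HTTP/2 Rapid Reset", 5_859, "protocol"),
    Finding("CVE-2024-6387", "regreSSHion", 5_713, "rce"),
    Finding("CVE-2025-24813", "Tomcat RCE", 5_218, "rce"),
]

# Assumed ordering: code execution first, protocol-level weaknesses last.
CLASS_RANK = {"rce": 0, "unstated": 1, "protocol": 2}

# Pass 1: prevalence, to see where a fleet most likely overlaps the herd.
by_prevalence = sorted(FINDINGS, key=lambda f: -f.hosts)

# Pass 2: exploit class first, prevalence as the tie-breaker.
by_exploit_class = sorted(FINDINGS, key=lambda f: (CLASS_RANK[f.exploit_class], -f.hosts))

patch_tonight = [f.cve for f in by_exploit_class if f.exploit_class == "rce"]
print(patch_tonight)
```

The first pass puts Terrapin on top; the second moves the three RCE entries ahead of it, which is the reordering the paragraph above argues for.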

Severity at the host level: critical exposure is not the exception

Aggregating across every known flaw on every exposed host produces a severity distribution that should reset any assumption that critical exposure is rare. Of the host-level findings, 24,292 land at CRITICAL severity, 26,482 at HIGH, and 19,329 at MEDIUM. (Hosts carry multiple findings, so these severity rows count exposures, not unique machines — many hosts contribute to more than one row, consistent with the four-to-five-flaws-per-host average noted above.)

[Chart: host-level findings by severity. Critical 24,292; High 26,482; Medium 19,329; None scored 11,136]
Host-level KEV exposure by severity rating
Severity distribution of host-level findings
Severity | Findings | Operational implication
Critical | 24,292 | Immediate remediation; assume active interest
High | 26,482 | Prioritized patch cycle
Medium | 19,329 | Scheduled remediation; monitor for escalation

The signal here is that CRITICAL and HIGH together vastly outweigh MEDIUM. The exposed internet is not lightly bruised; it carries serious, exploitable weaknesses at the top of the severity scale as its normal state. For a CISO, the strategic takeaway is that the problem is not detection — these flaws are catalogued, fingerprintable, and visible to anyone with a scanner, attacker or defender alike — but cadence. The vulnerabilities are known. The exploits are public. The patches exist. What is missing, at 21,299 hosts, is the operational discipline to close the loop before someone else closes it for you.

The patch gap: why "a fix is available" is the most dangerous phrase in security

Every number in this chapter is a measurement of one thing under different lights: the patch gap, the interval between a fix becoming available and that fix being applied to a production system. It is the most studied and least solved problem in operational security, and our data quantifies its current width across the public internet.

The gap persists for reasons that are organizational before they are technical:

  • You cannot patch what you do not know you run. Asset inventory is the unglamorous foundation of the whole discipline, and it is routinely incomplete. Shadow IT, forgotten cloud instances, contractor-stood-up appliances, and acquired infrastructure all run software that no one is tracking — and therefore no one is patching. The 6,519 organizations in this dataset did not choose to expose 165,717 known flaws; in most cases they simply did not have a current picture of what they were running.
  • Edge appliances are the worst-patched class of all. The Citrix NetScaler and VMware ESXi findings are not accidents. Gateways, hypervisors, and firewalls are perceived as "infrastructure that just works," are change-controlled to the point of paralysis, and frequently sit outside the patch automation that covers ordinary servers. They are simultaneously the highest-value targets and the slowest to be remediated — the exact inversion of what risk would dictate.
  • Patching is a change, and changes carry risk. Every patch is a small bet that the fix will not break production. Operators who have been burned by a bad update rationally hesitate, and that hesitation, multiplied across a fleet, becomes a standing exposure window. The answer is not to patch recklessly but to make patching routine, tested, and reversible — to lower the cost of the change so the bet becomes easy.
  • Internet-facing assets are the least forgiving place to carry this gap. An unpatched internal system is a latent risk; an unpatched internet-facing one is a live one, scanned continuously by adversaries running the same fingerprinting that produced this report. The 21,299 KEV-exposed hosts are not waiting to be discovered. They have been discovered, repeatedly, by everyone looking.

The historical record removes any doubt about where this leads. regreSSHion, the unauthenticated OpenSSH RCE on 5,713 hosts in our data, is exactly the shape of flaw that produces a mass-exploitation event. The MOVEit campaign demonstrated the model end to end: a single known vulnerability in a single widely deployed appliance, exploited in parallel across every exposed instance an adversary could find, with victim organizations learning they were affected only after their data had been taken. The precondition every time is a known, exposed, unpatched weakness — the precise condition we count at 21,299 hosts. The vulnerabilities in this chapter are not predictions of future breaches. They are the standing inventory from which the next campaign will draw.

What to do with this

The concentration that makes this exposure dangerous also makes it tractable. Fifty KEV CVEs, ten software families, and six dominant vulnerabilities account for the bulk of the risk in 165,717 exposures — a target set small enough to act against deliberately rather than drown in. We recommend a sequenced response:

  1. Treat the KEV catalog as your priority queue, not your CVSS spreadsheet. The 21,299 KEV-exposed hosts are where exploitation is confirmed. Remediate these before anything that is merely rated CRITICAL but not known-exploited. Active exploitation outranks theoretical severity every time.
  2. Escalate the 3,475 ransomware-linked exposures to incident tempo. A ransomware-associated, internet-facing, unpatched host is the most reliable single predictor of a major incident in this dataset. These do not wait for the next maintenance window.
  3. Lead with the high-blast-radius edge: ESXi first, remote-access gateways second. The hypervisor is where ransomware terminates; the gateway is where intrusions begin. Patching both retires disproportionate risk per host touched.
  4. Read the top-CVE list by exploit class, not headline. Patch the RCEs — regreSSHion, the OpenSSH agent flaw, the Tomcat RCE — on the tightest possible clock. Schedule the protocol-level weaknesses (Terrapin, Rapid Reset) into the normal cycle. Do not let large host counts on lower-severity flaws crowd out the genuinely critical few.
  5. Close the inventory gap so the patch gap cannot reopen. Every exposure here traces to something that was running but not being watched. Continuous external discovery — knowing what you actually expose, from the attacker's vantage point — is the precondition for every other step. You can map your own exposure against this dataset with EchelonGraph's surface scanner, trace blast radius through your estate in the attack graph, and track the live KEV-exposed internet on the KEV Exposure radar.
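
The sequencing above can be expressed as a sort order. The following is a minimal sketch, not EchelonGraph's scoring: the `Finding` fields, host names, and CVE identifiers are hypothetical placeholders, and the ordering simply encodes "ransomware-linked first, then KEV, then RCE, with CVSS only as a tie-breaker".

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    host: str                # hypothetical host label
    cve: str                 # placeholder identifier, not a real CVE
    in_kev: bool             # listed in the CISA KEV catalog
    ransomware_linked: bool  # associated with known ransomware use
    is_rce: bool             # exploit class: remote code execution
    cvss: float              # base score, used only to break ties

def priority_key(f: Finding) -> tuple:
    # Tuples sort left to right and False sorts before True,
    # so each flag is negated to put flagged findings first.
    return (not f.ransomware_linked, not f.in_kev, not f.is_rce, -f.cvss)

def triage(findings):
    """Return findings in remediation order (most urgent first)."""
    return sorted(findings, key=priority_key)

findings = [
    Finding("host-a", "CVE-0000-0001", False, False, True, 9.8),
    Finding("host-b", "CVE-0000-0002", True, False, False, 7.5),
    Finding("host-c", "CVE-0000-0003", True, True, False, 7.0),
]
for f in triage(findings):
    print(f.host, f.cve)
```

Note that the CRITICAL-rated but not known-exploited finding lands last, which is the point of item 1: confirmed exploitation outranks the severity score.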

The recurring lesson of the breach record — Equifax, Capital One, MOVEit, and the campaigns that will follow them — is not that attackers are uniquely sophisticated. It is that defenders leave known doors unlocked at scale. This chapter is a census of those doors as they stood across roughly five weeks of mid-2026: 36,416 hosts, 21,299 of them open to vulnerabilities attackers are using today. The fix for nearly every one of them already exists. The only question this data leaves open is who applies it first.

Chapter 5

Exposed Data Stores

A database that answers a query from anyone on the internet, without first asking who is asking, is not a misconfiguration in the abstract. It is an open door to whatever sits behind it — session tokens, customer records, telemetry, message queues, cached credentials. The category is unforgiving because the failure mode is silent: the store works exactly as designed, serves traffic, and shows no error. Nothing in the application breaks. The only signal that anything is wrong is that the wrong people can read it, and by the time that signal arrives it is usually a ransom note or a press inquiry.

Across the observation window of 29 May to 28 June 2026, EchelonGraph's passive discovery identified 6,607 open, unauthenticated databases reachable from the public internet, spread across 1,723 organizations and 120 countries. These are not hosts with weak passwords or guessable credentials. These are stores that accept connections and return data with no authentication step at all — the equivalent of a filing cabinet left on the sidewalk with the drawers open.

6,607 open unauthenticated databases, across 1,723 organizations in 120 countries — each one reachable with no login.

What we measured, and what we did not

This figure deserves precision, because the temptation in this category is to inflate. We did not. Our methodology is passive, public-data, and strictly detect-only: we observe that a store is listening and that it answers an unauthenticated handshake. We see the open door. We do not walk through it. We never read contents, never enumerate keyspaces, never count rows, and never exfiltrate a single record. Consequently this chapter contains no claim about "millions of exposed records," because we did not look inside to count, and any such number would be a fabrication.

The classification is deliberately conservative. Of the open stores observed, only a small number could be confirmed — from public banner metadata alone, without inspecting data — as carrying regulated content: three were classified as holding PII and one as in PCI scope. That is a floor, not a ceiling. It reflects what is provable from the outside without reading the store, and it almost certainly understates the true regulatory exposure, because most open databases reveal nothing about their contents from the doorway. The honest reading is this: 6,607 doors stand open; the contents behind the overwhelming majority are unknown to us, by design, and should be assumed sensitive by their owners until proven otherwise. You can check your own exposure with the surface scanner, and the live radar behind these figures is published at exposed-databases.

One framing caveat carries over from the broader report and applies here. This is an inaugural baseline gathered over roughly four to five weeks, not a multi-year trend line. The numbers describe a snapshot of the internet as it was during a single month. They are large enough to be alarming on their own terms; they are not yet a trajectory.

The shape of the exposure: caches and search nodes, not "the database"

The most important finding in this chapter is a structural one, and it changes how the risk should be triaged. When people imagine an exposed database, they picture the system of record — the customer table, the orders ledger, the primary store. That is not what the internet is leaking. The exposure is dominated by two categories of infrastructure that operators rarely think of as "databases" at all: in-memory caches and search and observability nodes.

Redis 4,243 · Memcached 1,756 · Kibana 464 · Cassandra 75 · MongoDB 37 · InfluxDB 16 · CouchDB 16
Open unauthenticated data stores by engine, observed 29 May – 28 June 2026 (host counts). Redis and Memcached together account for the overwhelming majority of exposures.
Exposed unauthenticated data stores by engine (host counts)
Engine | Type | Open hosts
Redis | In-memory cache / key-value store | 4,243
Memcached | In-memory cache | 1,756
Kibana | Search / observability front-end | 464
Cassandra | Wide-column store | 75
MongoDB | Document store | 37
InfluxDB | Time-series store | 16
CouchDB | Document store | 16

Redis alone accounts for 4,243 of the 6,607 open stores — nearly two in three. Add Memcached's 1,756 and the two in-memory caches together make up the great majority of the entire exposed population. This is the central lesson of the data, and it is a lesson about defaults and mental models rather than about exotic attacks.
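
The shares quoted above follow directly from the engine table. As a quick arithmetic check (a sketch that only re-derives percentages from the published host counts; the variable names are ours):

```python
# Host counts copied from the engine table in this chapter.
open_hosts = {
    "Redis": 4243,
    "Memcached": 1756,
    "Kibana": 464,
    "Cassandra": 75,
    "MongoDB": 37,
    "InfluxDB": 16,
    "CouchDB": 16,
}

total = sum(open_hosts.values())

# Percentage share per engine, rounded to one decimal place.
shares = {engine: round(100 * n / total, 1) for engine, n in open_hosts.items()}

# Combined share of the two in-memory caches.
cache_share = round(100 * (open_hosts["Redis"] + open_hosts["Memcached"]) / total, 1)

print(total, shares["Redis"], cache_share)
```

The per-engine counts sum to the 6,607 headline figure, Redis comes to roughly 64% of the total, and the two caches together to roughly 91%.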

Redis and Memcached were both designed for a world inside a trusted network boundary. For much of their history neither shipped with authentication enabled by default; both were built to be fast, simple, and adjacent to the application, on the assumption that a firewall or a private subnet stood between them and the internet. When that assumption fails — a security group left open, a container published to a public interface, a cloud instance with a public IP and a permissive ingress rule — the store is simply on the internet, answering to anyone, with no credential to stop them. The operator did not "turn off" authentication. They never turned it on, because the threat model they inherited said they did not need to.

The risk is frequently dismissed because of what these stores nominally hold. "It's only a cache" is the reflex. That reflex is wrong on two counts. First, caches routinely hold exactly the material an attacker wants: session identifiers, authentication tokens, rate-limit state, password-reset nonces, queued jobs, and copies of records pulled forward from the primary database for speed. A session token read from an open cache is a logged-in user. Second, an exposed Redis or Memcached instance is not only a confidentiality problem; it is often a foothold. Write access to an unauthenticated store lets an attacker poison cached values, corrupt application state, and in well-documented Redis cases abuse the store's own persistence and scripting features to achieve code execution on the host. The cache is not the periphery of the system. It is frequently the soft center.

The 464 exposed Kibana instances carry a related but distinct danger. Kibana is the window onto an Elasticsearch cluster: logs, metrics, traces, and whatever application data has been shipped into the index for analysis. An open Kibana front-end is a search interface over an organization's operational telemetry, and operational telemetry is where secrets go to hide: tokens printed in debug logs, full request bodies, stack traces with connection strings. The remaining engines (Cassandra, MongoDB, InfluxDB, CouchDB) appear in smaller numbers, but each is an engine that can serve as a system of record outright. MongoDB in particular has a long public history in this category, and its presence here, however modest in count, is a reminder that the document stores are not immune; they are simply less numerous in this sample.

Where the open doors are

Exposure of this kind is not concentrated in any one jurisdiction or any one cloud. It tracks the global distribution of internet-facing infrastructure, which means it tracks the global distribution of cloud regions and hosting density. The 6,607 stores span 120 countries; the table below shows the six with the highest observed counts.

United States 1,254 · China 1,092 · Germany 772 · France 307 · Singapore 248 · India 221 · United Kingdom 219 · Indonesia 215
Top eight countries by count of exposed unauthenticated data stores, 29 May – 28 June 2026.
Exposed unauthenticated data stores by country (top six, host counts)
Country | Open hosts
United States | 1,254
China | 1,092
Germany | 772
France | 307
Singapore | 248
India | 221

The United States (1,254) and China (1,092) lead, together accounting for more than a third of all observed exposures and reflecting the two largest concentrations of public cloud and hosting capacity in the world. Germany (772) and France (307) follow, with Singapore (248) and India (221) rounding out the top tier — each a major regional cloud hub. The pattern is unsurprising and, for a security leader, clarifying: this is not a problem confined to one regulatory regime or one provider's customers. Wherever an organization stands up infrastructure quickly and at scale, the same default-trust assumptions follow it, and a fraction of those deployments end up listening on the open internet. The remaining 114 countries in the dataset each contribute fewer hosts, but their collective tail confirms that the exposure is genuinely global rather than an artifact of a handful of careless networks.

For multinational organizations the geographic spread carries a compliance edge worth naming. An open store in a European region is a candidate GDPR matter the moment it holds personal data; one in PCI scope is a candidate for cardholder-data findings regardless of where it sits. Because we deliberately do not read contents, we cannot tell an operator which of their stores crosses those lines — but they can, and the geographic and engine breakdown above is a map of where to look first. Mapping that exposure against regulatory obligations is the work the compliance surface is built for.

The precedent is not hypothetical

The reason this category warrants a chapter of its own, rather than a footnote, is that the open-database failure mode has produced some of the defining data-loss events of the last decade. The pattern is consistent: a store that should have been private was reachable, no credential stood in the way, and the contents left the building before anyone with authority noticed.

The pattern reaches the newest organizations as readily as the oldest. The rush to ship — whether a startup, an AI lab, or a global enterprise standing up a new region — recreates the trusted-network assumption that the default-no-auth cache was built for, and the internet is indifferent to the intention. A store stood up fast, on the assumption that something else would keep it private, is exactly the door this chapter counts.

Landmark public breaches over the years reinforce the same structural point even when the specific technical mechanism differed. Catastrophic exposures of personal and financial data have repeatedly traced back to data that was reachable when it should not have been; supply-chain campaigns against widely used file-transfer and data-handling systems have shown how a single reachable system can become an exfiltration vector across many downstream organizations at once. None of these were caused by an open Redis instance, and we make no such claim. What they share with the 6,607 stores in this dataset is the underlying lesson: when sensitive data is reachable and the barrier protecting it fails or was never raised, scale is on the attacker's side, and the gap between exposure and exploitation is measured in hours, not months.

That is the weight behind the conservative count. We classified only three stores as PII-bearing and one as PCI-scoped, because that is all we could prove from the doorway without reading inside. But every one of the 6,607 open doors is, from the perspective of its owner, a store whose contents are now governed by whoever finds it first. The right posture is not to take comfort in the small confirmed-sensitive number; it is to assume that any unauthenticated store reachable from the internet is compromised until taken offline and re-secured.

What to do before the radar finds you

The remediation for this category is unglamorous and, mercifully, well understood. It does not require new tooling so much as the discipline to apply what already exists.

  1. Assume no store belongs on the public internet until proven otherwise. The default for any cache, document store, search node, or time-series database should be a private network with no public ingress. Public reachability should be a deliberate, reviewed, and rare decision — never an accident of a default security group or a published container port.
  2. Turn authentication on, everywhere, including the caches. The engines most represented here — Redis and Memcached — are precisely the ones operators are most likely to leave open because "it's just a cache." Enable authentication and, where available, transport encryption, on every store regardless of what it nominally holds. The cache holds tokens; treat it like it holds tokens.
  3. Close the gap between "internal" and "reachable." Most of these exposures stem from a single broken assumption: that a network boundary that once existed still exists. Continuously verify, from the outside, what your organization actually presents to the internet. An external, attacker's-eye view is the only reliable way to catch the store that someone published last week.
  4. Inventory before you remediate. You cannot secure a store you do not know is running. The geographic and engine breakdowns above are a triage order: in a large estate, the in-memory caches are both the most numerous exposure and the easiest to overlook, and the search front-ends are the highest-value single targets.
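
For an operator verifying their own estate, the difference between an open cache and a protected one is visible in the first reply. The sketch below is illustrative only and is meant to be run against hosts you own: it sends a single inline `PING` to a Redis port and classifies the response prefix. The function names and the label strings are our own; the reply prefixes (`+PONG`, `-NOAUTH`, `-DENIED`) are the ones a Redis server returns for an unauthenticated connection, a connection that requires a password, and protected mode respectively.

```python
import socket

def classify_redis_reply(reply: bytes) -> str:
    """Map the first bytes of a Redis reply to an exposure label."""
    if reply.startswith(b"+PONG"):
        return "open"            # answered without any credential
    if reply.startswith(b"-NOAUTH"):
        return "auth-required"   # authentication is enforced
    if reply.startswith(b"-DENIED"):
        return "protected-mode"  # refuses non-local clients
    return "unknown"

def check_own_redis(host: str, port: int = 6379, timeout: float = 3.0) -> str:
    """Send one PING to a host you own and classify the answer."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")
            return classify_redis_reply(conn.recv(128))
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return "unreachable"

print(classify_redis_reply(b"+PONG\r\n"))
print(classify_redis_reply(b"-NOAUTH Authentication required.\r\n"))
```

Running the check from outside the network boundary, rather than from a host inside it, is what makes the result meaningful: "unreachable" from the internet is the desired answer.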

The 6,607 figure is a measurement of how often the boundary fails in practice, across the whole internet, in a single month. It is large because the assumption that produces it — that the store sits safely behind a wall — is one of the most durable and most frequently violated assumptions in modern infrastructure. The stores in this dataset are not exotic. They are ordinary caches and search nodes, run by ordinary organizations, that ended up answering to the wrong audience. The work is to find yours before someone else does, and to do it knowing that the door, once open, does not announce itself.

EchelonGraph's view of this surface, the engine and country breakdowns behind these figures, and the means to check your own estate are maintained at the exposed-databases radar and the surface scanner.

Chapter 6

Secret Sprawl: Exposed Keys & Credentials

A credential is a skeleton key. Unlike a software vulnerability, it requires no exploit chain, no memory-corruption primitive, no race window — an attacker who finds a valid access key simply authenticates and is, from the platform's point of view, you. There is no patch for a leaked secret; the only remedy is rotation, and rotation only helps if you know the secret leaked. This is what makes credential exposure the quietest and most consequential class of finding in this report. The internet is not merely running vulnerable software. It is publishing the keys to that software, in plaintext, on web servers and in public source repositories, where any party with a browser or a git clone can collect them at scale.

Over the observation window of 29 May to 28 June 2026, EchelonGraph passively catalogued credential exposure across two distinct surfaces: secrets served directly off the public web (misconfigured web roots leaking configuration files), and secrets committed to public Git repositories. Both were collected detect-only — we fingerprint the presence and type of a secret from publicly retrievable artifacts; we never use a discovered credential, never authenticate, never read the account behind it. The numbers below are therefore a conservative floor. They describe what the entire internet could already see during a single four-to-five-week baseline, not a multi-year accumulation.

≈2,600 AWS keys were recoverable from 2,571 exposed .env files across just 750 web hosts, alongside 619 more secrets sitting in 478 public Git repositories.

The web surface: configuration files served as plaintext

The dominant failure mode is mundane and almost entirely self-inflicted: an application's environment file, deployed into a directory the web server will serve, with no access control in front of it. The .env file, a convention popularized by twelve-factor application design and common in Laravel, Symfony, Django and Node.js projects, is meant to live beside an application and never be reachable over HTTP. When the document root is misconfigured, or the file is dropped into public/ by mistake, a request to /.env returns the application's entire secret inventory: database passwords, third-party API keys, signing secrets, SMTP credentials, and cloud access keys.

Across 750 hosts, EchelonGraph observed 2,571 distinct exposed .env files — the single largest category by a wide margin. The remaining exposure types are smaller in count but, in several cases, more severe per instance:

Exposed-secret artifacts on the public web (29 May – 28 Jun 2026; 750 hosts)
Exposure type | Count | What it leaks
.env file | 2,571 | Full application secret inventory: DB, API keys, cloud keys
git-config exposed | 80 | .git/config: remote URLs, often with embedded tokens
git-exposed (repo on web root) | 61 | Reconstructable source tree, including committed secrets and history
cred-file | 17 | Standalone credential files (e.g. cloud SDK / service-account files)
db-dump | 3 | Database export served over HTTP
private-key | 2 | Private key material reachable without authentication

The two git categories deserve specific attention. An exposed .git/config (80 hosts) frequently embeds the credential used to clone the repository directly in the remote URL. An exposed .git directory served from the web root (61 hosts) is worse still: it lets an attacker reconstruct the application's entire source tree including its commit history, which is the single most reliable place to find secrets that a developer believed they had removed. A key deleted in the working tree but never purged from history is fully recoverable. The 3 database dumps and 2 reachable private keys are low in count but are each effectively a complete compromise of the asset behind them; we classify them as critical without qualification.
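
An operator can look for the same artifact types on their own servers before anyone else does. The following is a minimal sketch of a local audit, assuming you have filesystem access to the directory the web server serves; the name sets and suffixes are illustrative choices of ours, not the detection rules behind the table above.

```python
import os

SENSITIVE_FILES = {".env"}                   # exact file names
SENSITIVE_DIRS = {".git"}                    # version-control directories
SENSITIVE_SUFFIXES = (".pem", ".key", ".sql")  # keys and database dumps

def audit_web_root(root: str) -> list[str]:
    """Return paths (relative to root) that should not be web-served."""
    findings = []
    for dirpath, dirnames, filenames in os.walk(root):
        for d in dirnames:
            if d in SENSITIVE_DIRS:
                rel = os.path.relpath(os.path.join(dirpath, d), root)
                findings.append(rel + os.sep)
        # Flag a .git directory once; do not descend into it.
        dirnames[:] = [d for d in dirnames if d not in SENSITIVE_DIRS]
        for name in filenames:
            if (name in SENSITIVE_FILES
                    or name.startswith(".env.")
                    or name.endswith(SENSITIVE_SUFFIXES)):
                findings.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(findings)

if __name__ == "__main__":
    # "/var/www/html" is a placeholder document root.
    for path in audit_web_root("/var/www/html"):
        print("should not be served:", path)
```

A clean result here does not prove the files are unreachable over HTTP (rewrites and aliases can expose other directories), so it complements an external check rather than replacing one.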

What the keys actually are

The composition of the secrets matters more than the file count, because not all secrets are equal. A leaked SMTP password lets an attacker send mail; a leaked cloud access key lets an attacker spend money, exfiltrate data, and pivot. The overwhelming majority of what we recovered from the web surface is cloud infrastructure credentials.

Secret material identified on the public web
Secret type | Count
AWS secret access key | 1,319
AWS access key ID | 1,284
GitHub token | 2
PEM private key | 1

The access key ID and the secret access key are the two halves of an AWS credential pair. Counted together, the 1,284 IDs and 1,319 secret keys come to roughly 2,600 items of AWS key material harvestable from the public web during a single baseline window, or at most 1,284 complete pairs. An AWS key pair is not a password to one application; it is a programmatic identity inside an account, and what it can do is bounded only by the IAM policy attached to it. In practice, far too many of these keys are over-permissioned: granted broad or administrative access for convenience during development and never scoped down. That is precisely the failure mode that turns a single leaked key into a full account compromise.
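
To make "fingerprinting the presence and type of a secret" concrete, here is a minimal detect-only sketch. It is an illustration under stated assumptions, not EchelonGraph's detector: the access key ID pattern relies on the documented 20-character format with an `AKIA` or `ASIA` prefix, while the secret-key rule (a 40-character value assigned to a name containing "secret") is a heuristic of ours. The sample values are the example keys AWS uses in its own documentation, and findings are redacted rather than echoed.

```python
import re

# Access key IDs: 20 characters, AKIA (long-term) or ASIA (temporary) prefix.
ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b")
# Heuristic (assumption): 40 base64-style characters assigned to a *secret* name.
SECRET_KEY = re.compile(
    r"\w*secret\w*\s*=\s*['\"]?([A-Za-z0-9/+]{40})['\"]?\s*$", re.IGNORECASE
)

def redact(value: str) -> str:
    """Keep the first four characters and mask the rest."""
    return value[:4] + "*" * (len(value) - 4)

def fingerprint_env(text: str) -> list[tuple[int, str, str]]:
    """Return (line number, secret type, redacted value) for each match."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        id_match = ACCESS_KEY_ID.search(line)
        if id_match:
            findings.append((lineno, "aws-access-key-id", redact(id_match.group())))
        secret_match = SECRET_KEY.search(line)
        if secret_match:
            findings.append((lineno, "aws-secret-access-key", redact(secret_match.group(1))))
    return findings

sample = (
    "AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE\n"
    "AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY\n"
    "DEBUG=true\n"
)
for finding in fingerprint_env(sample):
    print(finding)
```

Each half of a pair is reported as its own finding, which is why one credential pair in one file appears as two rows in a type breakdown.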

.env file 2,571 · git-config 80 · git-exposed 61 · cred-file 17 · db-dump 3 · private-key 2
Exposed secrets on the public web, by artifact type (n=750 hosts). The .env file dominates at 2,571 instances; git-config, git-exposed, cred-file, db-dump and private-key form the long tail.

Mapped to severity, the picture is stark: of the secret rows we classified, 1,386 were critical and 1,304 were high, with only 46 rated medium. There is almost no benign tail here. Web-served secret exposure is, by its nature, a high-severity finding — the artifact is reachable by anyone, the credential is live until rotated, and the discovery cost to an adversary is a single unauthenticated HTTP request.

Web secret findings by severity
Severity | Findings
Critical | 1,386
High | 1,304
Medium | 46

The second surface: secrets committed to public Git

If the web surface is about files that should never have been served, the Git surface is about secrets that should never have been committed. Source-code hosting is the other end of the same pipe: a developer hardcodes a token to make something work, commits it, and pushes it to a public repository — at which point the secret is not only visible in the current tree but permanently embedded in the immutable commit history, replicated to every fork and clone, and indexed by anyone who watches the public event firehose. Removing the file in a later commit does not remove the secret; only a history rewrite plus rotation does, and the credential should be assumed compromised the moment it lands in a public repo.

EchelonGraph observed 619 secrets across 478 public repositories during the window. The provider distribution is broader than the web surface and tells a different story — one increasingly shaped by the AI build-out:

Leaked secrets in public Git repositories, by provider (largest buckets; 619 secrets across 478 repos)
Provider / type | Count
Generic (high-entropy / unattributed) | 249
Google (API / service credentials) | 228
AWS | 63
Telegram (bot tokens) | 53
Discord (bot / webhook tokens) | 13
OpenAI (API keys) | 2
HuggingFace (access tokens) | 2

Several patterns are worth naming. Google credentials (228) rival the generic high-entropy bucket and lead all named providers — API keys and service-account material that, like AWS keys, can grant programmatic access to cloud projects and data. Telegram and Discord bot tokens (53 and 13) reflect the proliferation of automation and bot frameworks whose tokens are routinely hardcoded; a leaked bot token can let an attacker hijack the bot, read channel traffic, or use it as a command-and-control or data-exfiltration channel. And while OpenAI and HuggingFace keys (2 each) are small in absolute count, they are an early signal of an emerging category: AI service credentials. A leaked LLM API key is a direct financial-abuse vector — an attacker runs inference on the victim's account, and the bill arrives at month's end — and a model-registry token can expose proprietary or fine-tuned models. This dovetails with the AI attack surface documented in Chapter 3: organizations are not only standing up AI infrastructure faster than they can secure it, they are leaking the keys to it.
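
The "generic high-entropy" bucket refers to strings that look random enough to be a key even though no provider pattern matches. One common way to approximate that is Shannon entropy over the characters of a token. The sketch below shows the idea; the length and entropy thresholds are illustrative assumptions, not the values used to produce the table.

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Entropy in bits per character of the string's character distribution."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())

def looks_like_secret(token: str, min_len: int = 20, threshold: float = 4.0) -> bool:
    """Flag long tokens whose characters are close to uniformly distributed.

    min_len and threshold are illustrative defaults; real scanners tune
    them per character set to manage false positives.
    """
    return len(token) >= min_len and shannon_entropy(token) >= threshold

print(shannon_entropy("abcd"))
print(looks_like_secret("a" * 40))
print(looks_like_secret("wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"))
```

Entropy alone cannot attribute a string to a provider, which is why such findings stay "unattributed", and it will also flag hashes and random identifiers that are not secrets at all.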

By severity, the Git surface skews high but carries a real critical core: 301 high, 273 medium, and 45 critical. The medium share is larger than on the web surface — many committed secrets are lower-privilege tokens or are partially mitigated by scope — but 45 critical leaks in a single baseline, each a live credential to production infrastructure, is not a rounding error.

Git-leaked secrets by severity (n=619)
Severity | Findings
High | 301
Medium | 273
Critical | 45

It is the permissions, not just the key

The defining lesson of credential exposure is that the damage is governed less by the leak itself than by what the leaked identity is allowed to do. A leaked secret, on its own, is just a string. Its blast radius is determined entirely by the privileges attached to it: a read-only token scoped to a single resource is an incident to clean up; an administrative key is a full account compromise. The same convenience culture that puts a key in a web-served .env file — or hardcodes it into a commit — tends also to grant that key more privilege than the task it was created for actually needs.

That is the exact risk profile of the ≈2,600 AWS keys in this chapter. An over-broad IAM grant turns a single leaked key into a programmatic identity that can spend money, read data, and pivot across the account. The combination is the toxic part: a broadly-scoped key, reachable by anyone, never rotated. Two of those three conditions are within the operator's direct control before any attacker is involved.

Why "least privilege" is the load-bearing control. You cannot guarantee a key will never leak — humans misconfigure web roots and commit secrets, and they always will. What you can guarantee is that when a key leaks, it opens as little as possible. A read-only key scoped to one bucket is an incident; an administrator key is a breach. Scoping every credential to its minimum necessary permission converts the unavoidable leak from catastrophic to survivable — it is the one control that limits blast radius after every other safeguard has already failed.
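
Over-broad grants are usually visible in the policy document itself. As a rough illustration, the sketch below walks an IAM policy in its standard JSON shape (`Statement`, `Effect`, `Action`, `Resource`) and flags `Allow` statements that use wildcards. It is a simplified heuristic of ours: it ignores `NotAction`, conditions, and permission boundaries, and the two example policies and the bucket name are hypothetical.

```python
def _as_list(value):
    """IAM allows a single string or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]

def overbroad_statements(policy: dict) -> list[tuple[str, list[str]]]:
    """Return (Sid, reasons) for Allow statements that use wildcards."""
    flagged = []
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        reasons = []
        actions = _as_list(stmt.get("Action", []))
        if any(a == "*" or a.endswith(":*") for a in actions):
            reasons.append("wildcard action")
        if any(r == "*" for r in _as_list(stmt.get("Resource", []))):
            reasons.append("wildcard resource")
        if reasons:
            flagged.append((stmt.get("Sid", "<no Sid>"), reasons))
    return flagged

admin_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "Admin", "Effect": "Allow", "Action": "*", "Resource": "*"}
    ],
}
scoped_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadOneBucket",
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-bucket/*",
        }
    ],
}
print(overbroad_statements(admin_policy))
print(overbroad_statements(scoped_policy))
```

The first policy is the "administrator key is a breach" case; the second is the "read-only key scoped to one bucket" case, and it produces no findings.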

Why this is structurally worse than a vulnerability

It is worth stating plainly why credential sprawl deserves separate treatment from the known-exploited-vulnerability exposure in Chapter 4. A vulnerability requires an exploit; a leaked credential requires only a login. A vulnerability is often noisy to exploit and may trip detection; authenticating with a valid key is, by design, indistinguishable from legitimate use — the access looks exactly like the developer it was stolen from. And a vulnerability has a vendor patch with a clear before-and-after; a leaked secret has no patch, only rotation, and rotation is only triggered if someone realizes the leak occurred. The absence of an exploit step is precisely what makes this class so dangerous: it removes the friction and the signal that defenders normally rely on.

  • No exploit required. Possession of the credential is the attack. The barrier to entry is a browser or a clone command.
  • No patch exists. The only fix is rotation — and rotation depends on detection, which most organizations lack for their own externally-visible secrets.
  • Detection-evading by construction. Valid-credential access blends into normal authenticated traffic and rarely fires an alert.
  • Permanent in Git. Once a secret is in public commit history, deleting the file does not remove it; assume it is compromised the moment it is pushed.

What this means for defenders

Two surfaces, one root cause: secrets handled as plaintext data rather than as managed, scoped, rotatable identities. The remediation pattern is well understood and does not require new technology — it requires discipline applied consistently.

  1. Get secrets out of files and into a secrets manager. Application secrets should be injected at runtime from a managed store, never committed to source and never written into a web-served directory. This single change eliminates both surfaces in this chapter.
  2. Scope every credential to least privilege. Assume each key will eventually leak and ensure that when it does, it opens the minimum possible. This is the control that most directly limits the blast radius of the ≈2,600 keys above.
  3. Add pre-commit and CI secret scanning. Catch the secret before it is pushed, not after. The 478 repositories in this chapter are evidence that secrets reach public Git at scale without it.
  4. Block .env, .git, and credential files at the web server. Deny-by-default for dotfiles and version-control directories at the edge stops the dominant 2,571-file web exposure outright, independent of how the application is deployed.
  5. Rotate on the assumption of exposure, and monitor your own external footprint. Any secret that has ever touched a public surface should be rotated and the old value revoked. You cannot rotate what you do not know has leaked — continuous external secret discovery is the detection layer that makes rotation possible.

An organization can verify its own exposure against the same passive methodology used to produce these figures via the EchelonGraph external surface scanner and the dedicated exposed-keys and leaked-credentials radars. The cost of looking is a single scan; the cost of not looking is measured in the gap between when a key leaks and when you find out — a gap that, for the keys catalogued here, is currently the entire internet's to exploit.

Chapter 7

Subdomain Takeover

A subdomain takeover is the rare exposure that hands an attacker a legitimate piece of your brand. Where most of the surfaces in this report leak something the attacker can read, a dangling subdomain lets the attacker publish — to serve content, set cookies, and accept credentials under a hostname that the browser, the corporate proxy, and the human reader all treat as yours. The address bar is genuine. The lock icon is genuine. Only the content underneath has been replaced. That combination is what makes the class disproportionately useful for phishing and session theft, and disproportionately hard for victims to spot.

Across the observation window of 29 May to 28 June 2026, EchelonGraph passively catalogued 7,952 subdomains delegated to third-party hosting services via CNAME records, and identified 134 of them as confirmed vulnerable to takeover through a dangling CNAME — a delegation pointing at a provider resource that no longer exists and can be re-claimed. These were detected, not exploited: the platform observed the public DNS delegation and the provider's unclaimed-resource response, and never registered, served from, or otherwise took control of any name. The figures below are a single inaugural baseline drawn over roughly four-to-five weeks, not a multi-year trend.

134 subdomains confirmed hijackable via a dangling CNAME — 1.7% of the 7,952 third-party-delegated names observed. Each is a working, trusted hostname an attacker can claim and serve content from.

Why a dangling CNAME is dangerous

The mechanism is mundane, which is precisely why it persists. An organization points blog.example.com or shop.example.com at a managed service — a Shopify storefront, a GitHub Pages site, a Heroku app — by adding a CNAME record that aliases the subdomain to a provider-controlled hostname. Months or years later the underlying service is decommissioned: the store is closed, the repository deleted, the app torn down. The provider releases the backing resource. The DNS record, however, is rarely cleaned up at the same time, because DNS and application lifecycles sit with different teams and different change processes. The subdomain is now dangling: it still resolves, still points at the provider, but the target no longer exists.

On most platforms that allow customers to claim arbitrary hostnames, an attacker who notices the dangling record can create a new account, register the same hostname the CNAME still points to, and immediately begin serving content. No DNS access to the victim's zone is required — the victim's own delegation does the work. From that moment the attacker controls a live, fully-resolving page on the victim's domain, served over the provider's infrastructure and, in the common case, fronted by a provider-issued TLS certificate that turns the browser padlock green. To every external observer the page is part of the victim's web estate.
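
The detection logic described earlier, observing the public DNS delegation and the provider's unclaimed-resource response, can be sketched as a pure classification step. This is an illustration, not EchelonGraph's detector: the inputs (a CNAME target and an HTTP response body) are assumed to have been collected elsewhere, and the two fingerprint strings are commonly cited provider messages that should be verified against current provider behaviour before being relied on.

```python
# Provider suffix -> text shown when no resource is claimed.
# Assumption: commonly cited messages; confirm before use.
FINGERPRINTS = {
    "github.io": "There isn't a GitHub Pages site here.",
    "herokuapp.com": "No such app",
}

def classify_delegation(cname_target: str, http_body: str,
                        fingerprints: dict = FINGERPRINTS) -> str:
    """Classify a subdomain from its CNAME target and the page it serves."""
    target = cname_target.rstrip(".").lower()
    for suffix, marker in fingerprints.items():
        if target == suffix or target.endswith("." + suffix):
            # Delegated to a known provider: is the backing resource claimed?
            return "dangling-candidate" if marker in http_body else "claimed"
    return "not-a-known-provider"

print(classify_delegation("acme.github.io.", "There isn't a GitHub Pages site here."))
print(classify_delegation("acme.github.io", "<h1>Welcome</h1>"))
print(classify_delegation("cdn.example.net", ""))
```

A "dangling-candidate" result is a prompt to remove or repoint the DNS record; confirming that the name can actually be claimed is provider-specific and is not attempted here.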

The damage follows from the trust that a hostname carries:

  • Credential phishing that survives scrutiny. The single most effective control users are told to apply — "check the URL" — fails here, because the URL is correct. A login form on a hijacked support.example.com or careers.example.com defeats the link-inspection habit, passes mail-gateway domain reputation checks, and clears the URL allow-lists that security-awareness training tells staff to rely on. The phish is hosted on the brand it impersonates.
  • Cookie theft and session hijacking. Browsers scope cookies by domain. A cookie set with Domain=.example.com, or any session that trusts the parent domain, is readable from any subdomain of example.com — including the one the attacker now controls. A hijacked subdomain can therefore read session cookies that were never meant for it, and can set its own cookies that the parent application will subsequently honor (session fixation). Single sign-on and OAuth flows that treat sibling subdomains as same-origin-adjacent are especially exposed.
  • Content Security Policy and allow-list bypass. Security policies frequently trust *.example.com wholesale for script sources, frame ancestors, CORS origins, and redirect targets. A hijacked subdomain is inside that trust boundary, so it can host malicious script, frame the real application for clickjacking, or act as a sanctioned open-redirect that launders attacker URLs through the legitimate brand.
  • Reputation and deliverability collateral. A subdomain caught serving malware or phishing is the organization's domain being blocklisted, flagged by safe-browsing services, and cited in abuse reports — with the cleanup, delisting, and customer-trust cost landing on the victim, not the attacker.

None of this requires a software vulnerability, a leaked credential, or an exploit chain. It requires only a CNAME that outlived the thing it pointed at. The barrier to entry is a free provider account.
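The cookie-scoping point in the list above can be made concrete with a minimal sketch of browser-style domain matching, loosely following the domain-match rule in RFC 6265. The hostnames are hypothetical, and the sketch ignores Path, Secure, HttpOnly, SameSite, and IP-address hosts; it only illustrates why a parent-domain cookie reaches a hijacked sibling while a host-only cookie does not.

```python
from typing import Optional


def domain_match(request_host: str, cookie_domain: str) -> bool:
    """Simplified RFC 6265-style domain match (IP-address hosts not handled)."""
    host = request_host.lower().rstrip(".")
    domain = cookie_domain.lower().lstrip(".")
    # Match the domain itself or any name that ends in ".<domain>".
    return host == domain or host.endswith("." + domain)


def cookie_sent_to(request_host: str, set_by_host: str,
                   domain_attr: Optional[str]) -> bool:
    """Would a browser attach the cookie to a request for request_host?"""
    if domain_attr is None:
        # No Domain attribute: host-only cookie, returned to the exact host only.
        return request_host.lower() == set_by_host.lower()
    return domain_match(request_host, domain_attr)


hijacked = "old-shop.example.com"  # hypothetical taken-over subdomain
app = "app.example.com"            # hypothetical host that sets the session cookie

wide = cookie_sent_to(hijacked, app, ".example.com")  # parent-domain scope
narrow = cookie_sent_to(hijacked, app, None)          # host-only scope
print(wide, narrow)  # prints: True False
```

The check on the leading dot matters: a lookalike such as evilexample.com does not match example.com, so the exposure is specific to genuine subdomains of the parent.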

Where the dangling names point

The 7,952 delegated subdomains concentrate on a small number of high-volume hosting platforms — the same managed services that make it trivial to stand up a site, and equally trivial to leave a record behind when that site is removed. The distribution below is by observed subdomain (host) across the top five providers.

Third-party-delegated subdomains by hosting service (hosts observed, top five providers)
Subdomains delegated to third-party services, by provider (observation window 29 May – 28 Jun 2026)
Hosting service | Subdomains observed | Share of top five
Shopify | 4,869 | 63.2%
GitHub Pages | 1,563 | 20.3%
Heroku | 773 | 10.0%
SmugMug | 331 | 4.3%
Fastly | 170 | 2.2%
Top five total | 7,706 | 100%
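The shares in the table are computed against the top-five total, not against all 7,952 delegated names. A short sketch that reproduces the column from the counts above makes the denominator explicit; the variable names are illustrative only.

```python
# Counts as reported in the table above.
top_five = {
    "Shopify": 4869,
    "GitHub Pages": 1563,
    "Heroku": 773,
    "SmugMug": 331,
    "Fastly": 170,
}
all_delegated = 7952  # all third-party-delegated subdomains observed

total_top_five = sum(top_five.values())

# Share of each provider within the top five, rounded to one decimal place.
shares = {name: round(100 * count / total_top_five, 1)
          for name, count in top_five.items()}

# Names delegated to providers outside the top five, derived by subtraction.
outside_top_five = all_delegated - total_top_five
top_five_coverage = round(100 * total_top_five / all_delegated, 1)

for name, share in shares.items():
    print(f"{name}: {share}%")
print(f"Top five cover {top_five_coverage}% of all delegated names; "
      f"{outside_top_five} sit with other providers.")
```

By subtraction, 246 of the 7,952 observed names are delegated to providers outside the top five.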

Two patterns in this distribution deserve a CISO's attention. First, the providers that dominate are the ones whose value proposition is self-service: anyone can open a Shopify store, publish a GitHub Pages site, or deploy a Heroku app in minutes, which is exactly the property an attacker needs to re-claim a released hostname. Second, the volume is driven by marketing and developer convenience, not by core engineering. Storefronts, campaign microsites, documentation pages, status pages, and project sites are stood up by teams outside the central infrastructure function and are torn down without a DNS de-provisioning step. The CNAME is created in a hurry and forgotten at leisure.

Of the 7,952 names observed, 134 were confirmed to be in the dangling, claimable state — concentrated on the self-service providers where re-registration of an unclaimed hostname is open to any account, including Shopify, GitHub Pages, and Heroku. The remainder resolve to live, claimed services and are not vulnerable; their presence in the dataset is simply the population from which takeovers emerge. The ratio matters more than either number alone: roughly one in sixty externally-delegated subdomains in this sample was a working takeover candidate. For an enterprise with hundreds or thousands of third-party-hosted names, that ratio implies a standing inventory of live, brand-trusted footholds that no one is watching.
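The ratio can be turned into a rough planning figure. The sketch below derives the rate from the two reported counts and scales it to an estate size. It assumes the sample rate transfers to an individual organization, which this report does not claim, so the output is an order-of-magnitude prompt for an audit rather than a prediction.

```python
observed = 7952   # third-party-delegated subdomains observed
dangling = 134    # confirmed dangling and claimable

sample_rate = dangling / observed   # fraction confirmed vulnerable
one_in = observed / dangling        # "one in N" form of the same ratio


def expected_dangling(delegated_names: int, rate: float = sample_rate) -> float:
    """Expected count of dangling names IF the sample rate applied to this estate."""
    return delegated_names * rate


print(f"rate: {100 * sample_rate:.1f}%  (about one in {one_in:.0f})")
for estate_size in (200, 1000, 5000):  # hypothetical estate sizes
    print(estate_size, round(expected_dangling(estate_size), 1))
```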

Subdomain takeover exposure at a glance
Metric | Value
Third-party-delegated subdomains observed | 7,952
Confirmed vulnerable (dangling CNAME, claimable) | 134
Confirmed-vulnerable rate | 1.7%
Distinct hosting services in top five | 5
Detection method | Passive DNS + provider unclaimed-resource signal; detect-only

Who this hits

Subdomain takeover is not a problem of immature organizations. It is a problem of large organizations, because exposure scales with the breadth of an estate and the number of teams allowed to create DNS records. The most exposed profiles are:

  • Consumer and retail brands with sprawling Shopify and marketing-platform estates — one storefront or landing page per campaign, region, or product line — and no lifecycle owner for the records left behind when a campaign ends.
  • Engineering-heavy companies that publish documentation, status pages, and project sites on GitHub Pages and PaaS providers, where a deleted repository or torn-down app routinely outlives its CNAME.
  • Organizations that have grown by acquisition, inheriting DNS zones whose history no current employee fully knows, including delegations to services that were cancelled before the acquisition closed.
  • Any brand whose customers are accustomed to logging in at subdomains — portal., account., my., secure. — because a hijacked sibling under the same parent domain inherits the credibility, and frequently the cookie scope, of the real one.

The defender's structural disadvantage is asymmetry of attention. Attackers enumerate an organization's subdomains continuously and cheaply, using the same public certificate-transparency logs and DNS data that this report draws on; finding a dangling record is a scriptable, undifferentiated commodity activity. Defenders, by contrast, typically discover dangling records only after an incident, when the hijacked subdomain is already serving a phishing page and the abuse reports have started. The window between a service being decommissioned and the dangling record being noticed by the right party is the entire exposure. On the defender's side, where cleanup depends on someone volunteering to do it, that window is open-ended.

Incident parallels

Subdomain takeover lives in the same family as the breaches that have defined the last decade of cloud and web exposure, and it is instructive to place it beside them. Where Equifax in 2017 turned on an unpatched, known-vulnerable component — the pattern this report quantifies in its known-exploited-vulnerability exposure — and where Capital One in 2019 turned on a misconfigured, internet-reachable resource — the pattern catalogued in this report's exposed data stores — subdomain takeover turns on neither a vulnerability nor a misconfiguration in the running system. It turns on a record that points at nothing, and a provider that lets a stranger fill the gap.

That distinction is what makes it dangerous in a way the others are not. The Equifax and Capital One classes are at least theoretically visible to the asset owner: the vulnerable host is in the inventory, the open store is in the cloud account. A dangling CNAME points away from the organization, to infrastructure it no longer owns or operates, which is exactly why it falls out of every inventory the organization keeps of itself. It is an exposure of subtraction — the danger is the absence of a resource, not the presence of one — and conventional asset management is built to find what exists, not to notice what has quietly ceased to.

The post-exploitation use is also distinct. The mass-exploitation events of recent years — the MOVEit file-transfer campaign, the regreSSHion remote-code-execution exposure in OpenSSH that this report finds running on thousands of hosts — are about reaching into a victim. Subdomain takeover is about reaching out from a victim's identity: using the trusted brand to attack the brand's own customers, partners, and staff. The blast radius is not the compromised host; it is everyone who trusts the domain. We do not attribute or detail any specific takeover incident here — our methodology is passive and detect-only — but the structural logic is identical across every public case of the class: a trusted name, a forgotten record, a free account, and a phishing page that the victim's own DNS vouches for.

What to do about it

The remediation is unusually clean for a problem this consequential, because the fix is the deletion of a record rather than the patching of a system.

  1. Remove the delegation, not just the service. Make DNS de-provisioning a mandatory, gating step in every decommission runbook. When a Shopify store, GitHub Pages site, Heroku app, or any externally-hosted property is retired, the corresponding CNAME must be deleted in the same change. The dangling record exists because these two actions are owned by different teams; closing that gap closes the exposure class at its source.
  2. Inventory outward-facing delegations continuously. Enumerate every CNAME in every zone that points to a third-party service and reconcile it against a live-resource check, on a schedule — not once. Certificate-transparency logs and passive DNS make the attacker's enumeration trivial; the defender must run the same enumeration first, and treat any delegation whose target returns an unclaimed-resource signal as a live incident.
  3. Constrain who can create records and where they can point. Treat the authority to add a CNAME to a corporate zone as a privileged action with an owner and an expiry, and prefer a small set of sanctioned hosting providers over an open field, so that the population to monitor stays bounded.
  4. Reduce the blast radius of a takeover before it happens. Scope session cookies to the exact host that needs them rather than the parent domain; avoid Domain=.example.com for anything sensitive. Tighten Content Security Policy, CORS, and redirect allow-lists so they do not trust *.example.com wholesale. These controls do not prevent a takeover, but they sharply limit what a hijacked subdomain can do to the rest of the estate.
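Step 2 above, the continuous inventory, can be sketched as a filter over zone records followed by a liveness reconciliation. This is a minimal illustration under stated assumptions, not EchelonGraph's detection method: the suffix list is a small illustrative sample, the zone data is hypothetical, and `is_claimed` is a caller-supplied check (for example, a fetch of the hostname compared against the provider's documented unclaimed-resource response) that is stubbed here so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple

# Illustrative sample only; a real inventory maintains its own provider list.
THIRD_PARTY_SUFFIXES = {
    "myshopify.com": "Shopify",
    "github.io": "GitHub Pages",
    "herokuapp.com": "Heroku",
}


@dataclass(frozen=True)
class Record:
    name: str    # the organization's hostname
    rtype: str   # DNS record type
    target: str  # where the record points


def provider_for(target: str) -> Optional[str]:
    """Return the provider name if the CNAME target sits under a known suffix."""
    host = target.lower().rstrip(".")
    for suffix, provider in THIRD_PARTY_SUFFIXES.items():
        if host == suffix or host.endswith("." + suffix):
            return provider
    return None


def needs_review(records: Iterable[Record],
                 is_claimed: Callable[[Record], bool]) -> List[Tuple[str, str]]:
    """List (hostname, provider) for third-party CNAMEs with no claimed resource."""
    flagged = []
    for record in records:
        if record.rtype.upper() != "CNAME":
            continue
        provider = provider_for(record.target)
        if provider is not None and not is_claimed(record):
            flagged.append((record.name, provider))
    return flagged


# Hypothetical zone contents and a stubbed liveness result.
zone = [
    Record("www.example.com", "A", "203.0.113.10"),
    Record("shop.example.com", "CNAME", "shops.myshopify.com."),
    Record("docs.example.com", "CNAME", "example-org.github.io."),
    Record("old-app.example.com", "CNAME", "retired-app.herokuapp.com."),
    Record("cdn.example.com", "CNAME", "cdn.example.net."),
]
still_claimed = {"shop.example.com", "docs.example.com"}

flagged = needs_review(zone, lambda record: record.name in still_claimed)
print(flagged)  # prints: [('old-app.example.com', 'Heroku')]
```

The liveness check takes the whole record rather than the target alone because, on some platforms, the shared target hostname always resolves and the claim is made per customer hostname.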

Organizations can establish their own exposure directly. EchelonGraph publishes the aggregate subdomain-takeover findings from this study, and a self-service external surface scan that surfaces an organization's own third-party-delegated subdomains and flags dangling records before an attacker claims them. Because the entire estate of trusted hostnames is the blast radius, the same names belong in the attack-graph view that models how a single trusted foothold connects to identity, session, and downstream systems — so that a forgotten storefront is evaluated as the phishing and session-theft platform it can become, not merely as a stale line in a DNS zone.

Chapter 8

The Vulnerability Landscape

The preceding chapters described exposure — what an organization has inadvertently left reachable on the public internet. This chapter describes the raw material that adversaries pair with that exposure: the vulnerabilities themselves. A reachable host is only a target once a usable flaw exists in the software it runs. The two halves are inseparable. The KEV-exposure findings in Chapter 4 are precisely the intersection of an exposed surface and a known, weaponized flaw; this chapter steps back to the full corpus from which that intersection is drawn.

The headline problem is not that vulnerabilities exist — they always have — but that they are now disclosed faster than any human triage queue can absorb, and that the severity score most teams still anchor on is a poor predictor of which ones will actually be used against them. EchelonGraph tracks and enriches the complete public CVE record so that this chapter can speak to scale, composition, and — most importantly — the shift in how defensible organizations decide what to fix first.

The corpus, in scale

As of the close of the observation window, EchelonGraph tracked and enriched 340,552 distinct CVEs — effectively the entire published catalogue of software vulnerabilities with an assigned identifier. This is not a sample. It is the population, and its composition matters because it dictates the arithmetic every security team is up against.

Of 340,552 tracked CVEs, 33,409 are rated Critical and 103,468 are rated High — roughly two in five carry a severity that, taken at face value, demands urgent action.

That figure is the crux of the modern triage crisis. If a Critical or High rating were a reliable instruction to drop everything and patch, the instruction would arrive for 136,877 vulnerabilities — the sum of the two top bands. No organization — not the largest bank, not a hyperscaler — has the engineering capacity to treat 136,877 items as urgent. Severity alone, applied literally, produces a backlog that is not a prioritization at all. It is noise with a red label.

The tracked CVE corpus by class (n = 340,552)
Class | Count | What it signals
Total CVEs tracked & enriched | 340,552 | The full published catalogue
Critical severity | 33,409 | Highest CVSS band
High severity | 103,468 | Second band; large by volume
CVSS v4 scored | 26,817 | Newer scoring standard, adoption growing
AI-related | 5,190 | Flaws in or adjacent to AI/ML systems
High-EPSS (exploitation likely) | 4,265 | Statistically probable to be exploited
CISA-KEV (known exploited) | 1,621 | Confirmed used in the wild
Ransomware-linked KEV | 326 | Tied to known ransomware operations
Newly published, last 30 days | 7,624 | Net new arrivals in the window
[Figure: bar chart of the CVE corpus by class. High 103,468; Critical 33,409; CVSS v4 scored 26,817; AI-related 5,190; High-EPSS (≥0.50) 4,265; CISA-KEV 1,621.]
The CVE corpus by class (n = 340,552). The two largest bars — High and Critical severity — dwarf the bars that actually predict exploitation: High-EPSS and CISA-KEV. The mismatch in scale is the triage problem rendered visually.

The signal hides in the small numbers

Read the table again, but invert the instinct. The large numbers — 103,468 High, 33,409 Critical — are the distraction. The numbers that should govern a remediation queue are the small ones at the bottom.

Only 1,621 of the 340,552 tracked CVEs appear on CISA's Known Exploited Vulnerabilities catalogue — the authoritative public record of flaws confirmed to be exploited in the wild. That is less than half of one percent of the corpus. A further 4,265 carry a high Exploit Prediction Scoring System (EPSS) probability, meaning the data-driven forecast says exploitation is statistically likely even where it has not yet been publicly confirmed. And just 326 — a few hundred out of a third of a million — are tied to known ransomware operations, the category most likely to produce a board-level incident.

The vulnerabilities confirmed exploited in the wild number 1,621 — under 0.5% of the catalogue. The remaining 99.5% are real, but they are not where adversaries are spending their time.

This is the single most important reframing in the chapter. The defensible posture is not "patch everything Critical." It is "find the few hundred to few thousand vulnerabilities that are actually being used, confirm whether your exposed surface runs the affected software, and fix those first." The corpus is enormous; the actionable subset is small, knowable, and changes the math from impossible to tractable.
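The proportions behind this reframing follow directly from the counts in the corpus table. The sketch below recomputes them; it deliberately does not add the KEV and High-EPSS counts together, because the report does not state how much the two sets overlap.

```python
corpus = 340_552  # total CVEs tracked and enriched

classes = {
    "Critical": 33_409,
    "High": 103_468,
    "High-EPSS": 4_265,
    "CISA-KEV": 1_621,
    "Ransomware-linked KEV": 326,
}


def pct(count: int) -> float:
    """Share of the full corpus, in percent."""
    return 100 * count / corpus


# What a literal "patch everything High or Critical" policy would queue.
severity_queue = classes["Critical"] + classes["High"]

print(f"High or Critical: {severity_queue:,} ({pct(severity_queue):.1f}% of corpus)")
for name in ("High-EPSS", "CISA-KEV", "Ransomware-linked KEV"):
    print(f"{name}: {classes[name]:,} ({pct(classes[name]):.2f}% of corpus)")

# How many severity-flagged items exist per confirmed-exploited item.
queue_per_kev = severity_queue / classes["CISA-KEV"]
print(f"Severity-flagged CVEs per KEV entry: about {queue_per_kev:.0f}")
```

On these figures the severity-only queue is roughly 84 times the size of the confirmed-exploited set.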

From CVSS-only triage to EPSS, KEV and SSVC

For most of the last decade, vulnerability management ran on a single number: the CVSS base score. It is a useful measure of intrinsic technical severity — how bad the flaw is in the abstract, assuming an attacker reaches it. What CVSS was never designed to answer is the question a CISO actually asks: is this one going to be used against us, and does it matter in our environment? Three complementary signals have emerged to close that gap, and EchelonGraph enriches every CVE in the corpus with all of them.

The four signals — what each answers, and its limits
Signal | Question it answers | What it does not tell you
CVSS (v3 / v4) | How technically severe is this flaw in the abstract? | Whether anyone is, or will be, exploiting it
EPSS | What is the probability it will be exploited in the near term? | Whether it is severe, or relevant to your assets
CISA-KEV | Is it confirmed exploited in the wild, right now? | How likely future exploitation is for items not yet listed
SSVC | Given exploitation, exposure and impact, what should I do? | A raw number; it outputs a decision, not a score

The four are not competitors; they are layers. CVSS establishes whether a flaw is worth caring about at all. EPSS — the Exploit Prediction Scoring System — converts the question of exploitation from a guess into a probability, recalculated daily from observed attacker behaviour, telemetry and exploit availability. CISA-KEV is the ground-truth backstop: a binary, human-curated "this is being exploited today, act now." And SSVC — the Stakeholder-Specific Vulnerability Categorization model — fuses exploitation status, system exposure and mission impact into an explicit decision: Track, Track *, Attend, or Act. It is the only one of the four that outputs an action rather than a metric, which is why mature programs increasingly anchor on it.

The practical consequence is a reordering. Under CVSS-only triage, a Critical-rated flaw with no known exploitation and a negligible EPSS sits in the queue ahead of a "merely High" flaw that is on the KEV list and tied to ransomware. That ordering is exactly backwards relative to risk. Layering EPSS and KEV on top of CVSS — and ideally resolving to an SSVC decision — inverts it back to something a defender can rationally execute. EchelonGraph's per-CVE enrichment carries CVSS v3 and v4, EPSS, CISA-KEV status with its own tiering, an SSVC decision, GHSA cross-references, ransomware and AI-related flags, and a composite EchelonGraph score, precisely so a team does not have to assemble these signals by hand for each item.
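The reordering described above can be shown with a small sort. This is a simplified illustration, not EchelonGraph's composite score and not the SSVC decision tree: it ranks lexicographically by KEV status, then ransomware linkage, then EPSS, then CVSS. The findings and their identifiers are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    cve_id: str        # placeholder label, not a real CVE identifier
    cvss: float        # base severity, 0.0 to 10.0
    epss: float        # exploitation probability, 0.0 to 1.0
    in_kev: bool       # listed as known exploited
    ransomware: bool   # tied to known ransomware use


def cvss_only_key(f: Finding):
    # Highest severity first, nothing else considered.
    return -f.cvss


def layered_key(f: Finding):
    # False sorts before True, so "not in_kev" puts KEV entries first;
    # ties fall through to ransomware linkage, then EPSS, then CVSS.
    return (not f.in_kev, not f.ransomware, -f.epss, -f.cvss)


findings = [
    Finding("CVE-A (hypothetical)", cvss=9.8, epss=0.01, in_kev=False, ransomware=False),
    Finding("CVE-B (hypothetical)", cvss=7.5, epss=0.92, in_kev=True, ransomware=True),
    Finding("CVE-C (hypothetical)", cvss=8.1, epss=0.60, in_kev=False, ransomware=False),
]

by_cvss = [f.cve_id[:5] for f in sorted(findings, key=cvss_only_key)]
by_layers = [f.cve_id[:5] for f in sorted(findings, key=layered_key)]

print("CVSS only:", by_cvss)    # prints: CVSS only: ['CVE-A', 'CVE-C', 'CVE-B']
print("Layered:  ", by_layers)  # prints: Layered:   ['CVE-B', 'CVE-C', 'CVE-A']
```

The Critical-rated item with negligible EPSS drops from first to last, and the "merely High" KEV item tied to ransomware moves to the top, which is the inversion the paragraph describes.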

Every CVE in the corpus is enriched with multiple decision signals — CVSS v3/v4, EPSS, CISA-KEV with EchelonGraph tiering, SSVC, GHSA cross-references, and ransomware / AI-related flags — not a single severity number.

Where exposure and the corpus meet

This chapter and Chapter 4 are two readings of the same reality. The corpus described here is the universe of what could be exploited; the KEV-exposure findings are what is exploitable and reachable today across the observed surface. The intersection is sobering: 21,299 hosts were observed exposing software affected by CISA-KEV actively-exploited vulnerabilities, and 3,475 host:CVE pairs involved ransomware-linked flaws. Those are not abstract corpus statistics — they are live, internet-facing instances where the small, dangerous subset of the catalogue has landed on a real, reachable machine.

It is worth stating plainly what that intersection has historically cost. The 2017 Equifax breach turned on a single known, patchable web-framework flaw left unremediated on an internet-facing system — a textbook case of a high-EPSS, ultimately KEV-class vulnerability meeting an exposed surface. The 2023 MOVEit Transfer campaign showed how one flaw in a widely deployed file-transfer product, once weaponized, cascades across thousands of organizations simultaneously the moment it moves from "disclosed" to "exploited in the wild." And the 2024 regreSSHion flaw in OpenSSH — observed in this very dataset across thousands of exposed hosts — is a reminder that the most ubiquitous, most trusted software is also the highest-value target precisely because it is everywhere. In each case the defensive failure was not ignorance of the flaw; it was an inability to distinguish the one that mattered from the tens of thousands that did not, and to act on it before the window closed.

Two pressures that bend the curve: velocity and AI

Two structural shifts make the triage problem worse over time, not better.

The first is velocity. In the roughly five-week observation window, 7,624 new CVEs were published — net new arrivals on top of an already-saturated backlog. Disclosure now runs faster than human review can keep pace, which means the gap between "a flaw exists" and "a team has assessed it" widens continuously unless triage is automated against exploitation signals rather than handled item by item. The arithmetic only closes if the filter is EPSS- and KEV-driven from the outset.
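The arrival figure can be restated as a daily rate. The sketch below uses the reported count of 7,624 and shows the rate under a 30-day and a 35-day reading of the window, since the report describes the period both as the last 30 days and as roughly five weeks. The 15 minutes of review per CVE is a purely hypothetical parameter chosen to illustrate the scale of item-by-item triage.

```python
new_cves = 7624  # newly published CVEs reported for the window


def per_day(window_days: int) -> float:
    """Average daily arrivals for a given reading of the window length."""
    return new_cves / window_days


def review_hours(minutes_per_cve: float) -> float:
    """Analyst hours to review every arrival once (hypothetical effort per item)."""
    return new_cves * minutes_per_cve / 60


for days in (30, 35):
    print(f"{days}-day window: about {per_day(days):.0f} new CVEs per day")

print(f"At 15 minutes each: {review_hours(15):.0f} analyst hours for the window")
```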

The second is AI. The corpus already contains 5,190 AI-related vulnerabilities — flaws in or adjacent to AI and machine-learning systems, model-serving stacks, inference frameworks and the tooling around them. This is a vulnerability class that barely existed a few years ago and is now a measurable, growing slice of the catalogue. It maps directly onto the AI attack surface documented in Chapter 3: as organizations stand up model proxies, vector databases and notebook environments, they introduce a fresh and rapidly evolving inventory of flaws that traditional vulnerability programs were never built to track. An AI-related CVE on an exposed, unauthenticated inference endpoint is the next decade's equivalent of an unpatched web framework on a public server.

Two pressures bending the vulnerability curve, plus the scoring transition
Pressure | Observed figure | Why it compounds
Disclosure velocity | 7,624 new CVEs in ~5 weeks | Arrivals outrun human triage; backlog grows unless filtered by exploitation signal
AI-class vulnerabilities | 5,190 AI-related in corpus | New surface (Chapter 3), immature tooling, fast-moving flaw inventory
Scoring transition | 26,817 CVSS v4 scored | Dual v3/v4 world; programs must read both during the multi-year migration

The scoring transition itself is a quieter third pressure. With 26,817 CVEs now carrying CVSS v4 scores alongside the long-established v3, vulnerability programs are operating in a dual-standard world for the foreseeable future. A team that reads only v3, or only v4, will systematically misrank a growing share of the catalogue. EchelonGraph carries both for every applicable CVE so that a comparison is never apples-to-oranges.

What this means for the reader

The vulnerability landscape is not best understood as a number that goes up every year. It is best understood as a separation problem: an enormous, ever-growing corpus in which the genuinely dangerous subset is small, identifiable, and obscured by severity inflation. The organizations that fare well are not the ones that patch the most; they are the ones that patch the right few hundred items before the exploitation window opens.

  • Stop triaging on CVSS alone. Severity tells you how bad a flaw is in the abstract; it does not tell you whether it will be used against you. Of the corpus, only 1,621 CVEs are confirmed exploited and 4,265 are statistically likely to be — that is the universe to anchor on, not the 136,877 rated High-or-Critical.
  • Layer EPSS, KEV and SSVC over CVSS. Probability of exploitation (EPSS), confirmation of exploitation (KEV) and an explicit do-this decision (SSVC) turn an impossible backlog into a ranked, executable queue.
  • Intersect the corpus with your own exposure. A KEV-class flaw matters to you only if you run the affected software on a reachable host. Pairing the catalogue against your external surface — as in Chapter 4's 21,299 exposed KEV hosts — is how a global statistic becomes a personal worklist. The free Surface Scanner is the starting point for that intersection.
  • Plan for velocity and for AI. 7,624 new CVEs in five weeks is the steady state, not a spike; 5,190 AI-related flaws are an early reading of a curve that is still bending upward. Triage that depends on human review of every item will fall further behind every month it runs unchanged.
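The third recommendation above, intersecting the corpus with an organization's own exposure, is at its core a set intersection. The sketch below uses hypothetical hosts and placeholder CVE labels. It also shows why a count of hosts and a count of host:CVE pairs are different measures, since one host can carry several KEV-listed flaws.

```python
# Placeholder labels standing in for KEV-listed identifiers.
kev = {"CVE-X1", "CVE-X2"}
ransomware_kev = {"CVE-X2"}  # subset of KEV tied to ransomware use

# Hypothetical (host, CVE) observations from an external surface scan;
# the repeated entry models the same finding seen on two scan passes.
observations = [
    ("host-a", "CVE-X1"),
    ("host-a", "CVE-X2"),
    ("host-b", "CVE-X9"),   # exposed, but not a KEV-listed flaw
    ("host-c", "CVE-X2"),
    ("host-c", "CVE-X2"),
]

pairs = set(observations)  # de-duplicate repeated sightings

kev_pairs = {(host, cve) for host, cve in pairs if cve in kev}
kev_hosts = {host for host, _ in kev_pairs}
ransomware_pairs = {pair for pair in kev_pairs if pair[1] in ransomware_kev}

print(len(kev_hosts), "hosts exposing KEV-listed flaws")
print(len(kev_pairs), "host:CVE pairs on the KEV list")
print(len(ransomware_pairs), "host:CVE pairs linked to ransomware")
```

The resulting worklist is the set of KEV pairs, ordered with the ransomware-linked pairs first.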

The corpus is the map of everything that could go wrong. The enrichment signals — EPSS, KEV, SSVC, ransomware and AI flags, the composite EchelonGraph score available on every entry in the CVE Pulse feed — are the legend that tells a defender which roads adversaries are actually driving down. Read together, they convert a third of a million vulnerabilities from a source of paralysis into a short, ordered list of decisions.