Auto-Remediation Architecture
Overview
Auto-Remediation (T3.7) turns a finding from any of the scanner tiers into an Infrastructure-as-Code patch and routes it through the delivery channel you pick. Four modes are useful at different points in your rollout; pick one per tenant from the Remediation Settings page in the dashboard:
- Dry-run: patches are generated and audit-logged, never auto-applied. An operator marks each patch applied or dismisses it manually. (Default; safe.)
- Pull Request: every new patch opens a PR (GitHub) or MR (GitLab) in the repository you nominate. Your reviewers merge.
- Approval queue: patches land in a pending-approval inbox; an admin clicks Approve to promote.
- Auto-apply: patches that don't require review apply automatically (kubectl / terraform). Templates flagged requires_review still route through the approval queue.
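The routing rule behind these modes can be sketched as a small pure function. This is a minimal illustration, not the backend's code: the enum values mirror the Helm `mode` values (dry-run / pr / approval / auto), while `Mode`, `route_patch` and the destination labels are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    # Values mirror the Helm `mode` setting; the enum itself is hypothetical.
    DRY_RUN = "dry-run"
    PR = "pr"
    APPROVAL = "approval"
    AUTO = "auto"


def route_patch(mode: Mode, requires_review: bool) -> str:
    """Return a (hypothetical) destination label for a freshly rendered patch."""
    if mode is Mode.DRY_RUN:
        return "audit-log"       # generated and logged, never applied
    if mode is Mode.PR:
        return "pull-request"    # opened in the nominated repository
    if mode is Mode.APPROVAL:
        return "approval-queue"  # waits for an admin to click Approve
    # Auto-apply: templates flagged requires_review still go to the queue.
    return "approval-queue" if requires_review else "apply"
```

Only auto-apply consults `requires_review`; every other mode already puts a human between the patch and your infrastructure.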
The same pipeline runs in three deployment shapes; pick whichever matches your environment's trust model.
The three tiers
Tier 1 — SaaS-side generator (default; works out of the box)
The SaaS-side generator is a worker inside our core-backend service. It polls tier3_findings for new rows, looks up your tenant's remediation_settings, renders the patch, and writes it to remediation_patches. Strict-ZK posture: SaaS has no plaintext access to the encrypted finding payload, so patch bodies use REPLACE_ME placeholders for resource names — your operator substitutes them during review.
- Agent (your cluster): detects IOCs / processes / syscalls, AES-GCM encrypts sensitive fields, ships ciphertext only.
- SaaS generator: looks up remediation_settings, picks a template from the 11-rule catalogue, renders the body with REPLACE_ME placeholders.
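A minimal sketch of the rendering step, assuming catalogue templates are `string.Template` strings with `$name` slots. The `isolate-workload` rule, `TEMPLATES` and `render_patch` are hypothetical; what the sketch illustrates is that every slot needing a decrypted resource name comes out as `REPLACE_ME`, because the SaaS side only ever holds ciphertext.

```python
from string import Template

PLACEHOLDER = "REPLACE_ME"

# Hypothetical stand-in for one entry of the 11-rule template catalogue.
TEMPLATES = {
    "isolate-workload": Template(
        "apiVersion: networking.k8s.io/v1\n"
        "kind: NetworkPolicy\n"
        "metadata:\n"
        "  name: isolate-$workload\n"
        "  namespace: $namespace\n"
        "spec:\n"
        "  podSelector:\n"
        "    matchLabels:\n"
        "      app: $workload\n"
        "  policyTypes: [Ingress, Egress]\n"
    ),
}


class _AlwaysPlaceholder(dict):
    """Mapping that answers every template slot with REPLACE_ME.

    The SaaS side has no plaintext resource names, so nothing real can be filled in.
    """

    def __missing__(self, key: str) -> str:
        return PLACEHOLDER


def render_patch(rule_id: str) -> str:
    """Render a catalogue template with every resource-name slot left as REPLACE_ME."""
    return TEMPLATES[rule_id].substitute(_AlwaysPlaceholder())
```

The operator's job during review is then a straightforward search-and-replace of `REPLACE_ME` with the real names from their own environment.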
Best for: every customer's first week. No infrastructure changes needed; review patches in the dashboard, copy bodies into your own change-management process.
Tier 2 — Pull Request connector (your repo + PAT)
Same generator, but when Mode = Pull Request the worker opens a PR/MR in *your* repository as soon as the patch is rendered. The personal access token (PAT) lives in GCP Secret Manager under a deterministic name (remediation-{tenant_id}-{provider}-pat); only the secret resource name lives in our database. PATs never round-trip through the dashboard after the initial paste.
- remediation_settings (Postgres): mode, github_default_repo, github_api_base (set for self-hosted GHE), github_token_secret (the secret's resource name).
- GCP Secret Manager: the only place the PAT plaintext is stored; rotation calls addVersion; Postgres holds only the resource name.
- Git provider client: TLS 1.2 floor with retry/backoff; 256 KiB body cap with panic recovery; APIBase honours a self-hosted hostname.
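The client-side safeguards above (TLS 1.2 floor, retry/backoff, 256 KiB body cap) can be sketched with the Python standard library. This is an illustration, not the connector's implementation: the attempt count, the base delay, retrying on `OSError`, and all function names are assumptions. Panic recovery is not modelled.

```python
import io
import ssl
import time
from typing import Callable, TypeVar

MAX_BODY_BYTES = 256 * 1024  # 256 KiB response-body cap
T = TypeVar("T")


def make_tls_context() -> ssl.SSLContext:
    """Certificate-verifying TLS context that refuses anything older than TLS 1.2."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def read_capped(stream: io.BufferedIOBase, limit: int = MAX_BODY_BYTES) -> bytes:
    """Read at most `limit` bytes; fail instead of buffering an oversized body."""
    data = stream.read(limit + 1)  # one extra byte tells us the body was too big
    if len(data) > limit:
        raise ValueError(f"response body exceeds {limit} bytes")
    return data


def with_backoff(call: Callable[[], T], attempts: int = 4, base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `call` on OSError with exponential backoff (0.5s, 1s, 2s, ...)."""
    for attempt in range(attempts):
        try:
            return call()
        except OSError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last network error
            sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be at least 1")
```

Injecting `sleep` keeps the backoff testable without real delays.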
Best for: customers who already gate infrastructure changes through PR review. Patches arrive as pull requests (merge requests on GitLab) with the rollback snippet pre-populated in the description.
Tier 3 — Agent-side full path (master agent applies in your cluster)
The Master agent in your cluster runs the remediation engine locally. It scans Kubernetes for misconfigurations (e.g. a missing default-deny NetworkPolicy), renders the patch, optionally applies it via kubectl / terraform, and reports the outcome to SaaS for audit through the existing SubmitRemediationOutcome gRPC method. SaaS only ever sees the audit row: no patch bodies and no resource names, unless your operator chose to apply the patch (in which case the audit log reflects what actually ran).
Enable via Helm:

    master:
      remediation:
        enabled: true
        mode: dry-run  # or approval / pr / auto
        autoApply: false
        pollInterval: 5m

Best for: regulated environments where exfiltration is unacceptable, or air-gapped clusters where the Master must apply patches itself. The dashboard shows agent-produced rows with a green AGENT badge so you can audit what ran.
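To make the Tier 3 scan concrete, here is a minimal sketch of one check: finding namespaces without a default-deny NetworkPolicy and rendering the manifest the agent would propose. It works on plain dicts (such as the items from `kubectl get networkpolicy -A -o json`) rather than a live cluster. The function names and the `default-deny` policy name are hypothetical, and treating "empty podSelector, both policy types, no allow rules" as default-deny is an assumption of this sketch.

```python
import json


def is_default_deny(policy: dict) -> bool:
    """True if the policy selects every pod and allows no ingress or egress."""
    spec = policy.get("spec", {})
    selects_all = spec.get("podSelector", {}) == {}
    types = set(spec.get("policyTypes", []))
    return (
        selects_all
        and {"Ingress", "Egress"} <= types
        and not spec.get("ingress")
        and not spec.get("egress")
    )


def namespaces_missing_default_deny(namespaces: list[str], policies: list[dict]) -> list[str]:
    """Namespaces with no default-deny policy, sorted for stable output."""
    covered = {p["metadata"]["namespace"] for p in policies if is_default_deny(p)}
    return sorted(ns for ns in namespaces if ns not in covered)


def render_default_deny(namespace: str) -> str:
    """Render a default-deny NetworkPolicy as JSON (kubectl apply accepts JSON)."""
    manifest = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny", "namespace": namespace},
        "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]},
    }
    return json.dumps(manifest, indent=2)
```

In dry-run the rendered manifest would only be reported; with `autoApply: true` it is the kind of body the agent would hand to kubectl.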
Self-hosted Git Enterprise (Walmart, Coupang, JPMorgan, …)
Many enterprises don't put their infrastructure code on github.com. Common patterns:
- GitHub Enterprise Server at a company-controlled hostname (e.g. github.walmart.com, github.coupang.net)
- GitLab self-hosted (Omnibus / Helm / Dedicated) at a company-controlled hostname (typically gitlab.<company>.com)
- Bitbucket Data Center: not yet supported (open a feature request)
The Pull Request connector handles GHE and GitLab self-hosted natively. Each server exposes the same REST API as its public counterpart, just at a customer-controlled hostname:
| Platform | Public default | Self-hosted format |
|---|---|---|
| GitHub.com | https://api.github.com | — |
| GitHub Enterprise Server | — | https://<hostname>/api/v3 |
| GitLab.com | https://gitlab.com/api/v4 | — |
| GitLab self-hosted | — | https://<hostname>/api/v4 |
In Remediation Settings → GitHub connector (or GitLab), paste the API base URL into the optional API base URL field. Leave blank for public GitHub.com / GitLab.com. Make sure the PAT you paste was issued by the same instance — a github.com token won't authenticate against github.walmart.com and vice versa.
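The blank-means-public rule can be expressed as a tiny resolver. This is a hypothetical helper, not the connector's code: the public defaults come from the table above, while the trailing-slash trimming and the https-only check are assumptions.

```python
from urllib.parse import urlparse

# Public defaults from the platform table.
PUBLIC_API_BASE = {
    "github": "https://api.github.com",
    "gitlab": "https://gitlab.com/api/v4",
}


def resolve_api_base(provider: str, api_base: str = "") -> str:
    """Blank field -> public default; otherwise validate and use the self-hosted URL."""
    api_base = api_base.strip().rstrip("/")
    if not api_base:
        return PUBLIC_API_BASE[provider]
    parsed = urlparse(api_base)
    if parsed.scheme != "https" or not parsed.hostname:
        raise ValueError(f"API base must be an https:// URL, got {api_base!r}")
    return api_base
```

The resolver cannot tell whether the PAT belongs to the same instance; that mismatch only shows up as an authentication failure on the first API call.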
Required PAT scopes:
- GitHub: repo (private repo + branch + commit + PR)
- GitLab: api
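For a classic GitHub PAT, GitHub reports the token's granted scopes in the comma-separated `X-OAuth-Scopes` response header of any authenticated API call, so a pre-flight scope check can be sketched as a header parser. This is a sketch only: `missing_scopes` is hypothetical, fine-grained GitHub tokens use per-repository permissions rather than OAuth scopes, and GitLab tokens are not covered.

```python
def missing_scopes(x_oauth_scopes: str,
                   required: frozenset[str] = frozenset({"repo"})) -> set[str]:
    """Return the required scopes absent from an X-OAuth-Scopes header value."""
    granted = {s.strip() for s in x_oauth_scopes.split(",") if s.strip()}
    return set(required) - granted
```

Running a check like this before saving would surface a "PAT lacks repo scope" problem at settings time instead of at the first PR.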
Network reachability: SaaS-side Tier 2 calls these APIs from our Cloud Run egress. If your enterprise instance is behind a corporate firewall not reachable from the public internet, you have two options:
- Allow-list our Cloud Run egress IPs (we can provide them under NDA).
- Switch to Tier 3 (agent-side): the Master pod runs inside your network and reaches your Git server directly, with no inbound allow-listing required.
Where secrets live
| Item | Storage | Plaintext exposed to SaaS? |
|---|---|---|
| GitHub / GitLab PAT | GCP Secret Manager, name remediation-{tenant_id}-{provider}-pat | Only in transit when you save it (HTTPS), never at rest |
| Patch body (Tier 1 SaaS) | remediation_patches.body (Postgres) | Yes (REPLACE_ME placeholders only — no resource names) |
| Patch body (Tier 3 Agent) | remediation_patches.body (Postgres) | Only after your operator applies the patch; the audit row then reflects what actually ran |
| Customer infrastructure code | YOUR GitHub / GitLab | No — we open a PR; your reviewer merges |
Tokens are stored in Secret Manager rather than the database, so a database compromise does not expose them. The settings UI accepts the PAT once via HTTPS POST; the backend writes it with the Secret Manager v1 addVersion API and stores only the secret resource name in Postgres.
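Because the secret name is deterministic, the backend only needs the tenant ID and provider to locate a PAT. The sketch below builds the names as plain strings. The `projects/{project}/secrets/{secret}` and `.../versions/latest` shapes and the secret-ID character rules are Secret Manager's standard formats; the function names, the lowercase `github` / `gitlab` provider values, and the example project ID are assumptions.

```python
import re

# Secret Manager secret IDs: letters, digits, hyphens and underscores, up to 255 chars.
_SECRET_ID_RE = re.compile(r"[A-Za-z0-9_-]{1,255}")
SUPPORTED_PROVIDERS = ("github", "gitlab")  # assumed provider spellings


def secret_id(tenant_id: str, provider: str) -> str:
    """Deterministic secret ID: remediation-{tenant_id}-{provider}-pat."""
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported provider: {provider!r}")
    sid = f"remediation-{tenant_id}-{provider}-pat"
    if not _SECRET_ID_RE.fullmatch(sid):
        raise ValueError(f"invalid secret ID: {sid!r}")
    return sid


def secret_resource_name(project_id: str, tenant_id: str, provider: str) -> str:
    """The only value persisted in Postgres (never the token itself)."""
    return f"projects/{project_id}/secrets/{secret_id(tenant_id, provider)}"


def latest_version_name(resource_name: str) -> str:
    """Version the generator reads; after a rotation (addVersion) this is the new PAT."""
    return f"{resource_name}/versions/latest"
```

With the official `google-cloud-secret-manager` Python client, the resource name would be the `parent` passed to `add_secret_version` on save or rotation, and the latest-version name the `name` passed to `access_secret_version` when the generator needs the token.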
Choosing a mode (decision tree)
End-to-end walkthrough — first PR-mode patch
- Open Remediation Settings in the dashboard (admin role required).
- Choose Mode = Pull Request.
- Under GitHub connector, paste:
  - Repository: acme-corp/infrastructure
  - Base branch: main
  - API base URL: leave blank for github.com, or paste https://github.acme.com/api/v3 for GHE
  - Personal access token: ghp_… (the dashboard never displays it again)
- Click Save settings. The backend writes the PAT to Secret Manager and stores the resource name in remediation_settings.
- Wait for the next finding (the validation cluster fires T3.6-IOC-DOMAIN every few seconds for testing).
- Within ~30 seconds, the generator polls tier3_findings, renders the patch, fetches the PAT from Secret Manager, and opens a PR.
- The Remediation Center row flips to pr_opened with a working PR # link.
- Your reviewer merges; the dashboard's per-row "Mark applied" button records the apply.
If anything fails (e.g. PAT lacks repo scope, branch already exists), the row flips to failed with the GitHub error message in error_message so you can debug from the drawer.
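The PR-mode step of this walkthrough, including its failure path, can be sketched as one function with the rendering and Git calls injected. `PatchRow`, `GitProviderError`, `process_finding` and the `pending` starting status are hypothetical; only `pr_opened`, `failed` and `error_message` come from this page.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PatchRow:
    finding_id: str
    status: str = "pending"  # hypothetical initial status
    pr_url: Optional[str] = None
    error_message: Optional[str] = None


class GitProviderError(Exception):
    """Hypothetical error type carrying the provider's error message."""


def process_finding(finding_id: str, render: Callable[[str], str],
                    open_pr: Callable[[str], str]) -> PatchRow:
    """One PR-mode pass: render the patch, open a PR, record pr_opened or failed."""
    row = PatchRow(finding_id)
    body = render(finding_id)
    try:
        row.pr_url = open_pr(body)
        row.status = "pr_opened"
    except GitProviderError as exc:
        # Keep the provider's message so it can be read from the drawer.
        row.status = "failed"
        row.error_message = str(exc)
    return row
```

Injecting `render` and `open_pr` keeps the status logic separate from network code, so both outcomes can be exercised without a real repository.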
Operational notes
- Per-tenant matched_count: duplicate detections collapse onto one row; the row shows "matched 12× · last 2m ago" so you don't chase ghosts.
- Rotation: paste a new PAT into the same field on Remediation Settings; the backend calls Secret Manager addVersion, and the generator always reads the latest version. Old versions remain accessible in Secret Manager for audit; you can disable them afterwards.
- Rollback: every patch has a stored reverse snippet. The "Rollback" button on status='applied' rows runs the reverse (or, on Tier 1, hands you the snippet to run yourself).
- Dismiss: false-positive patches go to status='dismissed' with a reason field, so later re-detections don't clutter the list again.
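The matched_count collapse can be sketched as an upsert keyed on the detection. This assumes a dedup key of (tenant, rule, fingerprint) and an `open` default status, both hypothetical; the badge string copies the dashboard format quoted above. Re-detections only bump the counter, so a dismissed row stays dismissed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Row:
    matched_count: int
    last_seen: datetime
    status: str = "open"  # hypothetical; "dismissed" rows keep their status


def record_detection(rows: dict[tuple, Row], key: tuple, seen_at: datetime) -> Row:
    """Collapse a repeat detection onto its existing row instead of adding a new one."""
    row = rows.get(key)
    if row is None:
        row = rows[key] = Row(matched_count=1, last_seen=seen_at)
    else:
        row.matched_count += 1
        row.last_seen = max(row.last_seen, seen_at)
    return row


def badge(row: Row, now: datetime) -> str:
    """Format the 'matched N× · last Mm ago' label."""
    minutes = int((now - row.last_seen) / timedelta(minutes=1))
    return f"matched {row.matched_count}× · last {minutes}m ago"
```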