How it works
Pipeline summary
| Step | Tool | Input | Output | Human checkpoint |
|---|---|---|---|---|
| 0 | — | Target identifier, authorization basis, out-of-scope list | Engagement-scoped task | ✓ operator confirms |
| 1 | Tavily | CVE / target keywords | Patch diff URL, advisories | — |
| 2 | git, wget, APK pull | Vulnerable + patched versions | Two binary trees | — |
| 3 | Ghidra + GhidraMCP | Two binaries | Changed-function list, variant hypotheses | — |
| 4 | AFL++ | Harness + seed corpus | Crash inputs | — |
| 5 | GDB + pwndbg | Crash input | Bug class, root cause | ✓ operator triages |
| 6 | pwntools, ROPgadget | Bug class, primitive | Reproducible PoC | — |
| 7 | — | PoC + write-up | Encrypted artefact set | ✓ operator releases |
Step 0: Scope intake & authorization (phase.intake)
The operator provides a target identifier (CVE, package, firmware image, or bug-bounty program), an authorization basis (private research, USG-contract task order, defense-prime engagement under contract, or written bug-bounty scope), and an out-of-scope list. The agent does not fire against any target without these inputs. Legal regime and EAR/ITAR posture are at /about.
Two activation paths, both opt-in and explicit at process launch:
# Coordinated-disclosure / public-research (HackerOne, PSIRT, advisory):
EROSOLAR_PROFILE=variant-research erosolar
# Procurement-delivery (USG contract / defense prime / bug-bounty program):
EROSOLAR_PROFILE=engagement-delivery erosolar
Default erosolar launches the coding profile with the offsec tool surface excluded, per the capability separation rule on /about. Profile rulebooks: agents/variant-research.rules.json, agents/engagement-delivery.rules.json.
- Inputs: target identifier, authorization basis, out-of-scope list, engagement reference (engagement-delivery only).
- Outputs: engagement record persisted to the artefact store; downstream phases inherit it.
- Validation: agent refuses to advance to phase.recon if any required input is missing.
- Checkpoint: operator confirms target and scope before any tool fires.
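The Step 0 gate can be sketched as a simple refusal check. A minimal sketch, assuming a dict-based engagement record; the field names (`target`, `authorization_basis`, `out_of_scope`) are illustrative, not erosolar's actual schema:

```python
# Illustrative Step 0 gate: refuse to advance without the required inputs.
# Field names are assumptions for this sketch, not the real record schema.
REQUIRED_FIELDS = ("target", "authorization_basis", "out_of_scope")

def intake(record: dict) -> dict:
    """Refuse to advance to phase.recon unless every required input is present."""
    missing = [f for f in REQUIRED_FIELDS if not record.get(f)]
    if missing:
        raise ValueError(f"cannot advance to phase.recon, missing: {missing}")
    # Downstream phases inherit the engagement record.
    return {**record, "phase": "recon"}
```

The refusal is loud (an exception) rather than a silent skip, matching the "agent does not fire without these inputs" rule.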
Step 1: Target discovery with Tavily (phase.recon)
tavily search "CVE-2026-XXXX Android kernel patch diff"
The agent picks a recent critical CVE affecting an in-scope, high-value target. Tavily returns blog posts, NVD entries, vendor advisories, and ideally a link to the upstream Git commit or patched build.
- Inputs: CVE ID or target keywords scoped to Step 0.
- Outputs: patch-diff URL, advisory text, candidate commit hashes.
- Validation: at least one source independently corroborates the patch location.
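The corroboration rule can be approximated as "at least two distinct hosts cite the same patch location". A sketch, assuming the recon sources arrive as plain URLs:

```python
from urllib.parse import urlparse

def corroborated(source_urls: list[str]) -> bool:
    """A patch location counts as corroborated only when at least two
    independent hosts (e.g. NVD plus the upstream repo) point to it."""
    hosts = {urlparse(u).netloc for u in source_urls}
    return len(hosts) >= 2
```

Two links from the same advisory mirror do not count as independent corroboration under this rule.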
Step 2: Binary acquisition (phase.acquire)
The CLI uses terminal commands to download the vulnerable and patched versions of the software. For open-source components, it clones the repo at two different commits and compiles them. For closed-source, it pulls firmware from a device or APKs from a vendor channel.
- Inputs: patched and pre-patch identifiers from Step 1.
- Outputs: two binary trees ready for diffing.
- Validation: both builds run their own smoke checks (entry-point reachable, expected exports present).
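One way to sketch the export half of the smoke check: confirm each build exposes the symbols the diffing step will rely on. The symbol names below are hypothetical; in practice the export list would come from a tool such as `nm -D` on the built binary:

```python
def smoke_check(build_exports: set[str], expected: set[str]) -> list[str]:
    """Return the expected exports missing from a build; an empty list
    means the build passes this half of the smoke check."""
    return sorted(expected - build_exports)
```

Both the vulnerable and the patched tree run the same check, so a missing export flags a broken build before any diffing time is spent on it.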
Step 3: Variant analysis with Ghidra MCP (phase.bindiff + phase.variant)
The agent calls Ghidra's Version Tracking to diff the two binaries automatically and receives a list of changed functions. Using Ghidra's decompiler, it examines each changed function to understand the fix, then uses Ghidra's search tools to find the same code pattern in related software or older in-scope versions on the analysis host — all via MCP calls. This is how a public patch becomes a hypothesis generator for fresh, unpatched variants.
This is the same starting shape Project Zero's Big Sleep / Naptime uses: a known fix becomes a hypothesis generator for unfixed siblings.
- Inputs: two binary trees from Step 2.
- Outputs: changed-function list and a ranked set of variant hypotheses.
- Validation: each hypothesis names the unpatched call site, the matching sink, and the conditions for reachability.
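The hypothesis-validation rule maps naturally onto a small record type. A sketch with illustrative field values; the dataclass and its field names are assumptions for this example, not erosolar's internal representation:

```python
from dataclasses import dataclass

@dataclass
class VariantHypothesis:
    """One ranked hypothesis from phase.variant. It is well-formed only
    when all three evidence fields named by the validation rule are filled."""
    call_site: str     # unpatched call site, e.g. "libfoo.so:parse_header+0x4c" (illustrative)
    sink: str          # matching sink, e.g. "memcpy with attacker-sized length"
    reachability: str  # conditions under which the sink is reachable
    rank: float        # similarity score from the pattern search

    def well_formed(self) -> bool:
        return all((self.call_site, self.sink, self.reachability))
```

Hypotheses that cannot name all three pieces of evidence never enter the ranked set.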
Step 4: Fuzzing campaign (phase.fuzz)
If no exact variant is found, the agent identifies the vulnerable input type (a specific file format, network packet, or syscall). It writes a small fuzzing harness in C/Python, seeds it with a valid input mutated to stress the fixed code path, and launches AFL++ via terminal:
afl-fuzz -i seeds/ -o findings/ -- ./harness @@
The CLI monitors the fuzzer output. When a crash appears, the crash file flows to Step 5.
- Inputs: harness, seed corpus, target binary.
- Outputs: crash inputs plus AFL++ run statistics.
- Validation: each crash reproduces under a fresh process before promotion to Step 5.
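The promotion rule "each crash reproduces under a fresh process" can be sketched as a re-execution check. The command shape mirrors the harness invocation above, but the paths in the example call are illustrative:

```python
import subprocess

def crash_reproduces(cmd: list[str]) -> bool:
    """Re-run the harness on a saved crash input in a fresh process.
    A genuine memory-safety crash usually surfaces as a signal death
    (negative returncode in subprocess terms); this sketch treats any
    nonzero exit as a crash."""
    proc = subprocess.run(cmd, capture_output=True)
    return proc.returncode != 0

# e.g. crash_reproduces(["./harness", "findings/crashes/id:000000"])
```

Only inputs for which this returns True flow to Step 5; flaky crashes die here.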
Step 5: Crash triage & root-cause analysis (phase.triage)
The agent runs the crashing input under GDB with pwndbg, captures the register state and backtrace, then cross-references with the Ghidra decompilation of the crashing function (via MCP). It diagnoses the bug class: heap overflow, use-after-free, integer overflow, type confusion, etc.
gdb -batch \
-ex "run < crash" \
-ex "checksec" \
-ex "bt full" \
-ex "info registers" \
./harness
- Inputs: crashing input, target binary, Ghidra decompilation of the faulting function.
- Outputs: bug class, root cause, controlled-corruption primitive (if any).
- Validation: non-reproducing crashes, OOMs without controlled corruption, and crashes that won't classify get dropped — they do not advance to Step 6.
- Checkpoint: operator reviews the bug-class diagnosis before exploit work begins.
Step 6: PoC & exploit building (phase.poc)
Using pwntools templates and the bug details, the agent scripts an initial proof of concept. For use-after-free it crafts a heap spray; for stack overflows it uses ROPgadget to find gadgets in the binary. The LLM guides the process via the CLI.
- Inputs: bug class and primitive from Step 5.
- Outputs: reproducible PoC isolating the primitive.
- Validation: PoC reproduces across N fresh runs, runs cleanly on the patched build (negative test), and isolates the controlled-corruption primitive from incidental crashes. PoCs that fail any of the three do not graduate to Step 7.
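The graduation rule reads directly as a predicate. In this sketch, `run_poc` is a hypothetical callable that returns True when the PoC triggers the crash on a given build:

```python
def poc_graduates(run_poc, n_runs: int = 5) -> bool:
    """A PoC graduates to Step 7 only if it (1) crashes the vulnerable
    build on every one of n_runs fresh attempts and (2) runs cleanly on
    the patched build — the negative test."""
    crashes_reliably = all(run_poc("vulnerable") for _ in range(n_runs))
    patched_is_clean = not run_poc("patched")
    return crashes_reliably and patched_is_clean
```

The negative test is what separates a controlled primitive from an incidental crash: a "PoC" that also fells the patched build is hitting something other than the diagnosed bug.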
Step 7: Documentation for delivery (phase.disclose / phase.deliver)
Once a reliable exploit is built, the agent compiles a detailed write-up — technical description, affected versions, exploitation technique, PoC code — and stores it in a local, encrypted archive. The artefact set is delivered to the engagement's authorized recipient: a USG sponsor, a U.S. defense prime under contract, or a published bug-bounty program. See /defense for procurement and /about for scope.
- Inputs: validated PoC, root-cause analysis, affected-version matrix.
- Outputs: encrypted artefact set ready for operator release.
- Validation: package builds reproducibly from the captured PoC and write-up.
- Checkpoint: operator reviews the artefact set before any external delivery — nothing leaves the pipeline autonomously.
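Reproducible packaging hinges on determinism: the same artefacts in must yield a byte-identical archive out, so the digest can be re-verified at release time. A minimal sketch using a fixed-order, fixed-timestamp tar; the encryption layer and the actual manifest format are out of scope here:

```python
import hashlib
import io
import tarfile

def build_package(artefacts: dict[str, bytes]) -> bytes:
    """Build a deterministic (uncompressed) tar of the artefact set:
    fixed member order and fixed timestamps, so the same inputs always
    produce the same bytes. Encryption would wrap this blob."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in sorted(artefacts):       # fixed member order
            data = artefacts[name]
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = 0                   # fixed timestamp
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def digest(blob: bytes) -> str:
    """Release-time check: the operator re-verifies this digest."""
    return hashlib.sha256(blob).hexdigest()
```

Because the build is deterministic, "package builds reproducibly" reduces to comparing one hash.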
Validation: what makes a finding ship-ready
A finding leaves the pipeline only after three reproducibility checks: (1) the crash reproduces from the saved input under a fresh process, (2) the same input runs cleanly on the patched build (negative test), and (3) the operator reviews and signs off on the package before any external delivery. Crashes that don't reproduce, OOMs without a controlled-corruption primitive, and PoCs that won't run twice in a row do not graduate to Step 7. This is the false-positive bar credible peers ship against — Big Sleep's "build the actual exploit, FP rate goes to zero" rule, applied at every promotion boundary.
Where humans stay in the loop
Three named checkpoints. Step 0 — the operator confirms the target identifier, the authorization basis, and the out-of-scope list before the agent fires. Step 5 — the operator reviews the bug-class diagnosis before exploit work begins; non-reproducing crashes and uncontrolled corruption are dropped here. Step 7 — the operator reviews the artefact set before any external delivery; nothing leaves the pipeline autonomously. End-to-end autonomy without human curation is not a claim this page makes; comparable systems (Big Sleep, XBOW) operate the same way.
What this pipeline does not do
- Operate against systems lacking written authorization. EAR/ITAR posture and lawful-scope detail: /about.
- Run destructive validation in production environments. Crash and PoC validation runs in isolated harnesses against the captured binary, never against a live target.
- Deliver autonomously. Every Step 7 artefact set is operator-released; no external delivery path bypasses the human checkpoint.
Comparable systems
Three reference points the procurement audience already knows:
- Project Zero — Big Sleep / Naptime. Same starting shape as Step 3 — a known fix used as a hypothesis generator for unfixed siblings, validated against SQLite. Difference: Big Sleep is internal Google research; Erosolar delivers to U.S. government and U.S. defense-prime customers under EAR scope.
- XBOW. Demonstrated autonomous offensive AI at scale — #1 on HackerOne's U.S. leaderboard with ~1,060 submissions, all manually pre-reviewed before submission. Similar shape to Steps 4–6 (autonomous fuzz → triage → PoC), parallel-agent web-app pentesting. Difference: Erosolar is binary VR scoped for procurement delivery, not web-app pentesting.
- DARPA AIxCC finalist CRSs. Autonomous PoV generation plus patch generation against open-source codebases. Difference: AIxCC mandates patch generation; Erosolar's Step 7 is artefact delivery to an authorized recipient, not patch shipping.