Case Study:

VaultCheck — PHP CLI Tool for .env Secrets Auditing

The Challenge

Every PHP project that reaches production carries an invisible attack surface: the .env file. Developers routinely commit secrets to git by accident, reuse the same credentials across staging and production, leave placeholder values in place after onboarding, and rely on filesystem permissions that expose secret files to anyone on a shared server. The problem compounds with scale: a team of five generating a handful of environment files becomes a team of twenty with overlapping .env.staging, .env.testing, and backup files, none of which are consistently audited.

The more insidious half of the problem lives in git history. A secret that was committed once and then deleted is still fully retrievable by anyone who can clone the repository. Without tooling that actively scans history, teams operate under a false sense of security — believing that removing a secret from the current branch means it is gone. Beyond discovery, there was no free tool that could definitively answer whether a current production credential had ever appeared in a commit, making rotation decisions speculative rather than evidence-based.

Existing solutions were either expensive commercial platforms, incomplete scripts checking only one dimension of the problem, or generic secret scanners with no understanding of environment variable hygiene, codebase usage patterns, or cross-environment consistency. The goal was a single open-source CLI that ran in seconds, covered every dimension of the problem, and integrated cleanly into any CI/CD pipeline without configuration.

Engineering Solution

VaultCheck was architected around a single-pass scanning model: all environment files, PHP source files, and git history are parsed once per audit run and loaded into a shared ScanContext value object. Each of the 43 checks reads from this shared context rather than performing its own I/O, so registering more checks adds no extra file or git reads. Adding a new check requires only implementing a two-method interface, run() plus an optional isApplicable() guard, with no modification to the core engine.
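The shared-context model described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not VaultCheck's PHP source: the field names on ScanContext, the Finding shape, the check ID "X001", and the EmptyValueCheck class are all assumptions made for illustration. Only the run() / isApplicable() pairing comes from the description above.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class ScanContext:
    """Everything parsed once per run; checks only read from it (assumed shape)."""
    env_files: dict[str, dict[str, str]]           # file name -> {KEY: value}
    referenced_keys: frozenset[str] = frozenset()  # keys seen in source code
    history_lines: tuple[str, ...] = ()            # added lines from git diffs


@dataclass(frozen=True)
class Finding:
    check_id: str
    severity: str
    message: str


class Check(Protocol):
    """The two-method contract: a guard plus the check itself."""
    def is_applicable(self, ctx: ScanContext) -> bool: ...
    def run(self, ctx: ScanContext) -> list[Finding]: ...


class EmptyValueCheck:
    """Hypothetical check: flags keys that are defined with an empty value."""

    def is_applicable(self, ctx: ScanContext) -> bool:
        return bool(ctx.env_files)  # nothing to do without env files

    def run(self, ctx: ScanContext) -> list[Finding]:
        return [
            Finding("X001", "MEDIUM", f"{name}: {key} is empty")
            for name, pairs in ctx.env_files.items()
            for key, value in pairs.items()
            if value == ""
        ]


def audit(ctx: ScanContext, checks: list[Check]) -> list[Finding]:
    """Run every applicable check against the same pre-parsed context."""
    findings: list[Finding] = []
    for check in checks:
        if check.is_applicable(ctx):
            findings.extend(check.run(ctx))
    return findings
```

In this sketch the engine never changes when a check is added: a new class with the same two methods is simply appended to the list passed to audit().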

Checks are organized into six categories that together cover the full lifecycle of a secret. Environment checks audit .env and .env.example file structure and content. Codebase checks cross-reference what the PHP source actually reads against what is defined. Permission checks inspect filesystem access controls. Consistency checks compare values across multiple environment files to detect credential reuse. Strength checks assess the entropy and known weaknesses of secret values. Git History checks scan the repository's commit history (up to 500 commits by default, or all of it with --full-history) for exposed credentials, using both a pattern registry of 30+ known service credential formats and Shannon entropy analysis to catch secrets that don't match any known format.
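As one concrete illustration of the Consistency category, the sketch below detects a secret-looking key that carries the same value in more than one environment file. It is a Python illustration under stated assumptions, not the tool's implementation: the function name, the SECRET_HINTS name filter, and the minimum-length cutoff are hypothetical.

```python
from collections import defaultdict

# Assumption: only keys whose names look secret-like are compared, so that
# shared non-secrets such as APP_NAME are not reported as credential reuse.
SECRET_HINTS = ("SECRET", "KEY", "PASSWORD", "TOKEN")


def find_reused_values(env_files, min_length=8):
    """Return {key: [files]} for secret-like keys with an identical value in 2+ files.

    env_files maps a file name to its parsed {KEY: value} pairs.
    """
    seen = defaultdict(list)  # (key, value) -> files containing that pair
    for name, pairs in env_files.items():
        for key, value in pairs.items():
            looks_secret = any(hint in key.upper() for hint in SECRET_HINTS)
            if looks_secret and len(value) >= min_length:
                seen[(key, value)].append(name)
    return {key: sorted(files) for (key, _), files in seen.items() if len(files) > 1}


envs = {
    ".env.staging": {"DB_PASSWORD": "s3cr3t-pass", "APP_NAME": "VaultDemo"},
    ".env.production": {"DB_PASSWORD": "s3cr3t-pass", "APP_NAME": "VaultDemo"},
}
reused = find_reused_values(envs)  # DB_PASSWORD is shared by both files
```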

Beyond the primary audit command, VaultCheck ships four additional commands that address the ongoing operational challenge of secrets hygiene. The keys command gives teams a live inventory of every environment variable: whether it is defined, empty, referenced in code, unused, or missing a default fallback. The snapshot and drift commands create a baseline of the current state and surface what changed between audits (key additions, removals, value rotations, and new or resolved findings) without ever storing raw secret values. The fix command applies safe, reversible remediations automatically, correcting file permissions, stripping Windows line endings, and removing duplicate keys, with a --dry-run mode to preview changes before they are applied.
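The snapshot and drift idea can be sketched in a few lines: hash each value, store only the hashes, and compare two hash maps later. This Python sketch is an illustration only; the function names and the shape of the drift report are assumptions, and the real snapshot also records findings, which is omitted here.

```python
import hashlib


def snapshot(env):
    """Map each key to the SHA-256 hex digest of its value; raw values are not kept."""
    return {key: hashlib.sha256(value.encode("utf-8")).hexdigest()
            for key, value in env.items()}


def drift(old, new):
    """Compare two snapshots and report added, removed, and rotated keys."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "rotated": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


before = snapshot({"DB_PASSWORD": "old-pass", "MAIL_KEY": "abc"})
after = snapshot({"DB_PASSWORD": "new-pass", "CACHE_URL": "redis://cache"})
report = drift(before, after)
```

A changed digest shows that a value was rotated without revealing either the old or the new value. Because digests of short or guessable values can be brute-forced, it seems prudent to treat the snapshot file as sensitive as well.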

Git History Intelligence

The git history scanner is the most distinctive capability in VaultCheck and the hardest class of problem to address with manual review. When an audit runs without --skip-history, the tool shells out to the project's git binary, walks up to 500 commits (or the entire history with --full-history), extracts every added line from every diff, and runs two independent analyses in parallel: pattern matching against a registry of 30+ service-specific credential formats, and entropy analysis using Shannon's information entropy formula to flag high-randomness tokens that don't match any known format.
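The two analyses can be sketched as follows. Shannon entropy over a token's character frequencies is H = -Σ p·log2(p), measured in bits per character. The Python sketch below is an illustration under assumptions: the two patterns stand in for the 30+ entry registry, and the candidate-token regex and the 4.0 threshold are hypothetical choices, not values taken from VaultCheck.

```python
import math
import re
from collections import Counter

# Two illustrative entries; the real registry is described as holding 30+ formats.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}

# Assumption: candidate tokens are runs of 20+ token-like characters.
TOKEN = re.compile(r"[A-Za-z0-9+/_\-]{20,}")


def shannon_entropy(text):
    """Entropy of the character distribution of text, in bits per character."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def scan_added_line(line, threshold=4.0):
    """Return (names of matched patterns, high-entropy tokens) for one added diff line."""
    hits = [name for name, pattern in PATTERNS.items() if pattern.search(line)]
    high_entropy = [tok for tok in TOKEN.findall(line)
                    if shannon_entropy(tok) >= threshold]
    return hits, high_entropy
```

Pattern matching catches credentials with a recognizable prefix, while the entropy pass is what flags an unlabeled random string. A repetitive string such as a row of identical characters scores 0 bits and is ignored.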

The most actionable output is G008 (Unrotated Leak): if a value currently in .env also appears anywhere in the scanned commit history, the check fires as CRITICAL. This is definitive evidence that a credential was both exposed and never rotated, the most dangerous state a production credential can be in. All matched values are redacted in output to show only the first four and last two characters, so the finding can be acted on without re-exposing the secret in a log file or report.
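The cross-reference and the redaction rule can be sketched together. This Python sketch is an illustration, not the tool's code: the function names, the minimum-length cutoff used to skip trivial values, and the choice to fully mask very short values are assumptions. The "first four, last two" rule is the one described above.

```python
def redact(value):
    """Show only the first four and last two characters; mask everything between."""
    if len(value) <= 6:
        return "*" * len(value)  # assumption: too short to reveal any part safely
    return value[:4] + "*" * (len(value) - 6) + value[-2:]


def unrotated_leaks(current_env, history_lines, min_length=8):
    """Return (key, redacted value) for current values that also appear in history."""
    leaks = []
    for key, value in current_env.items():
        if len(value) < min_length:
            continue  # skip trivial values such as "true" or "8080"
        if any(value in line for line in history_lines):
            leaks.append((key, redact(value)))
    return leaks


env = {"STRIPE_KEY": "sk_live_abcdef123456", "APP_ENV": "production"}
history = ["+STRIPE_KEY=sk_live_abcdef123456", "+APP_DEBUG=false"]
leaks = unrotated_leaks(env, history)
```

Only the redacted form leaves the function, so a report built from its output identifies which credential to rotate without repeating the secret.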

Technical Highlights

  • Single-pass scanning architecture: .env files, PHP source code, and git history are each parsed once per run and stored in a shared ScanContext value object; all 43 checks read from the shared context rather than performing their own I/O, so the I/O cost of an audit stays constant as checks are added
  • 43 checks across six categories (Environment, Codebase, Permissions, Consistency, Strength, Git) with a four-tier severity model (CRITICAL -> HIGH -> MEDIUM -> LOW), a pluggable check registration interface, and per-check isApplicable() guards that skip irrelevant checks rather than producing false positives
  • Git history intelligence pipeline combining a pattern registry of 30+ known service credential formats (Stripe, AWS, GitHub, Google, Twilio, SendGrid, Slack, JWT, PEM keys, and more) with Shannon entropy analysis to catch high-randomness tokens that don't match any known format; both run against every added line in the scanned commits
  • G008 Unrotated Leak detection: cross-references every non-trivial current .env value against the scanned commit history; a match is reported as CRITICAL and identifies credentials that were definitively exposed and never rotated, turning a speculative rotation decision into an evidence-based one
  • Snapshot and drift system: saves a baseline of key hashes (SHA-256, never raw values) and the current finding set to .vaultcheck/snapshot.json; subsequent drift runs surface added keys, removed keys, rotated values, newly introduced findings, and resolved findings since the last snapshot, making regression detection a one-command operation
  • Pluggable reporter system with three output formats: colored terminal output for interactive use, a JSON envelope with severity counts and structured finding objects for CI/CD pipelines and dashboards, and a Markdown table report for sharing, all consuming the same FindingCollection without duplication
  • Auto-fix engine with five safe, reversible remediations (world-readable permissions, world-writable permissions, group-writable permissions, Windows CRLF line endings, duplicate keys) behind a mandatory --safe flag, with --dry-run preview and --yes for non-interactive pipelines; destructive operations like backup file deletion and git history rewriting are intentionally excluded
  • CI/CD integration via --strict flag on the audit command, which exits with code 1 if any MEDIUM or higher finding exists, enabling a pipeline step that blocks deployments when secrets hygiene regresses, with JSON output allowing finding data to be piped into dashboards or ticketing systems
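The severity gate and JSON envelope from the last two highlights can be sketched as below. This is a Python illustration under assumptions: the envelope field names ("summary", "findings"), the finding dictionary shape, the check ID "E001", and the behavior of returning 0 when strict mode is off are all hypothetical. Only the rule "exit 1 on any MEDIUM or higher finding under --strict" comes from the description above.

```python
import json

# Four-tier severity model, ranked so that thresholds can be compared numerically.
SEVERITY_RANK = {"LOW": 1, "MEDIUM": 2, "HIGH": 3, "CRITICAL": 4}


def to_json_envelope(findings):
    """Serialize findings with per-severity counts (field names are assumptions)."""
    counts = {name: 0 for name in SEVERITY_RANK}
    for finding in findings:
        counts[finding["severity"]] += 1
    return json.dumps({"summary": counts, "findings": findings}, indent=2)


def exit_code(findings, strict=False, threshold="MEDIUM"):
    """Return 1 in strict mode when any finding is at or above the threshold."""
    if not strict:
        return 0  # assumption: non-strict runs do not fail the pipeline
    floor = SEVERITY_RANK[threshold]
    blocked = any(SEVERITY_RANK[f["severity"]] >= floor for f in findings)
    return 1 if blocked else 0


findings = [
    {"id": "G008", "severity": "CRITICAL", "message": "Unrotated leak"},
    {"id": "E001", "severity": "LOW", "message": "Hypothetical low finding"},
]
envelope = to_json_envelope(findings)
code = exit_code(findings, strict=True)
```

A pipeline step would fail the build on a non-zero code and could forward the envelope to a dashboard or ticketing system.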