Case Study: TraceTrellis - Multi-Source OSINT Platform with Domain Security Intelligence

The Challenge

Investigators, security professionals, and due diligence analysts routinely need to build a complete picture of a digital subject (email addresses, domains, and usernames), but the data is scattered across dozens of disconnected sources: certificate transparency logs, DNS registries, WHOIS data, web archives, paste sites, social platforms, and code repositories. Working across these manually is slow, produces no unified view, and makes it nearly impossible to see how entities relate to each other across sources.

Beyond raw data aggregation, there was a structural gap in free tooling: no existing tool assessed the security posture of a domain as part of the investigation workflow. Knowing that a domain exists is one signal; knowing it exposes high-risk services, fails to enforce basic security headers, or runs an outdated technology stack is an entirely different level of intelligence. The platform needed to collapse the full OSINT and security assessment workflow into a single automated investigation.

A deeper design challenge emerged at the architecture level: the intelligence value of a multi-source investigation comes not from any individual source, but from the relationships between what they collectively return. A username appearing across three social platforms, a GitHub account whose email matches a domain registrant, a certificate organization field tying two domains to the same operator: these cross-source connections are invisible when sources are queried independently. Building a platform that reliably surfaces them required designing a correlation layer from the ground up, one that could reason across heterogeneous artifact types with meaningful confidence scores.

System Architecture

TraceTrellis is built around a strict two-phase pipeline that separates data collection from intelligence correlation. The separation is intentional: correlating artifacts incrementally as they arrive from concurrent sources produces inconsistent results, because a rule that depends on the combination of two source outputs may fire before one of them is available. By collecting all artifacts first and correlating in a single deterministic pass, the engine produces a consistent graph regardless of which source finishes first.

Phase 1: Collection. When an investigation is submitted, a PipelineJob dispatches a batch of RunSourceJob instances, one per data source, to Redis-backed Horizon workers, where they execute concurrently. Each source implements a shared SourceInterface with its own configured timeout, failure handling, and typed artifact output. A source that fails writes a classified SourceFailure record and exits cleanly; its failure has no effect on the sources running alongside it. The result is a set of typed artifact records covering whatever sources succeeded, ready for the correlation pass.

Phase 2: Correlation. After the batch completes, a CorrelateArtifactsJob runs the CorrelationRuleEngine against the full artifact set, producing a typed graph of nodes and edges. TimelineBuilder then processes the same artifact set to extract a chronological event sequence with group classifications and node linkage. Because collection and correlation are fully decoupled, new data sources can be added to the SourceRegistry without touching the correlation engine, and new correlation rules can be added without touching any source implementation.
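The two-phase shape described above can be sketched in a few lines. This is an illustrative Python sketch, not the platform's actual Laravel/Horizon implementation: the `run_source`, `investigate`, and source-dictionary shapes are all hypothetical stand-ins, but they show why collecting everything before a single correlation pass yields a deterministic result regardless of source completion order.

```python
from concurrent.futures import ThreadPoolExecutor

def run_source(source):
    """Phase 1: each source returns typed artifacts, or fails cleanly in isolation."""
    try:
        return {"source": source["name"], "artifacts": source["fetch"]()}
    except Exception as exc:
        # A failure is recorded as data, not propagated; sibling sources are unaffected.
        return {"source": source["name"], "failure": str(exc)}

def investigate(sources, correlate):
    # Phase 1: collect from all sources concurrently.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(run_source, sources))
    artifacts = [a for r in results for a in r.get("artifacts", [])]
    failures = [r for r in results if "failure" in r]
    # Phase 2: one deterministic correlation pass over the complete artifact set.
    graph = correlate(artifacts)
    return graph, failures
```

Because `correlate` only ever sees the complete artifact set, a rule that depends on two sources' combined output can never fire prematurely, which is the consistency property the text argues for.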

Engineering Solution

Each of the platform's data sources is a self-contained, pluggable service implementing a shared interface with its own timeout, failure classification, and artifact output schema. The SourceRegistry resolves which sources run for a given investigation type at dispatch time, so the pipeline core has no conditional branching around individual sources; adding a new source means registering it, and nothing else changes. Sources execute concurrently rather than sequentially, reducing total investigation time to a fraction of what sequential API calls would require.
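The registry pattern can be sketched as follows. A hedged Python sketch, assuming names like `Source`, `collect`, and `resolve` that stand in for the platform's PHP `SourceInterface` and `SourceRegistry`:

```python
from abc import ABC, abstractmethod

class Source(ABC):
    """Minimal stand-in for the shared source interface; method names are assumed."""
    timeout = 30  # seconds, overridable per source

    @abstractmethod
    def collect(self, target: str) -> list[dict]:
        """Return typed artifact records for the given investigation target."""

class SourceRegistry:
    """Maps investigation types to source classes; the pipeline core never branches."""
    def __init__(self):
        self._by_type: dict[str, list[type[Source]]] = {}

    def register(self, investigation_type: str, source_cls: type[Source]) -> None:
        self._by_type.setdefault(investigation_type, []).append(source_cls)

    def resolve(self, investigation_type: str) -> list[Source]:
        return [cls() for cls in self._by_type.get(investigation_type, [])]
```

The pipeline only ever calls `resolve(...)` and iterates, so registering a new source class is the entire integration surface.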

Domain investigations run a full security intelligence pass alongside standard OSINT collection: SSL certificate metadata and expiry analysis, HTTP security header inspection across eight standard headers with per-header remediation guidance, open port scanning against a curated list of high-risk services, and passive technology stack fingerprinting derived from certificate transparency data. DNS intelligence is resolved separately (A and AAAA records, MX servers, and NS servers) and merged with WHOIS-derived node data during the correlation pass, so all intelligence about a given domain is unified at the graph level rather than scattered across separate source cards.
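A security header check of this shape is straightforward to sketch. The document does not enumerate the eight headers, so the list below is an assumption drawn from common hardening baselines, and the remediation strings are placeholders:

```python
# Assumed header set; the platform's actual eight headers are not listed in the text.
SECURITY_HEADERS = {
    "strict-transport-security": "Enable HSTS with a long max-age.",
    "content-security-policy": "Define a restrictive CSP.",
    "x-content-type-options": "Set to 'nosniff'.",
    "x-frame-options": "Set to 'DENY' or 'SAMEORIGIN'.",
    "referrer-policy": "Set a strict referrer policy.",
    "permissions-policy": "Restrict powerful browser features.",
    "cross-origin-opener-policy": "Isolate the browsing context.",
    "cross-origin-resource-policy": "Restrict cross-origin resource loads.",
}

def audit_headers(response_headers: dict) -> list[dict]:
    """Return one finding with remediation guidance per missing header."""
    present = {name.lower() for name in response_headers}
    return [
        {"header": name, "remediation": advice}
        for name, advice in SECURITY_HEADERS.items()
        if name not in present
    ]
```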

The CorrelationRuleEngine applies ten confidence-scored rules to the collected artifact set. Each rule maps a specific artifact pattern to a named relationship type: a certificate transparency record produces an owns_domain edge at 0.90 confidence; a social profile confirmed across platforms produces a confirmed_identity edge at 0.90; a breach record produces an exposed_in_breach edge at 0.99. An in-memory node cache prevents duplicate edges within a single correlation pass without requiring additional database round-trips. Results feed into an interactive Cytoscape.js relationship graph and a vis-timeline chronological view, both scored against a multi-dimensional exposure model that weighs social footprint, domain visibility, certificate history, and security posture into a single normalized risk score per investigation.
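The rule-to-edge mapping and the deduplicating node cache can be sketched as follows. The confidence values come from the text; the artifact and rule structures are illustrative, not the engine's actual schema:

```python
# Three of the ten rules, using the confidence values quoted above.
RULES = [
    {"artifact_type": "certificate", "edge": "owns_domain", "confidence": 0.90},
    {"artifact_type": "social_profile_confirmed", "edge": "confirmed_identity", "confidence": 0.90},
    {"artifact_type": "breach_record", "edge": "exposed_in_breach", "confidence": 0.99},
]

def correlate(artifacts: list[dict]) -> list[dict]:
    """Apply every rule across the full artifact set in one deterministic pass."""
    seen: set[tuple] = set()  # in-memory cache: dedupes edges without DB round-trips
    edges = []
    for art in artifacts:
        for rule in RULES:
            if art["type"] != rule["artifact_type"]:
                continue
            key = (art["subject"], rule["edge"], art["value"])
            if key in seen:
                continue
            seen.add(key)
            edges.append({"from": art["subject"], "type": rule["edge"],
                          "to": art["value"], "confidence": rule["confidence"]})
    return edges
```

The `seen` set is the whole deduplication mechanism: because the pass runs once over a complete in-memory artifact set, no database lookup is needed to know whether an edge already exists.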

Key Engineering Decisions

Two-phase pipeline over streaming correlation. An early alternative was to correlate incrementally: as each source job completed, run correlation rules against whatever artifacts had arrived so far. This was discarded because rules that depend on the combination of two source outputs produce incorrect results when only one has arrived, and re-running the full correlation pass each time a source completes is prohibitively expensive at 16 concurrent jobs. The two-phase model trades a brief wait after collection for a single deterministic correlation pass that sees the complete artifact set.

Graceful degradation as a first-class design principle. Sources are architecturally isolated; a failed source writes a typed SourceFailure record (classified as timeout, rate limited, service unavailable, missing API key, or unknown error) rather than propagating an exception to the pipeline. An investigation that completes with 13 of 16 sources is far more useful than one that fails entirely because a single upstream API was slow. Users see exactly which sources failed and why, displayed alongside the results from the sources that succeeded, and can retry failed sources independently without re-dispatching the full investigation.
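The five reason categories come straight from the text; how an exception maps onto them is not specified, so the classifier below is a hypothetical sketch of one plausible mapping:

```python
def classify_failure(exc: Exception, api_key_present: bool = True) -> str:
    """Map a source exception to one of the five typed failure reasons."""
    if not api_key_present:
        return "missing_api_key"
    if isinstance(exc, TimeoutError):
        return "timeout"
    message = str(exc).lower()
    if "429" in message or "rate" in message:
        return "rate_limited"
    if "503" in message or "unavailable" in message:
        return "service_unavailable"
    return "unknown_error"
```

Storing the classified reason rather than the raw exception is what makes per-source retry and the "which sources failed and why" display possible.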

Pure Blade and Alpine.js on the investigation show page. The rest of the application is Livewire-driven, but the investigation show page uses pure Blade templating with Alpine.js for reactive elements. Cytoscape.js and vis-timeline both hold significant client-side state and manipulate the DOM directly; Livewire's server-driven DOM diffing conflicts with libraries that expect stable node references. Keeping the show page outside Livewire's update cycle allowed both visualization libraries to be integrated without lifecycle workarounds, while the rest of the application continued to use Livewire normally.

Technical Highlights

  • Parallel job dispatch system executing all source jobs concurrently via Laravel's batch API, with per-source timeout enforcement, typed failure classification across five reason categories, real-time progress reporting to the browser via a polling endpoint, and stale detection that surfaces a warning alert if an investigation remains in a running state past a configurable threshold
  • CorrelationRuleEngine applying ten confidence-scored rules to produce a typed relationship graph: rules cover certificate ownership (0.90), GitHub identity (0.95), confirmed and unconfirmed social profiles (0.85–0.90), Wayback domain operation history (0.80), Gravatar profile linkage (0.90), related domains via shared registrant (0.85), and breach exposure (0.99); an in-memory node cache prevents duplicate edges within a single pass without additional database round-trips
  • Domain security intelligence pipeline covering SSL certificate metadata and expiry analysis, eight HTTP security header checks with per-header remediation guidance, open port exposure scanning with automatic risk flagging for high-severity services, and passive technology stack fingerprinting from certificate transparency data, all scored and surfaced inline with OSINT findings
  • DNS intelligence source resolving IPv4 and IPv6 records, MX servers with priority ordering, and NS servers; results merge with domain node metadata during correlation so all domain-level intelligence (registrant data, DNS records, and certificate history) is unified at a single graph node rather than scattered across separate cards
  • Interactive relationship graph built on Cytoscape.js with nine typed node shapes and colors, confidence-weighted edge styles where weight, opacity, and dash pattern all reflect the edge confidence score, three layout algorithms (force-directed, circular, and grid), a low-confidence toggle, a clickable node detail panel with type-specific metadata templates, and double-tap pivot: double-clicking any discovered node launches a new investigation seeded with that entity's value
  • Chronological timeline built on vis-timeline, organizing events into five typed groups (profile, domain, code activity, certificate, and breach) with full bidirectional linkage to graph nodes: clicking a timeline event animates the graph to the matching node, and clicking a graph node highlights its timeline events; the timeline state is captured and embedded in PDF exports alongside the graph image
  • Multi-dimensional exposure scoring model aggregating social footprint breadth, domain visibility and history, certificate and archive coverage, and security posture findings into a single normalized risk score per investigation, surfaced in the investigation summary alongside granular per-source results
  • White-label PDF report generation supporting fully custom-branded exports: a user-uploaded logo, company name, tagline, and contact details are injected into every page; the relationship graph is embedded as a base64 PNG rendered via Cytoscape headless export; and breach details, timeline events, and artifact summaries are all included, producing a client-ready deliverable with no platform branding
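The exposure scoring model mentioned above can be sketched as a weighted aggregation. The four dimension names follow the text; the weights and the weighted-average form are assumptions for illustration, not the platform's actual formula:

```python
# Assumed weights; the real model's weighting is not specified in the text.
WEIGHTS = {"social": 0.30, "domain": 0.25, "certificate": 0.20, "security": 0.25}

def exposure_score(dimensions: dict) -> float:
    """Aggregate per-dimension scores (each in [0, 1]) into one normalized score."""
    total = sum(WEIGHTS[name] * dimensions.get(name, 0.0) for name in WEIGHTS)
    return round(total / sum(WEIGHTS.values()), 3)
```

Normalizing by the weight sum keeps the output in [0, 1] even if the weights are later retuned, so downstream display logic never needs to change.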