This skill provides a systematic framework for planning A/B tests, ensuring every experiment is backed by a valid hypothesis, defined metrics, and sufficient statistical power. It acts as a procedural gatekeeper against common pitfalls: 'peeking' (checking results early and stopping as soon as they look significant), undersized samples, and ill-defined success criteria. By guiding users through mandatory checks for assumptions, metric selection (Primary, Secondary, and Guardrail), and execution readiness, it turns A/B testing from trial and error into a disciplined, repeatable methodology suitable for production environments.
Key Features
01 Statistical power and sample size estimation tools
02 Multi-tier metric definition including primary and guardrail metrics
03 Validity and assumption checks for traffic stability and randomization
04 Mandatory hypothesis locking to prevent mid-test goal shifting
05 Structured analysis discipline for interpreting statistical significance
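
To make the power and sample size step concrete, here is a minimal sketch of the kind of calculation such a check might perform. It is not the skill's own implementation: the function name `sample_size_per_arm` and its parameters are hypothetical, and it assumes a two-sided z-test on two conversion rates with equal traffic per arm, using the standard normal-approximation formula.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_arm(baseline_rate, min_detectable_lift, alpha=0.05, power=0.8):
    """Approximate users needed in EACH arm of a two-proportion A/B test.

    Hypothetical helper for illustration only.
    baseline_rate:        control conversion rate, e.g. 0.10
    min_detectable_lift:  absolute lift to detect, e.g. 0.02 (10% -> 12%)
    alpha:                two-sided significance level
    power:                desired probability of detecting the lift
    """
    p1 = baseline_rate
    p2 = baseline_rate + min_detectable_lift
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("rates must lie in (0, 1) and differ")

    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value, two-sided
    z_beta = std_normal.inv_cdf(power)           # quantile for target power

    p_bar = (p1 + p2) / 2  # pooled rate under the null hypothesis
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


if __name__ == "__main__":
    n = sample_size_per_arm(baseline_rate=0.10, min_detectable_lift=0.02)
    print(f"Users needed per arm: {n}")
```

Fixing this number before launch, and committing to run until it is reached, is what makes the result interpretable; stopping early on a promising interim result is the 'peeking' problem described above. Halving the detectable lift roughly quadruples the required sample, which is why the minimum effect of interest should be agreed alongside the hypothesis.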