How does this skill prevent common A/B testing errors?

Before any code is written, it enforces mandatory 'hard gates': the hypothesis must be locked and the sample size calculated. This blocks 'peeking' at interim results and shifting the goalposts mid-test.
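One way to picture hypothesis locking is an immutable record with a fingerprint taken before launch. The sketch below is illustrative only: the `Hypothesis` class, its fields, and the `fingerprint` helper are hypothetical names, not part of the skill itself.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Hypothesis:
    """Hypothetical record of everything fixed before the test starts."""
    change: str
    primary_metric: str
    expected_direction: str           # "increase" or "decrease"
    minimum_detectable_effect: float  # absolute lift, e.g. 0.01 = 1 point
    sample_size_per_variant: int


def fingerprint(hypothesis: Hypothesis) -> str:
    """Return a stable SHA-256 digest of the locked hypothesis."""
    payload = json.dumps(asdict(hypothesis), sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


locked = Hypothesis(
    change="Shorter signup form",
    primary_metric="signup_rate",
    expected_direction="increase",
    minimum_detectable_effect=0.01,
    sample_size_per_variant=15000,
)
lock_id = fingerprint(locked)  # record this alongside the test plan
```

Recomputing the fingerprint at analysis time and comparing it with the recorded `lock_id` would reveal whether the metric, effect size, or sample size was changed after launch.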

Can I use this for multivariate testing?

Yes, the skill provides criteria for selecting between A/B, A/B/n, Multivariate (MVT), and Split URL tests based on your specific traffic and needs.

Why would the skill refuse to proceed with a test setup?

The skill refuses to proceed if traffic volume is too low to reach adequate statistical power, if the baseline rate is unknown, or if the primary metric is undefined.
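The three refusal conditions can be sketched as a simple pre-flight check. This is a minimal illustration under stated assumptions: `ExperimentPlan`, `refusal_reasons`, and the eight-week duration ceiling are hypothetical and do not come from the skill.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExperimentPlan:
    """Hypothetical inputs a planner would need before setup can proceed."""
    primary_metric: Optional[str]
    baseline_rate: Optional[float]
    weekly_visitors_per_variant: int
    required_sample_per_variant: int
    max_weeks: int = 8  # assumed ceiling on how long a test may run


def refusal_reasons(plan: ExperimentPlan) -> List[str]:
    """Return every reason the plan should be refused; empty means proceed."""
    reasons = []
    if not plan.primary_metric:
        reasons.append("primary metric is undefined")
    if plan.baseline_rate is None:
        reasons.append("baseline rate is unknown")
    reachable = plan.weekly_visitors_per_variant * plan.max_weeks
    if reachable < plan.required_sample_per_variant:
        reasons.append("traffic is too low to reach the required sample size")
    return reasons


plan = ExperimentPlan(
    primary_metric="signup_rate",
    baseline_rate=None,
    weekly_visitors_per_variant=500,
    required_sample_per_variant=15000,
)
for reason in refusal_reasons(plan):
    print("Refusing to proceed:", reason)
```

Collecting all reasons, instead of stopping at the first, lets the user fix every gap in one pass.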

What are guardrail metrics in this framework?

Guardrail metrics are metrics tracked alongside the primary metric that must not degrade during a test; a degradation acts as a safety trigger to stop the test when a variant causes harmful side effects.
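A guardrail check could look roughly like the sketch below. It assumes higher values are better and compares point estimates only; a production check would also account for statistical noise. The `Guardrail` class, its fields, and the tolerance values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Guardrail:
    """Hypothetical guardrail: a metric where higher is assumed to be better."""
    name: str
    control_value: float
    variant_value: float
    max_relative_drop: float  # e.g. 0.02 tolerates a 2% relative decline


def is_breached(guardrail: Guardrail) -> bool:
    """True when the variant falls below control by more than the tolerance."""
    if guardrail.control_value <= 0:
        raise ValueError("control_value must be positive")
    change = (
        guardrail.variant_value - guardrail.control_value
    ) / guardrail.control_value
    return change < -guardrail.max_relative_drop


def breached_guardrails(guardrails: List[Guardrail]) -> List[str]:
    """Names of all breached guardrails; any entry is a signal to stop."""
    return [g.name for g in guardrails if is_breached(g)]


checks = [
    Guardrail("checkout_completion", 0.50, 0.48, max_relative_drop=0.02),
    Guardrail("pages_per_session", 100.0, 99.0, max_relative_drop=0.02),
]
print(breached_guardrails(checks))
```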

A/B Test Setup

Name: A/B Test Setup
Author: claudiodearaujo


Analytics & Monitoring

Enforces a structured, statistically rigorous workflow for designing and validating A/B tests before implementation.

This skill provides a systematic framework for planning A/B tests, ensuring every experiment is backed by a valid hypothesis, defined metrics, and sufficient statistical power. It acts as a procedural gatekeeper, preventing common pitfalls like 'peeking,' insufficient sample sizes, or ill-defined success criteria. By guiding users through mandatory checks for assumptions, metric selection (Primary, Secondary, and Guardrail), and execution readiness, it transforms A/B testing from a trial-and-error process into a disciplined scientific methodology suitable for production environments.
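To make "sufficient statistical power" concrete, the sketch below estimates the per-variant sample size for comparing two conversion rates using the standard normal-approximation formula for a two-sided test. It is an illustration, not the skill's own calculator; the function name, parameters, and defaults (5% significance, 80% power) are assumptions.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(
    baseline_rate: float,
    minimum_detectable_effect: float,  # absolute lift, e.g. 0.02 = 2 points
    alpha: float = 0.05,               # assumed two-sided significance level
    power: float = 0.80,               # assumed target power
) -> int:
    """Approximate visitors needed in each variant of a two-proportion test."""
    p1 = baseline_rate
    p2 = baseline_rate + minimum_detectable_effect
    if not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("both rates must lie strictly between 0 and 1")
    if minimum_detectable_effect == 0:
        raise ValueError("minimum_detectable_effect must be non-zero")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for the power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)


# Example: 10% baseline, detect an absolute lift of 2 points.
print(sample_size_per_variant(0.10, 0.02))
```

Halving the minimum detectable effect roughly quadruples the required sample, which is why low-traffic pages often cannot support tests for small lifts.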

Key Features

01 Statistical power and sample size estimation tools

02 Multi-tier metric definition including primary and guardrail metrics

03 Validity and assumption checks for traffic stability and randomization

04 Mandatory hypothesis locking to prevent mid-test goal shifting

05 Structured analysis discipline for interpreting statistical significance

Use Cases

01 Validating feature rollout impact on core product performance metrics

02 Establishing safety guardrails to prevent unintended negative effects during testing

03 Designing conversion rate optimization (CRO) experiments for landing pages


Install with 🐟 Skill.Fish

npx skillfish add claudiodearaujo/izacenter ab-test-setup

For use in Claude.ai and ChatGPT
