01 Data leakage detection for evaluation and training sets
02 Statistical test evaluation for non-deterministic outputs
03 Production-grade reliability metrics and benchmarking
04 Behavioral contract testing to verify agent invariants
05 Adversarial testing frameworks to identify edge-case failures