01 Tracks success reliability using standardized pass@k and pass^k metrics
02 Implements Eval-Driven Development (EDD) workflow within Claude sessions
03 Supports deterministic code-based, model-based, and human-in-the-loop graders
04 0 GitHub stars
05 Automates evaluation report generation and status tracking
06 Standardizes eval storage in project-specific .claude/evals directories
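
The list above names pass@k and pass^k without defining them. As a minimal sketch, assuming the commonly used definitions (pass@k: at least one of k sampled attempts passes; pass^k: all k sampled attempts pass), both can be estimated from n recorded attempts of which c passed. The function names `pass_at_k` and `pass_hat_k` are hypothetical and are not taken from the tool itself, which may compute these metrics differently.

```python
from math import comb


def _check(n: int, c: int, k: int) -> None:
    # Shared argument validation for both estimators.
    if not (0 <= c <= n):
        raise ValueError("c must satisfy 0 <= c <= n")
    if not (1 <= k <= n):
        raise ValueError("k must satisfy 1 <= k <= n")


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate P(at least one of k attempts passes) from n attempts with c passes.

    Assumed definition: 1 - C(n - c, k) / C(n, k), i.e. one minus the
    chance that a random size-k subset of the attempts contains only failures.
    """
    _check(n, c, k)
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Estimate P(all k attempts pass) from n attempts with c passes.

    Assumed definition: C(c, k) / C(n, k), i.e. the chance that a random
    size-k subset of the attempts contains only passes.
    """
    _check(n, c, k)
    return comb(c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical eval: 10 attempts, 5 of which passed.
    print(pass_at_k(10, 5, 2))
    print(pass_hat_k(10, 5, 2))
```

For k = 1 the two estimates coincide with the raw pass rate c / n. As k grows, pass@k rises toward 1 while pass^k falls toward 0, which is why the pair is useful for separating "can succeed" from "succeeds reliably".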