Metric selection guidance including F1, Spearman's ρ, and Cohen's κ
Custom rubric generation for domain-specific quality standards
Automated LLM-as-a-Judge scoring and comparison frameworks
Systematic bias mitigation protocols for position and length bias
Chain-of-Thought integration for transparent evaluation reasoning
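The three metrics named above suit different kinds of judge output. F1 fits binary pass/fail verdicts checked against reference labels. Spearman's ρ fits ordinal or numeric scores where rank agreement matters. Cohen's κ fits categorical labels where raw agreement should be corrected for chance. The following is a minimal, dependency-free sketch, not the tooling described here. It assumes human labels and LLM-judge outputs arrive as parallel Python lists, and the helper names (`f1_score`, `spearman_rho`, `cohen_kappa`) and sample data are hypothetical. In practice, `sklearn.metrics.f1_score`, `sklearn.metrics.cohen_kappa_score`, and `scipy.stats.spearmanr` provide tested equivalents.

```python
from collections import Counter


def f1_score(y_true, y_pred, positive=1):
    """Binary F1 of predictions against reference labels (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def _average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean 1-based rank for positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks (tie-aware)."""
    rx, ry = _average_ranks(x), _average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (vx * vy) ** 0.5


def cohen_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_o = sum(1 for x, y in zip(a, b) if x == y) / n
    ca, cb = Counter(a), Counter(b)
    # Chance agreement from each rater's marginal label frequencies.
    p_e = sum(ca[label] * cb[label] for label in set(ca) | set(cb)) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when both raters use one label")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    human = [1, 0, 1, 1, 0, 1]  # hypothetical human pass/fail labels
    judge = [1, 0, 0, 1, 0, 1]  # hypothetical LLM-judge verdicts
    print(round(f1_score(human, judge), 3))  # 0.857
    print(round(cohen_kappa(human, judge), 3))  # 0.667
    print(round(spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]), 3))  # 0.8
```

In the usage example, the judge misses one positive (recall 0.75, precision 1.0). Raw agreement is 5/6, but κ drops to about 0.667 because half of that agreement would be expected by chance from the label frequencies alone.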