01 Comprehensive bias mitigation for position, length, and verbosity
02 Automated rubric generation with observable characteristics for each score level
03 Standardized frameworks for direct scoring and pairwise comparison
04 Chain-of-thought evaluation protocols to improve scoring reliability
05 Consistency-checked model comparison with tie-breaking logic
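The position-bias mitigation and consistency-checked pairwise comparison listed above can be illustrated with a minimal sketch. It assumes a hypothetical `judge` callable that receives a prompt and two responses and reports which *position* won ("first", "second", or "tie"). The sketch runs the comparison twice with the responses swapped and accepts a winner only when both orderings agree. Treating any disagreement as a tie is an assumed, conservative tie-breaking policy, not necessarily the one this project uses. The two toy judges are also hypothetical and exist only for demonstration.

```python
from dataclasses import dataclass
from typing import Callable, Literal

Verdict = Literal["A", "B", "tie"]

# Hypothetical judge signature: (prompt, first_response, second_response)
# -> "first", "second", or "tie", i.e. which *position* the judge preferred.
Judge = Callable[[str, str, str], str]


@dataclass
class PairwiseResult:
    winner: Verdict   # which response (A or B) won, or "tie"
    consistent: bool  # True if both orderings produced the same verdict


def compare_with_swap(judge: Judge, prompt: str, resp_a: str, resp_b: str) -> PairwiseResult:
    """Judge the pair in both orders to detect position bias (sketch)."""
    # Pass 1: A is shown first.
    v1 = judge(prompt, resp_a, resp_b)
    # Pass 2: the order is swapped, so B is shown first.
    v2 = judge(prompt, resp_b, resp_a)
    # Map positional verdicts back to response labels.
    pick1 = {"first": "A", "second": "B", "tie": "tie"}[v1]
    pick2 = {"first": "B", "second": "A", "tie": "tie"}[v2]
    if pick1 == pick2:
        return PairwiseResult(winner=pick1, consistent=True)
    # Assumed policy: disagreement across orderings suggests position bias,
    # so fall back to a tie instead of trusting either single pass.
    return PairwiseResult(winner="tie", consistent=False)


# Toy judge with pure position bias: always prefers whatever is shown first.
def always_first(prompt: str, first: str, second: str) -> str:
    return "first"


# Toy judge with pure length bias: always prefers the longer response.
def prefers_longer(prompt: str, first: str, second: str) -> str:
    if len(first) > len(second):
        return "first"
    if len(second) > len(first):
        return "second"
    return "tie"


if __name__ == "__main__":
    print(compare_with_swap(always_first, "q", "short", "a much longer answer"))
    print(compare_with_swap(prefers_longer, "q", "a much longer answer", "short"))
```

The swap check exposes the position-biased judge: its two passes disagree, so the result is an inconsistent tie. The length-biased judge, however, passes as consistent. Order swapping alone does not address length or verbosity bias, which need separate controls.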