01. Access to 100+ benchmarks from 18+ harnesses including MMLU, GPQA, and IFEval
02. 3,983 GitHub stars
03. Specialized evaluation modules for AI Safety and Vision-Language Models (VLM)
04. Automated result exporting to MLflow, Weights & Biases, and local JSON formats
05. Multi-backend support for local Docker, Slurm HPC clusters, and cloud platforms
06. Seamless integration with OpenAI-compatible endpoints like vLLM and TRT-LLM