01 Integrated workflows for tracking training progress with automated checkpoint evaluation.
02 Support for multiple backends including HuggingFace Transformers, vLLM, and external APIs.
03 Access to 60+ standardized academic benchmarks including MMLU, GSM8K, and TruthfulQA.
04 Extensive support for quantization, few-shot prompting, and custom task configuration.
05 3,983 GitHub stars
06 High-performance inference options using vLLM for up to 10x faster benchmarking.
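
The checkpoint-evaluation and backend-selection items above can be illustrated with a short Python sketch. It rests on an assumption: the list does not name the underlying tool, so the code assumes it is driven through the `lm_eval` command line of EleutherAI's lm-evaluation-harness. The helper names (`checkpoint_step`, `build_eval_command`, `plan_evaluations`), the `checkpoint-<step>` directory naming, and the `results` output folder are hypothetical and chosen only for illustration. The sketch builds the commands but does not run them.

```python
import re
from pathlib import Path


def checkpoint_step(path: Path) -> int:
    """Extract the training step from a name like 'checkpoint-500'.

    Returns -1 when the name has no trailing number, so such
    directories sort before all numbered checkpoints.
    """
    match = re.search(r"(\d+)$", path.name)
    return int(match.group(1)) if match else -1


def build_eval_command(
    checkpoint: Path,
    tasks: list[str],
    backend: str = "hf",
    num_fewshot: int = 5,
    output_root: Path = Path("results"),
) -> list[str]:
    """Build one benchmark command for a single checkpoint.

    Assumption: the tool is invoked as `lm_eval`; `backend` would be
    "hf" for HuggingFace Transformers or "vllm" for vLLM.
    """
    return [
        "lm_eval",
        "--model", backend,
        "--model_args", f"pretrained={checkpoint}",
        "--tasks", ",".join(tasks),
        "--num_fewshot", str(num_fewshot),
        "--output_path", str(output_root / checkpoint.name),
    ]


def plan_evaluations(
    checkpoints: list[Path],
    tasks: list[str],
    backend: str = "hf",
) -> list[list[str]]:
    """Return one command per checkpoint, ordered by training step."""
    ordered = sorted(checkpoints, key=checkpoint_step)
    return [build_eval_command(c, tasks, backend) for c in ordered]
```

Each returned command is a plain argument list, so it could be passed to `subprocess.run` once per checkpoint as training progresses. Keeping the planning step separate from execution makes it easy to inspect the commands before any compute is spent.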