Rollout Routing Replay (R3) for bit-wise exact expert alignment between training and inference
Unified FP8 and INT4 quantization-aware training for massive MoE models
Speculative RL via the EAGLE algorithm for up to 40% faster rollout throughput
Comprehensive support for the DeepSeek V3, Qwen3-MoE, and Llama model families
Zero-copy weight synchronization using CUDA IPC mapping
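
The idea behind Rollout Routing Replay can be illustrated with a toy sketch: the expert indices chosen by the router during rollout are recorded, and the training pass reuses them instead of recomputing top-k, so small numerical differences between the two engines cannot change which experts a token visits. This is not the project's implementation. The function names (`top_k_experts`, `gate_weights`, `route`) are hypothetical, and recomputing the gate weights from the training-side logits is an assumption made for illustration.

```python
import math

def top_k_experts(logits, k):
    """Return the indices of the k largest router logits, highest first."""
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]

def gate_weights(logits, experts):
    """Softmax restricted to the selected experts' logits."""
    m = max(logits[i] for i in experts)
    exps = [math.exp(logits[i] - m) for i in experts]
    total = sum(exps)
    return [e / total for e in exps]

def route(logits, k, replay=None):
    """Pick experts for one token (hypothetical helper).

    If `replay` is given, reuse those expert indices (recorded during rollout)
    instead of recomputing top-k. Gate weights are still computed from
    `logits`, which is an assumption of this sketch.
    """
    experts = list(replay) if replay is not None else top_k_experts(logits, k)
    return experts, gate_weights(logits, experts)

# Rollout side: router logits for one token over 4 experts.
rollout_logits = [2.00, 1.01, 1.00, -0.50]
recorded, _ = route(rollout_logits, k=2)

# Training side: a tiny numerical drift swaps the ranking of experts 1 and 2.
train_logits = [2.00, 1.00, 1.01, -0.50]
free_experts, _ = route(train_logits, k=2)
replayed_experts, replayed_weights = route(train_logits, k=2, replay=recorded)

print(recorded, free_experts, replayed_experts)
```

Without replay, the training pass selects a different second expert than the rollout did. With replay, the expert selection matches the rollout exactly.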