01 FP8 precision support specifically for Hopper (H100) architecture
02 Integrated torch.compile setup for 3-6x speed improvements after warmup
03 Automatic GPU tier detection for H100, A100, and RTX 4090/3090
04 Layered configuration logic that preserves hardware optimizations across test/prod modes
05 Optimized environment scaling (n_envs) to ensure maximum GPU saturation
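The tier detection, layered configuration, and n_envs scaling listed above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the tier names, the `detect_tier` and `build_config` helpers, the config keys, and every numeric value (including the n_envs figures) are hypothetical placeholders. The sketch takes the device name as a plain string so it runs without a GPU.

```python
# Hypothetical per-tier hardware defaults. The n_envs values are
# placeholders chosen for illustration, not measured settings.
TIER_DEFAULTS = {
    "h100": {"n_envs": 4096, "use_fp8": True, "use_compile": True},
    "a100": {"n_envs": 2048, "use_fp8": False, "use_compile": True},
    "rtx": {"n_envs": 1024, "use_fp8": False, "use_compile": True},
    "unknown": {"n_envs": 256, "use_fp8": False, "use_compile": False},
}

# Hypothetical mode layer. It only sets run-length keys, so it never
# overwrites the hardware keys chosen by the tier layer.
MODE_OVERRIDES = {
    "test": {"total_steps": 1_000},
    "prod": {"total_steps": 10_000_000},
}


def detect_tier(device_name: str) -> str:
    """Map a GPU name string to a coarse hardware tier."""
    name = device_name.upper()
    if "H100" in name:
        return "h100"
    if "A100" in name:
        return "a100"
    if "4090" in name or "3090" in name:
        return "rtx"
    return "unknown"


def build_config(device_name: str, mode: str) -> dict:
    """Merge layers in order: base defaults, hardware tier, run mode."""
    tier = detect_tier(device_name)
    config = {"total_steps": 100_000, "tier": tier}  # base layer
    config.update(TIER_DEFAULTS[tier])               # hardware layer
    config.update(MODE_OVERRIDES[mode])              # mode layer
    return config


if __name__ == "__main__":
    print(build_config("NVIDIA GeForce RTX 4090", "test"))
```

In a real setup the name would likely come from `torch.cuda.get_device_name(0)`, and the `use_compile` flag would gate a call to `torch.compile(model)`. Keeping the mode layer restricted to run-length keys is one way to make sure test mode shortens the run without discarding the FP8, compile, and n_envs choices made for the detected GPU.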