- Token-based parsing for Thinking/Reasoning models
- Batch inference support for high-throughput processing
- vLLM-accelerated generation for 2x faster inference
- Advanced SamplingParams control (temperature, top_p, top_k)
- GPU memory monitoring and automated cleanup utilities
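To illustrate what the sampling parameters above actually do, here is a minimal, dependency-free sketch of temperature scaling plus top-k and top-p (nucleus) filtering over a logit vector. This is not vLLM's `SamplingParams` implementation, just an illustrative re-creation of the same knobs; the function name and signature are hypothetical.

```python
import math

def filter_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Hypothetical sketch: apply temperature, then top-k, then top-p
    filtering, and return the renormalized token distribution."""
    # Temperature scaling: lower values sharpen the distribution.
    scaled = [l / temperature for l in logits]
    # Numerically stable softmax.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort token indices by probability, descending.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # top-k: keep only the k most likely tokens (0 disables the filter).
    if top_k > 0:
        order = order[:top_k]
    # top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    if top_p < 1.0:
        kept, cum = [], 0.0
        for i in order:
            kept.append(i)
            cum += probs[i]
            if cum >= top_p:
                break
        order = kept
    # Renormalize over the surviving tokens.
    z = sum(probs[i] for i in order)
    return {i: probs[i] / z for i in order}
```

With `top_k=2`, only the two most likely tokens survive and their probabilities are renormalized to sum to 1; combining a low `temperature` with `top_p < 1` narrows the candidate set further.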