01 Fast structured generation with JSON schema, regex, and grammar constraints
02 High-performance serving with up to 5x faster inference for agentic workloads
03 3,983 GitHub stars
04 RadixAttention for automatic prefix caching and KV cache reuse
05 Native support for multi-turn conversations and function calling
06 Compatible with 100+ text and vision models via an OpenAI-compatible API
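
The prefix caching mentioned in item 04 can be illustrated with a toy sketch. This is not the project's implementation: it assumes a plain one-node-per-token trie (a real radix tree compresses runs of tokens into single edges), and the names `TrieNode`, `PrefixCache`, `match_prefix`, and `insert`, along with the token ids, are hypothetical. It only shows why a second request that shares a system prompt with an earlier one needs to compute fewer new tokens.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    # Maps the next token id to the child node that would hold its cached state.
    children: dict = field(default_factory=dict)


class PrefixCache:
    """Toy prefix cache: one trie node per token (hypothetical, for illustration)."""

    def __init__(self):
        self.root = TrieNode()

    def match_prefix(self, tokens):
        """Return how many leading tokens are already cached."""
        node, matched = self.root, 0
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                break
            node, matched = child, matched + 1
        return matched

    def insert(self, tokens):
        """Record a token sequence; return the number of newly cached tokens."""
        node, added = self.root, 0
        for tok in tokens:
            if tok not in node.children:
                node.children[tok] = TrieNode()
                added += 1
            node = node.children[tok]
        return added


cache = PrefixCache()
system_prompt = [101, 7, 7, 42]   # hypothetical token ids shared by both requests
first = system_prompt + [5, 6]
second = system_prompt + [9]

cache.insert(first)
reused = cache.match_prefix(second)   # tokens whose cached entries could be reused
to_compute = len(second) - reused     # tokens that still need a forward pass
print(reused, to_compute)
```

In this sketch the second request reuses the four shared prompt tokens and computes only its one new token, which is the effect that makes multi-turn and agentic workloads with long shared prefixes cheaper to serve.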