High-performance local inference with continuous batching and speculative decoding
Advanced KV cache optimizations including paged, prefix, and disk caching with quantization
OpenAI and Anthropic compatible API for LLMs, VLMs, embeddings, and audio models
Local image generation and editing via Flux models with dedicated API endpoints
Built-in support for LLM tool calling and reasoning/thinking modes
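Because the server exposes an OpenAI-compatible API, existing OpenAI client code should work against it by pointing the base URL at the local server. A minimal sketch of the request shape, using only the standard library; the host, port, and model name here are assumptions for illustration, not project defaults:

```python
import json

# Base URL for a locally running server (host/port are assumed values).
# The /v1/chat/completions path follows the OpenAI API convention.
BASE_URL = "http://localhost:8080/v1"

def chat_request(model: str, user_message: str, stream: bool = False) -> dict:
    """Build an OpenAI-compatible chat-completions request body
    for POST {BASE_URL}/chat/completions."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": stream,
    }

body = chat_request("local-model", "Hello!")
print(json.dumps(body))
```

Any OpenAI SDK can send this same payload by setting its `base_url` to the local endpoint instead of the hosted API.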