- Model conversion and 4-bit/8-bit quantization workflows
- OpenAI-compatible local model serving and API integration
- Automated model downloading and management from Hugging Face Hub
- Native LLM inference and streaming optimized for Apple Silicon
- Efficient fine-tuning using LoRA and QLoRA adapters
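To illustrate the 4-bit quantization workflow mentioned above, here is a minimal NumPy sketch of group-wise affine quantization, the scheme commonly used for 4-bit LLM weights (each group of weights shares one scale and minimum). This is an illustrative standalone example, not this project's actual implementation; the function names and `group_size` default are assumptions.

```python
import numpy as np

def quantize_4bit(w, group_size=32):
    """Group-wise affine 4-bit quantization: each group of `group_size`
    weights shares a single scale and offset (illustrative sketch)."""
    w = w.reshape(-1, group_size)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 15.0           # 4 bits -> 16 levels (0..15)
    scale = np.where(scale == 0, 1.0, scale) # guard constant groups
    q = np.clip(np.round((w - w_min) / scale), 0, 15).astype(np.uint8)
    return q, scale, w_min

def dequantize_4bit(q, scale, w_min):
    """Reconstruct approximate float weights from 4-bit codes."""
    return q.astype(np.float32) * scale + w_min

# Round-trip a random weight matrix and check reconstruction error.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 64)).astype(np.float32)
q, scale, w_min = quantize_4bit(w.flatten())
w_hat = dequantize_4bit(q, scale, w_min).reshape(w.shape)
max_err = float(np.abs(w - w_hat).max())
```

The maximum reconstruction error per weight is bounded by half the group's scale, which is why smaller group sizes trade extra scale/offset storage for higher fidelity.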