01Automated multi-modal dataset preparation for image-text tasks
02FastVisionModel optimization for 2x faster training speeds
03Support for leading VLMs including Pixtral, Ministral, and Llama 3.2
04Integrated UnslothVisionDataCollator for efficient batching
05Vision-specific LoRA configuration for encoder and language layers
060 GitHub stars