Vision-language model with dynamic resolution mechanism
Hybrid attention architecture (Lightning Attention, Softmax Attention, MoE)
Training context length of 1 million tokens, inference up to 4 million tokens
Large language model with 456 billion parameters
ViT-MLP-LLM framework for multimodal capabilities
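The hybrid attention point pairs linear-time attention layers with periodic full softmax attention, which is what makes million-token contexts tractable. The sketch below is not the model's actual implementation; it only contrasts standard softmax attention with a generic kernelized linear-attention variant (feature map `elu(x) + 1` is an illustrative choice) whose associativity reduces cost from quadratic to linear in sequence length.

```python
import numpy as np

def softmax_attention(q, k, v):
    # Standard attention: the n x n score matrix makes this O(n^2) in length.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def linear_attention(q, k, v):
    # Kernelized attention with phi(x) = elu(x) + 1 (illustrative choice).
    # Associativity lets us form phi(K)^T V once, a (d, d_v) summary that
    # is independent of sequence length, giving O(n) total cost.
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
    qp, kp = phi(q), phi(k)
    kv = kp.T @ v                    # (d, d_v) summary
    z = qp @ kp.sum(axis=0)          # per-query normalizer
    return (qp @ kv) / z[:, None]

rng = np.random.default_rng(0)
n, d = 16, 8
q, k, v = rng.normal(size=(3, n, d))
print(softmax_attention(q, k, v).shape, linear_attention(q, k, v).shape)
```

Both functions map a length-`n` sequence to `n` output vectors; the linear variant is exactly the quadratic kernelized form with the matrix products reassociated.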
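The ViT-MLP-LLM point names a common multimodal pattern: a vision transformer encodes image patches into feature vectors, a small MLP projects them into the language model's embedding space, and the projected visual tokens are fed to the LLM alongside text embeddings. A shape-level sketch, with all dimensions and the two-layer ReLU projector as illustrative assumptions, not the model's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
d_vit, d_llm, n_img, n_txt = 64, 128, 9, 5

# Stand-in for ViT output: one feature vector per image patch.
vision_tokens = rng.normal(size=(n_img, d_vit))

# Two-layer MLP projector mapping ViT features into the LLM embedding space.
w1 = rng.normal(size=(d_vit, d_llm)) * 0.02
w2 = rng.normal(size=(d_llm, d_llm)) * 0.02

def project(x):
    h = np.maximum(x @ w1, 0.0)  # ReLU here; real projectors often use GELU
    return h @ w2

text_embeddings = rng.normal(size=(n_txt, d_llm))
# The LLM consumes projected image tokens and text tokens as one sequence.
sequence = np.concatenate([project(vision_tokens), text_embeddings], axis=0)
print(sequence.shape)  # (n_img + n_txt, d_llm)
```

A dynamic resolution mechanism would change only `n_img`, the number of patch tokens, per image; the projector and downstream LLM are shape-agnostic in that dimension.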