Native PDF processing for table extraction and structured data output from documents up to 1,000 pages.
Visual understanding including OCR, object detection, and pixel-level segmentation via Gemini 2.5.
Advanced image generation and iterative editing using controllable styles and aspect ratios.
Video analysis for scene detection and temporal Q&A with support for local files and YouTube URLs.
Comprehensive audio transcription and analysis with timestamp support for files up to 9.5 hours.
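
The size limits quoted above (1,000 pages for PDFs, 9.5 hours for audio) can be checked locally before anything is uploaded. The following is a minimal sketch under stated assumptions, not part of any published API: `MediaInput`, `within_limits`, and the `"pdf"`/`"audio"` labels are hypothetical names introduced here for illustration, and only the two numeric limits come from the list itself.

```python
from dataclasses import dataclass

# Limits quoted in the feature list; everything else here is illustrative.
MAX_PDF_PAGES = 1_000
MAX_AUDIO_HOURS = 9.5


@dataclass
class MediaInput:
    """Hypothetical description of a file about to be submitted."""
    kind: str              # "pdf" or "audio" (labels assumed for this sketch)
    pages: int = 0         # page count, used for PDFs
    seconds: float = 0.0   # duration in seconds, used for audio


def within_limits(item: MediaInput) -> bool:
    """Return True if the input fits the documented size limit for its kind."""
    if item.kind == "pdf":
        return 0 < item.pages <= MAX_PDF_PAGES
    if item.kind == "audio":
        return 0 < item.seconds <= MAX_AUDIO_HOURS * 3600
    # Other kinds (image, video) have no limit stated in the list.
    raise ValueError(f"unsupported kind: {item.kind!r}")
```

A caller would typically run this check first and split or reject oversized inputs, for example `within_limits(MediaInput("pdf", pages=1200))` evaluates to `False`.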