About
This skill provides a framework for processing multimodal data within Claude. It covers image analysis (recognition, OCR, and object detection), video summarization, audio transcription, and content extraction from multi-page PDF documents. Granular control over media resolution and token usage lets developers trade quality against cost when building applications that interpret visual, audio, and document-based input.