Cleaning Data provides a structured framework for resolving data quality issues before analysis begins, on the principle that cleaning is mandatory for trustworthy results. It guides the user through five distinct phases: reviewing quality reports, delegating deep-dive detection tasks to sub-agents, designing a documented cleaning strategy, executing transformations, and verifying results. By offering concrete decision frameworks for complex issues such as winsorizing outliers, fuzzy-matching near-duplicates, and validating referential integrity across tables, the skill leaves datasets robust, consistent, and ready for high-stakes analytical workflows.
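Winsorization, one of the outlier strategies mentioned above, caps extreme values at percentile bounds rather than deleting rows. The sketch below is a minimal illustration of the idea, not the skill's actual implementation; the function name and the 5th/95th-percentile defaults are assumptions:

```python
def winsorize(values, lower_pct=0.05, upper_pct=0.95):
    """Clip extremes to percentile bounds instead of dropping rows."""
    ordered = sorted(values)
    n = len(ordered)
    # Nearest-rank percentile lookup (simplified; assumes n > 1).
    lo = ordered[int(lower_pct * (n - 1))]
    hi = ordered[int(upper_pct * (n - 1))]
    return [min(max(v, lo), hi) for v in values]

# A single extreme value is pulled back toward the bulk of the data.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
cleaned = winsorize(data)
```

Unlike dropping outlier rows outright, winsorization preserves the row count, which matters when those records carry other valid fields.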
Key Features
- Structured 5-phase data remediation workflow
- Semantic categorization for free-text columns
- Sub-agent delegation for specialized duplicate and outlier detection
- Framework-based prioritization of data quality issues
- Referential integrity validation for multi-table relationships
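The referential integrity check in the last feature amounts to finding child rows whose foreign key has no matching parent. A minimal sketch with pandas, using hypothetical customers/orders tables (the table and column names are illustrative, not part of the skill):

```python
import pandas as pd

# Hypothetical parent and child tables.
customers = pd.DataFrame({"customer_id": [1, 2, 3]})
orders = pd.DataFrame({"order_id": [10, 11, 12],
                       "customer_id": [1, 2, 99]})

# Orphaned rows: child records referencing a nonexistent parent key.
orphans = orders[~orders["customer_id"].isin(customers["customer_id"])]
```

Surfacing orphans as a DataFrame, rather than silently dropping them, supports the documented-strategy phase: the analyst decides whether to delete, reassign, or quarantine each violation.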