Builds custom OpenLineage extractors to capture data lineage and metadata from Airflow operators.
This skill provides structured guidance and code patterns for implementing data lineage in Apache Airflow using the OpenLineage standard. It helps developers create custom extractors for third-party operators or integrate lineage methods directly into proprietary operators, ensuring comprehensive visibility into data movement, column-level dependencies, and job metadata across the data stack. It specifically addresses common pitfalls like circular imports and registration issues while providing ready-to-use templates for SQL and file-based tasks.
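A minimal sketch of the custom-extractor approach for a third-party operator is shown here. The `Dataset`, `OperatorLineage`, and `BaseExtractor` classes are local stand-ins so the example runs without Airflow installed. `CopyTableOperator`, its attributes, and the namespace string are hypothetical. The method names `get_operator_classnames` and `extract` are modeled on the OpenLineage extractor interface and should be checked against the installed package version.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


# Stand-ins for the real OpenLineage types so the sketch runs anywhere.
@dataclass
class Dataset:
    namespace: str
    name: str
    facets: Dict[str, Any] = field(default_factory=dict)


@dataclass
class OperatorLineage:
    inputs: List[Dataset] = field(default_factory=list)
    outputs: List[Dataset] = field(default_factory=list)
    run_facets: Dict[str, Any] = field(default_factory=dict)
    job_facets: Dict[str, Any] = field(default_factory=dict)


class BaseExtractor:
    """Stand-in for the extractor base class; it only stores the operator."""

    def __init__(self, operator: Any) -> None:
        self.operator = operator


class CopyTableOperator:
    """Hypothetical third-party operator that cannot be modified."""

    def __init__(self, source_table: str, target_table: str) -> None:
        self.source_table = source_table
        self.target_table = target_table


class CopyTableExtractor(BaseExtractor):
    @classmethod
    def get_operator_classnames(cls) -> List[str]:
        # Match on the class name as a string instead of importing the
        # operator module at the top of the file; this is one way to
        # sidestep circular imports.
        return ["CopyTableOperator"]

    def extract(self) -> Optional[OperatorLineage]:
        namespace = "postgres://warehouse:5432"  # assumed namespace
        return OperatorLineage(
            inputs=[Dataset(namespace, self.operator.source_table)],
            outputs=[Dataset(namespace, self.operator.target_table)],
        )


operator = CopyTableOperator("raw.orders", "mart.orders")
lineage = CopyTableExtractor(operator).extract()
print(lineage.inputs[0].name, "->", lineage.outputs[0].name)
```

In a real deployment the extractor's full import path is typically registered through the OpenLineage `extractors` setting in the Airflow configuration. An extractor that is not registered is never invoked, which is one form the registration issues mentioned above can take.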
Key Features
01 Configuration and registration templates
02 OpenLineage method implementation guidance
03 Custom Airflow extractor generation
04 Column-level lineage mapping patterns
05 SQL parsing and metadata extraction
06 155 GitHub stars
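As an illustration of the column-level lineage mapping feature, the sketch below turns a simple mapping of output columns to source columns into a dictionary that loosely mirrors the shape of the OpenLineage column lineage facet (`fields`, `inputFields`, and `namespace`/`name`/`field` entries). The helper name `build_column_lineage`, the table names, and the namespace are hypothetical, and real code would usually build the facet classes from the OpenLineage client rather than plain dictionaries.

```python
from typing import Dict, List, Tuple

# Output column -> list of (source dataset name, source column) pairs.
ColumnSources = Dict[str, List[Tuple[str, str]]]


def build_column_lineage(namespace: str, mapping: ColumnSources) -> dict:
    """Build a dict shaped like a column lineage facet from a simple mapping."""
    fields = {}
    for output_column, sources in mapping.items():
        fields[output_column] = {
            "inputFields": [
                {"namespace": namespace, "name": dataset, "field": column}
                for dataset, column in sources
            ]
        }
    return {"fields": fields}


facet = build_column_lineage(
    "postgres://warehouse:5432",  # assumed namespace
    {
        "order_id": [("raw.orders", "id")],
        "total": [("raw.orders", "amount"), ("raw.orders", "tax")],
    },
)
print(facet["fields"]["total"]["inputFields"])
```

Keeping the mapping as plain data makes it straightforward to derive from a SQL parser's output or to write by hand for file-based tasks.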
Use Cases
01 Capturing lineage from unsupported or third-party Airflow operators
02 Implementing data observability in custom proprietary data pipelines
03 Tracking dynamic runtime datasets created during task execution
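The last use case, datasets that are only known once the task has run, can be sketched with a lineage method defined directly on a proprietary operator. The hook name `get_openlineage_facets_on_complete` follows the OpenLineage method convention for Airflow operators. Everything else is an assumption for illustration: `PartitionedExportOperator`, its attributes, and the path layout are hypothetical, and `Dataset` and `OperatorLineage` are local stand-ins so the example runs without Airflow installed.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Dataset:  # stand-in for the OpenLineage dataset type
    namespace: str
    name: str


@dataclass
class OperatorLineage:  # stand-in for the lineage return type
    inputs: List[Dataset] = field(default_factory=list)
    outputs: List[Dataset] = field(default_factory=list)


class PartitionedExportOperator:
    """Hypothetical operator; a real one would subclass Airflow's BaseOperator."""

    def __init__(self, bucket: str, partitions: List[str]) -> None:
        self.bucket = bucket
        self.partitions = partitions
        self.written_paths: List[str] = []

    def execute(self, context: Any) -> None:
        # The output paths are only known once the task actually runs,
        # so they are recorded on the instance for the lineage hook.
        for partition in self.partitions:
            self.written_paths.append(f"exports/{partition}/data.parquet")

    def get_openlineage_facets_on_complete(self, task_instance: Any) -> OperatorLineage:
        # Called after execution, so the runtime paths are available here.
        namespace = f"s3://{self.bucket}"
        return OperatorLineage(
            outputs=[Dataset(namespace, path) for path in self.written_paths]
        )


op = PartitionedExportOperator("analytics", ["2024-01-01", "2024-01-02"])
op.execute(context={})
lineage = op.get_openlineage_facets_on_complete(task_instance=None)
print([dataset.name for dataset in lineage.outputs])
```

Reporting lineage on completion rather than on start is what allows the outputs to reflect what the task actually wrote.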