How does this skill help with Spark OOM errors?

It provides memory tuning patterns, including executor configuration and memory fraction adjustments, to reduce memory pressure and excessive spilling to disk during execution.
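As a rough illustration of what such memory tuning can look like, the sketch below builds a dictionary of standard Spark memory settings. The helper names `executor_memory_conf` and `apply_conf` are hypothetical and are not taken from the skill itself. The 10% overhead with a 384 MiB floor mirrors Spark's documented default and is only a starting point.

```python
def executor_memory_conf(executor_memory_gb, memory_fraction=0.6, storage_fraction=0.5):
    """Build Spark memory settings as a dict of config key -> string value.

    Hypothetical helper for illustration; the keys are standard Spark settings.
    """
    if not 0.0 < memory_fraction < 1.0:
        raise ValueError("memory_fraction must be strictly between 0 and 1")
    if not 0.0 <= storage_fraction <= 1.0:
        raise ValueError("storage_fraction must be between 0 and 1")

    # Assumption: size off-heap overhead at 10% of executor memory,
    # but never below 384 MiB.
    overhead_mb = max(384, int(executor_memory_gb * 1024 * 0.10))

    return {
        "spark.executor.memory": f"{executor_memory_gb}g",
        "spark.executor.memoryOverhead": f"{overhead_mb}m",
        # Share of (heap - 300 MiB) used for execution and storage combined.
        "spark.memory.fraction": str(memory_fraction),
        # Share of that region protected from eviction for cached data.
        "spark.memory.storageFraction": str(storage_fraction),
    }


def apply_conf(builder, conf):
    """Apply each setting to a SparkSession builder and return the builder."""
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder
```

With PySpark installed, usage would look something like `apply_conf(SparkSession.builder, executor_memory_conf(8)).getOrCreate()`. Executor sizing generally has to be in place before the application starts, so on a cluster the same keys are often passed through `spark-submit --conf` instead.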

Can this skill assist with data skew?

Yes, it includes specific patterns for manual salting joins and enabling Adaptive Query Execution (AQE) settings to handle uneven data distribution across partitions.
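The two approaches mentioned here might be sketched as follows. This is a minimal illustration under stated assumptions, not the skill's own code: `AQE_SKEW_CONF`, `salt_expr`, and `salted_join` are hypothetical names, the inputs are assumed to be PySpark DataFrames with no existing column called `salt`, and the default of 8 salts is arbitrary.

```python
# Standard Spark SQL settings that let AQE split skewed join partitions at runtime.
AQE_SKEW_CONF = {
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.skewJoin.enabled": "true",
}


def salt_expr(num_salts):
    """Spark SQL expression that assigns each row a random salt in [0, num_salts)."""
    if num_salts < 1:
        raise ValueError("num_salts must be at least 1")
    return f"cast(floor(rand() * {num_salts}) as int) as salt"


def salted_join(spark, large_df, small_df, key, num_salts=8):
    """Manually salted inner join on `key` (hypothetical helper).

    `spark` is an active SparkSession; `large_df` is the skewed side.
    """
    # Spread every key of the skewed side across num_salts sub-keys.
    salted_large = large_df.selectExpr("*", salt_expr(num_salts))

    # Replicate each row of the other side once per salt value,
    # so every (key, salt) pair on the skewed side still finds its match.
    salts = spark.range(num_salts).selectExpr("cast(id as int) as salt")
    replicated_small = small_df.crossJoin(salts)

    joined = salted_large.join(replicated_small, on=[key, "salt"], how="inner")
    return joined.drop("salt")
```

Manual salting multiplies the size of the replicated side by `num_salts`, so it is usually worth trying the AQE settings first and reaching for salting only when a hot key remains a bottleneck.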

Does it support different storage formats?

It covers optimization for the columnar Parquet format and for Delta Lake tables, focusing on predicate pushdown, column pruning, and (for Delta Lake) Z-ordering.
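A small sketch of how these techniques tend to appear in PySpark code is shown below. The helpers `read_pruned` and `zorder_sql` are hypothetical names used only for illustration, and the table, path, and column names in the usage note are placeholders.

```python
def read_pruned(spark, path, columns, filter_sql):
    """Read a Parquet dataset, keeping only the needed columns and rows.

    Selecting columns explicitly enables column pruning, and a simple
    comparison filter can be pushed down to the Parquet reader by Spark.
    """
    return spark.read.parquet(path).select(*columns).where(filter_sql)


def zorder_sql(table, columns):
    """Build a Delta Lake OPTIMIZE ... ZORDER BY statement for `table`."""
    if not columns:
        raise ValueError("at least one Z-order column is required")
    return f"OPTIMIZE {table} ZORDER BY ({', '.join(columns)})"
```

For example, `read_pruned(spark, "/data/events", ["user_id", "event_date"], "event_date >= '2024-01-01'")` returns a DataFrame whose physical plan can be inspected with `.explain()` to check for pushed filters, and `spark.sql(zorder_sql("events", ["user_id"]))` would run the compaction on a Delta table, assuming Delta Lake is configured in the session.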

Is this specific to PySpark?

While the code examples are provided in PySpark, the core architectural concepts of partitioning, shuffle optimization, and memory management apply to all Spark-supported languages.

Spark Optimization

Name: Spark Optimization
Author: as4584


Data Science & ML

Optimizes Apache Spark performance through advanced partitioning, memory tuning, and shuffle management strategies.

This skill provides comprehensive guidance for diagnosing and resolving performance bottlenecks in Apache Spark applications. It offers production-ready patterns for efficient memory management, join optimizations (including broadcast and salt joins), data skew mitigation, and storage format tuning. Whether you are dealing with OutOfMemory (OOM) errors, slow shuffles, or scaling data pipelines for massive datasets, this skill equips Claude with the technical patterns needed to build robust, high-performance distributed data processing jobs.

Key Features

01 0 GitHub stars

02 Shuffle reduction and data format optimization

03 Caching and persistence management

04 Partitioning strategies for balanced parallelism

05 Advanced join optimization and skew handling

06 Memory and executor configuration tuning

Use Cases

01 Debugging OutOfMemory (OOM) errors in large-scale Spark jobs

02 Implementing bucketed joins to eliminate expensive shuffles

03 Improving the execution time of slow data processing pipelines


Install with 🐟 Skill.Fish

npx skillfish add as4584/antigravity-skills spark-optimization

For use in Claude.ai and ChatGPT
