About
This skill provides comprehensive guidance for managing software incidents, from initial detection and severity classification through mitigation and post-mortem analysis. It draws on established SRE (Site Reliability Engineering) practices, including a blameless culture, clear incident command structures, and structured on-call rotations. Whether you are setting up a new response process, drafting executable runbooks, or facilitating a post-mortem review, this skill helps engineering teams improve system reliability, reduce Mean Time to Recovery (MTTR), and build a more resilient operational culture.
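MTTR is the average time it takes to restore service after an incident. The sketch below is one way to compute it from a list of incident timestamps. It assumes each incident is recorded as a hypothetical (detected_at, resolved_at) pair and measures from detection to recovery. Some teams instead start the clock when customer impact begins, so adjust the start time to match your own definition.

```python
from datetime import datetime, timedelta

def mean_time_to_recovery(incidents):
    """Hypothetical helper: average of (resolved_at - detected_at) over incidents.

    Assumes each incident is a (detected_at, resolved_at) tuple of datetimes
    and that every incident in the list has already been resolved.
    """
    durations = [resolved - detected for detected, resolved in incidents]
    if not durations:
        raise ValueError("need at least one resolved incident")
    # sum() needs a timedelta start value; timedelta / int yields a timedelta
    return sum(durations, timedelta()) / len(durations)

# Illustrative sample data, not real incidents
incidents = [
    (datetime(2024, 5, 1, 9, 0), datetime(2024, 5, 1, 9, 45)),    # 45 minutes
    (datetime(2024, 5, 3, 14, 0), datetime(2024, 5, 3, 15, 15)),  # 75 minutes
]
print(mean_time_to_recovery(incidents))  # 1:00:00
```

Tracking this value across post-mortems shows whether process changes, such as clearer runbooks or faster escalation, are actually shortening recovery.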