The Incident Response skill lets Claude act as an experienced Site Reliability Engineer (SRE), guiding users through the lifecycle of a production outage: detecting the issue, investigating root causes, implementing mitigations, and managing recovery. Beyond immediate crisis management, it automates the creation of detailed incident timelines, Root Cause Analysis (RCA) documents, and runbook updates, so teams can learn from each event and build more resilient systems.
Key Features
01 Automated incident timeline tracking and logging
02 RCA and postmortem template generation
03 Severity-based guidance for SEV1 and SEV2 incidents
04 Runbook update automation and maintenance
05 Standardized SRE incident handling workflows
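
To make the timeline tracking feature more concrete, here is a minimal Python sketch of how an incident timeline could be recorded and rendered. It does not show the skill's actual implementation: the `Incident` and `TimelineEntry` classes, the `log` and `render_timeline` methods, and the four phase names (taken from the lifecycle described above) are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

# Hypothetical phase names mirroring the lifecycle in the description.
PHASES = ("detect", "investigate", "mitigate", "recover")
# Only the two severities named in the feature list are modeled here.
SEVERITIES = ("SEV1", "SEV2")


@dataclass
class TimelineEntry:
    timestamp: datetime
    phase: str
    note: str


@dataclass
class Incident:
    title: str
    severity: str
    entries: List[TimelineEntry] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity}")

    def log(self, phase: str, note: str,
            timestamp: Optional[datetime] = None) -> None:
        """Append a timeline entry; defaults to the current UTC time."""
        if phase not in PHASES:
            raise ValueError(f"unknown phase: {phase}")
        ts = timestamp if timestamp is not None else datetime.now(timezone.utc)
        self.entries.append(TimelineEntry(ts, phase, note))

    def render_timeline(self) -> str:
        """Return the entries in chronological order, one per line."""
        lines = [f"{self.title} ({self.severity})"]
        for e in sorted(self.entries, key=lambda e: e.timestamp):
            lines.append(f"{e.timestamp:%Y-%m-%d %H:%M} UTC [{e.phase}] {e.note}")
        return "\n".join(lines)
```

Entries can be logged in any order (for example, when details are backfilled after the outage), since `render_timeline` sorts by timestamp. All timestamps should be timezone-aware so that they compare cleanly; the rendered text could then serve as the starting point for an RCA or postmortem document.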