About
This skill helps SREs and DevOps teams improve system observability by automating the design and configuration of alerting rules. It guides users through defining thresholds based on historical data, setting up alerts across multiple categories (including latency and error rates), and establishing routing and escalation policies. It also generates the associated runbooks and testing procedures, which helps teams reduce alert fatigue, improve response times, and meet their service-level objectives (SLOs) across their infrastructure.
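
As a rough illustration of what deriving a threshold from historical data can look like, the sketch below takes a list of past latency samples and suggests an alert threshold at a chosen percentile plus some headroom. The function name `suggest_threshold`, the default 99th percentile, and the 1.2 headroom factor are all assumptions made for this example; they are not taken from the skill itself, which may use a different method.

```python
import statistics


def suggest_threshold(samples, percentile=99, headroom=1.2):
    """Suggest an alert threshold from historical samples.

    Hypothetical helper: returns the requested percentile of the
    samples multiplied by a headroom factor (both are assumptions).
    """
    if len(samples) < 2:
        raise ValueError("need at least two historical samples")
    if not 1 <= percentile <= 99:
        raise ValueError("percentile must be between 1 and 99")
    # quantiles(n=100) returns 99 cut points; index p - 1 is the p-th percentile.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return cuts[percentile - 1] * headroom


if __name__ == "__main__":
    # Made-up latency samples in milliseconds, for illustration only.
    history_ms = [120, 135, 128, 140, 131, 450, 125, 138, 129, 133]
    print(f"suggested p99 threshold: {suggest_threshold(history_ms):.1f} ms")
```

A percentile with headroom is only one possible choice. It tolerates occasional spikes in the history better than a threshold based on the maximum, but it still assumes the historical window is representative of normal behavior.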