Incident Management Guide FAQs

Question 1

Can this skill help with post-mortem documentation?

Accepted Answer

Yes. It includes guidelines for 5 Whys analysis and timeline reconstruction, plus templates for distributing findings to stakeholders so the organization learns from each incident.

Question 2

What incident management philosophy does this skill follow?

Accepted Answer

It follows Site Reliability Engineering (SRE) principles. These emphasize declaring incidents early, prioritizing mitigation over root cause analysis while the outage is ongoing, and maintaining a strictly blameless culture.

Question 3

How does this skill handle severity levels?

Accepted Answer

It provides a framework for four levels: SEV0 (Critical Outage), SEV1 (Major Degradation), SEV2 (Minor Issues), and SEV3 (Low Impact), each with specific response time targets and notification requirements.
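One way to encode the four levels is a small ordered enum in which a lower number means a more severe incident. This is a minimal sketch, not the skill's own implementation: the names `Severity`, `LABELS`, `is_more_severe`, and `describe` are hypothetical. The response time targets and notification requirements are deliberately left out because the answer above does not state their values.

```python
from enum import IntEnum


class Severity(IntEnum):
    """The four levels from the FAQ; a lower value means a more severe incident.

    Hypothetical encoding for illustration only.
    """
    SEV0 = 0  # Critical Outage
    SEV1 = 1  # Major Degradation
    SEV2 = 2  # Minor Issues
    SEV3 = 3  # Low Impact


# Human-readable labels taken from the four-level framework.
LABELS = {
    Severity.SEV0: "Critical Outage",
    Severity.SEV1: "Major Degradation",
    Severity.SEV2: "Minor Issues",
    Severity.SEV3: "Low Impact",
}


def is_more_severe(a: Severity, b: Severity) -> bool:
    """Return True if level a is more urgent than level b (e.g. SEV0 vs SEV1)."""
    return a < b


def describe(level: Severity) -> str:
    """Format a level as shown in the FAQ, e.g. 'SEV1 (Major Degradation)'."""
    return f"{level.name} ({LABELS[level]})"
```

Because `IntEnum` members compare as integers, sorting a list of open incidents by severity puts SEV0 first without any extra key function.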

Question 4

Does it provide advice on on-call rotations?

Accepted Answer

Yes, it covers rotation patterns like Primary/Secondary, Follow-the-Sun, and Tiered Escalation, along with best practices for preventing alert fatigue.
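As an illustration of the Primary/Secondary pattern, the sketch below assumes a simple weekly round-robin in which this week's secondary (backup) becomes next week's primary. The function name `primary_secondary` and the roster are hypothetical, and the skill may recommend a different scheduling rule.

```python
from typing import List, Tuple


def primary_secondary(engineers: List[str], week: int) -> Tuple[str, str]:
    """Return the (primary, secondary) on-call pair for a given week.

    Assumes a weekly round-robin where this week's secondary becomes
    next week's primary, so everyone serves as backup before taking primary.
    """
    if len(engineers) < 2:
        raise ValueError("a primary/secondary rotation needs at least two engineers")
    n = len(engineers)
    primary = engineers[week % n]
    secondary = engineers[(week + 1) % n]
    return primary, secondary


team = ["alice", "bob", "carol"]  # hypothetical roster
for week in range(4):
    print(week, *primary_secondary(team, week))
```

Handing off from secondary to primary means the incoming primary has already seen the previous week's alerts, which is one common reason for this ordering.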

Question 5

What is the role of an Incident Commander (IC) in this system?

Accepted Answer

The IC owns overall coordination and strategic decisions. They delegate technical tasks to subject-matter experts (SMEs) and do not do hands-on debugging themselves, so they can keep a high-level view of the incident.
