SLAs, SLOs, SLIs, and KPIs
The incident is over. The service is back up. The monitoring dashboard is green, the on-call engineer has stood down, and the post-incident review is on the calendar for Thursday. But one question separates good operations teams from great ones: do you actually know what that incident cost you against your reliability commitments? Did it breach an SLO? Is a customer-facing SLA now at risk?