The Anatomy of Three Incidents | Randy Shoup on 99 Percent Visible
The best response to a system outage is not "What did you do?", but "What did we learn?" This session walks through three system-wide outages at Google, at Stitch Fix, and at WeWork. In all cases, many things went right and a few went wrong, and after the blameless postmortems, we ended up learning a lot and making substantial improvements in our systems.
Lightstep’s observability platform is the easiest way for developers and SREs to monitor health and respond to changes in cloud-native applications. Powered by cutting-edge distributed tracing and a groundbreaking metrics database, and built by the team that launched observability at Google, Lightstep’s Change Intelligence provides actionable insights to help teams answer the question “What caused that change?”
Check out our website to learn more:
Sign up for our free Community plan: