Improving a Distributed System Post-Incident
Julius Zerwick
Failover Conf 2020
In this session, we will dive into a case study of how a team can recover from a major incident and improve its distributed system afterward. Because of their complexity and scale, distributed systems are more prone to failure than other systems, and incidents are a fact of life when operating them.