Part 4: Causal Observability - Level 3
Most failures are caused by a change somewhere in a system, such as a new code deployment, a configuration change, auto-scaling activity, or an auto-healing event. When you investigate the root cause of an incident, the best place to start is to find what changed. To understand which change caused a problem and how its effects propagated across your stack, you need to be able to see how the relationships between stack components have changed over time.
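To make "find what changed" concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how any particular observability product works: the `ChangeEvent` record, the `depends_on` mapping, and the one-hour look-back window are all hypothetical. For simplicity it also uses a single dependency snapshot, whereas a real investigation would use the topology as it existed at the time of the incident.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical record of one change (deployment, config change, ...)."""
    component: str
    kind: str
    at: datetime


def upstream_of(component, depends_on):
    """Return the component plus everything it transitively depends on."""
    seen = set()
    stack = [component]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(depends_on.get(node, ()))
    return seen


def candidate_causes(component, incident_at, changes, depends_on,
                     window=timedelta(hours=1)):
    """List changes in the failing component's dependency chain that
    happened within `window` before the incident, most recent first."""
    scope = upstream_of(component, depends_on)
    hits = [
        c for c in changes
        if c.component in scope and incident_at - window <= c.at <= incident_at
    ]
    return sorted(hits, key=lambda c: c.at, reverse=True)


# Example with made-up data: "web" depends on "api", which depends on "db".
incident = datetime(2024, 1, 1, 12, 0)
deps = {"web": ["api"], "api": ["db"], "batch": ["db"]}
events = [
    ChangeEvent("db", "config change", incident - timedelta(minutes=10)),
    ChangeEvent("api", "deployment", incident - timedelta(minutes=30)),
    ChangeEvent("batch", "deployment", incident - timedelta(minutes=5)),
    ChangeEvent("db", "auto-scaling", incident - timedelta(hours=3)),
]
for event in candidate_causes("web", incident, events, deps):
    print(event.at.isoformat(), event.component, event.kind)
```

The sketch narrows the search in two ways: by time (only changes shortly before the incident) and by topology (only components the failing one depends on). The change to `batch` is ignored because `web` does not depend on it, even though it is the most recent event.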