Observability with Context: Telemetry, Time, Tracing, and Topology
What changed? That’s the question ops personnel have been asking for decades whenever something goes wrong in the production IT environment. Everything was working before, so the reasoning goes, and now it’s not. We have an incident. And to figure out what caused the incident, and hence to have any idea how to fix it, we must know what changed. There’s just one problem with this approach: what if everything is subject to change, all the time?