I have noticed that many people confuse event logging with tracing. When building a generic SDK for developers to report observability data, should Event APIs be distinct from Trace APIs? Is an Event just a single Trace Span? In Honeycomb’s implementation, an Event seems to be equivalent to a single-span trace: the middleware wrapper creates a Honeycomb event in the request context, and that event acts as a span in the overall trace.
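To make the question concrete, here is a minimal sketch of the "an event is a single-span trace" idea. It is not Honeycomb's API; `Span`, `new_event`, and `child_span` are hypothetical names I made up for illustration, and a real SDK would also handle timing, sampling, and export.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Span:
    """A unit of work inside a trace (hypothetical type, not a vendor API)."""
    name: str
    trace_id: str
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    parent_id: Optional[str] = None  # None marks the root span of a trace
    start: float = field(default_factory=time.time)
    fields: Dict[str, Any] = field(default_factory=dict)


def new_event(name: str, **fields: Any) -> Span:
    """Model an event as the root span of a fresh trace.

    If nothing else ever joins this trace, it stays a single-span trace.
    """
    return Span(name=name, trace_id=uuid.uuid4().hex, fields=dict(fields))


def child_span(parent: Span, name: str, **fields: Any) -> Span:
    """Attach further work to the same trace as the parent span."""
    return Span(
        name=name,
        trace_id=parent.trace_id,
        parent_id=parent.span_id,
        fields=dict(fields),
    )
```

Under this assumption, an Event API could be a thin convenience layer over the Trace API: calling `new_event("user.signup", plan="free")` produces a root span with no parent, and a later `child_span(event, "db.insert")` turns the same trace into a multi-span one without changing the data model.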
Incident response playbooks are sets of actions that your incident responders execute depending on the nature of the outage. Well-defined incident response playbooks are critical, especially during high-customer-impact events that you would typically classify as Sev-0 incidents.
Providing customers with a world-class and seamless user experience is critical for the success of any business. It is therefore important that you have a robust on-call strategy that optimizes the availability of the right subject matter experts, on-call engineers, and support engineers to resolve critical, user-impacting incidents as soon as possible.
I often hear folks in my network say they are triggered by interactions with product managers at their companies whenever they follow up on product-related issues. The triggering phrase is invariably “It’s a known issue”. And they often wonder: if it’s a known issue, why on earth isn’t anything being done about it?