Observability

splunk

Does Observability Throw You for a Loop? Part Two: Close with Controllability

In part one, we introduced the dual of observability: controllability. As a reminder, observability is the ability to infer the internal state of a “machine” from externally exposed signals. Controllability is the ability to control input to direct the internal state to the desired outcome. So observability is a loop problem, and we need to stop treating it as the end state of our challenge in delivering performant, quality experiences to our users and customers.

lightstep

The Case for Human-Centric Observability

Modern systems have resulted in an explosion of complexity for organizations of every size, shape, and purpose. In an effort to create more resilient and reliable software, we’ve cast around for solutions to tame and manage this complexity. Observability practices have come to be seen as essential for operating systems at scale, but in practice, they’re often seen as technical solutions to what is ultimately a social problem: Software, at the end of the day, is built and run by people.

Why Human-Centric Observability Matters for Microservices

HCO (human-centric observability) is a methodology built around people, not machines. In this session, we’ll demo how HCO serves three key groups – external users, internal users, and engineers – and show you how to center them in your observability practice. In the end, you’ll see how HCO lets you not just build resilient and reliable software, but assemble a truly resilient organization that can adapt to changing requirements and needs.

honeycomb

Challenges with Implementing SLOs

A few months ago, Honeycomb released our SLO (Service Level Objective) feature to the world. We’ve written before about how to use it and some of the use scenarios. Today, I’d like to say a little more about how the feature has evolved, and what we did in the process of creating it. (Some of these notes are based on my talk, “Pitfalls in Measuring SLOs”; you can find the slides to that talk here, or view the video on our Honeycomb Talks page.)

lightstep

Why Observability Needs to Stay Weird

It’s a strange truth about technology — most of our problems with it wind up being, well, weird. This seems strange when you step back to think about it. There’s a bit of a meme that’s been going around for several years now, that a computer is just a rock that we tricked into thinking — but it’s true! Computers, and the systems we build with them, all should be very ordered and logical, because the underlying thing that backs them is extremely straightforward math.

lightstep

Want to Reduce Service Cost and Resource Waste? Start Squeeze Testing

For growing businesses, it’s normal to size deployments of services based on the intuitions of the development team involved. Sizing almost always includes some safety margin so that sudden spikes in demand don’t take down the service or wake up whoever is on call. Unfortunately, these intuitions, even if correct at the beginning, often don’t stay up to date with changes in the service or its dependencies, leading to both outages and unhelpful paging.

splunk

Does Observability Throw You for a Loop? Part One: Open with Observability

The dual of observability is controllability. Observability is the ability to infer the internal state of a “machine” from externally exposed signals. Controllability is the ability to control input to direct the internal state to the desired outcome. We need both in today’s cloud-native world. Quite often we find that observability is presented as the desired end state. Yet, in modern computing environments, this isn’t really true.

honeycomb

OpenTelemetry: New Honeycomb Exporters

We’re really big fans of OpenTelemetry at Honeycomb. As we’ve blogged about before, OpenTelemetry is the next phase of the OpenTracing and OpenCensus projects. Instead of working on separate but similar efforts, those two projects have merged to create OpenTelemetry. This is wonderful for the larger community as it gives people a clear way to instrument their code for metrics and traces that isn’t specific to any tool or vendor. OpenTelemetry is a CNCF sandbox project.

lightstep

What is Observability?

There are some excellent resources on observability written by experts, and someday, I will read (and understand) them. What I needed instead was a foundational knowledge that I could grow over time and add nuance to when the time was right. This is how I learned to code, play the trumpet, do car repairs and pretty much everything else I know how to do. This article will be a foundation you can build on to understand observability (often abbreviated as o11y).