The latest News and Information on Observability for complex systems and related technologies.
TL;DR: Use auto-instrumentation from OpenTelemetry. Traces will happen. Then your code can use global library functions to customize those traces with your specific important data.
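As a rough illustration of that pattern, here is a minimal, standard-library-only sketch. It is a toy model and not the OpenTelemetry API: `auto_instrumented`, `finished_spans`, `handle_checkout`, and the attribute names are all hypothetical. It only shows the shape of the idea: something outside your code opens the span, and your code reaches for the current span through a global function to attach its own data. In the real OpenTelemetry Python API, the comparable calls are `opentelemetry.trace.get_current_span()` and `span.set_attribute(...)`.

```python
import contextvars
from contextlib import contextmanager

# Toy stand-in for a tracing library; NOT the OpenTelemetry API.
_current_span = contextvars.ContextVar("current_span", default=None)
finished_spans = []  # hypothetical in-memory "exporter"


class Span:
    def __init__(self, name):
        self.name = name
        self.attributes = {}

    def set_attribute(self, key, value):
        self.attributes[key] = value


@contextmanager
def auto_instrumented(name):
    """Plays the role of auto-instrumentation: opens a span around a unit of work."""
    span = Span(name)
    token = _current_span.set(span)
    try:
        yield span
    finally:
        _current_span.reset(token)
        finished_spans.append(span)


def get_current_span():
    """Global accessor, so application code never has to pass spans around."""
    return _current_span.get()


def handle_checkout(user_id, cart_total):
    # Application code creates no span; it only enriches the active one.
    span = get_current_span()
    if span is not None:
        span.set_attribute("app.user_id", user_id)
        span.set_attribute("app.cart_total", cart_total)
    return "ok"


# The "framework" wraps the handler; the handler adds the business data.
with auto_instrumented("POST /checkout"):
    handle_checkout("user-42", 99.5)

print(finished_spans[0].name, finished_spans[0].attributes)
```

The design point is that `handle_checkout` stays free of tracing setup: if no span is active, it simply skips the enrichment.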
Every few years, the tech world either rebrands an old term or tries to find a way to use old technology to create new advancements. It is easy to assume the same is happening with observability, yet observability is distinct from some of its predecessors.
When we talk about observability, we tend to focus first and foremost on the metrics, logs, and traces that you can collect from applications – such as request rates, error rates, and request duration. Infrastructure-level metrics, like CPU and memory utilization, might factor into the discussion as well. Here’s a third category of critical observability insights that teams tend to overlook: the network.
Like any great technology, the interest in and adoption of Kubernetes (an excellent way to orchestrate your workloads, by the way) took off as cloud native and containerization grew in popularity. With that came a lot of confusion. Everyone was using Kubernetes to move their workloads, but as they went through their journey to deployment, they weren’t thinking about security until they got to production.
In the past, we’ve written about what instrumentation is and the insights it provides. Instrumenting your code generates telemetry that shows you how your system is performing and whether it is healthy. As with most other companies, at Honeycomb we don’t write all of the code that runs in our systems.
The term Site Reliability Engineer (SRE) first appeared at Google in the early 2000s. In Google’s 2016 SRE Book, Benjamin Treynor Sloss wrote that, generally speaking, “an SRE team is responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of their service(s).” This means that the SRE teams at Google decide how a system should run in production as well as how to make it run that way.
A developer's viewpoint is distinct. When you are responsible for many parts of a system, it can be difficult to keep track of operations and find the fault that is causing the software to malfunction. What if you could detect the issue ahead of time and fix it as soon as possible? The practices worth focusing on are the ones that help us manage that work properly, and understanding observability makes this possible. Let's take a closer look at it in this blog.
Oh goody, I’m so tickled to get this one. *rubs hands gleefully* Funny story, back in 2016–2017 we thought we were building Honeycomb primarily for DB use cases. The use cases are that killer. I’ve never seen another tool do the kinds of things you can do on the fly with Honeycomb and databases.