Getting At The Good Stuff: How To Sample Traces in Honeycomb

(This is the first post by our new head of Customer Success, Irving.) Sampling is a must for applications at scale: it reduces the burden on your infrastructure and telemetry systems by keeping data for a statistical sample of requests rather than for 100% of them. Large systems tend to produce high volumes of very similar requests, so a representative sample preserves most of the signal at a fraction of the cost.
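As a rough illustration of the idea, here is a minimal head-sampling sketch in Python. The rate, field names, and `record` helper are all hypothetical, not Honeycomb's actual implementation; the key detail is hashing the trace ID so every event in a trace gets the same keep/drop decision, and recording the rate so counts can be scaled back up later.

```python
import hashlib

SAMPLE_RATE = 10  # hypothetical rate: keep roughly 1 in 10 traces

def should_keep(trace_id, sample_rate=SAMPLE_RATE):
    """Deterministically decide whether to keep a trace.

    Hashing the trace ID (rather than rolling a random number per event)
    ensures every span in the same trace gets the same decision.
    """
    digest = hashlib.sha256(trace_id.encode()).digest()
    # Interpret the first 8 bytes as an integer and bucket it.
    bucket = int.from_bytes(digest[:8], "big") % sample_rate
    return bucket == 0

def record(event, trace_id):
    """Return the event annotated with its sample rate, or None if dropped."""
    if not should_keep(trace_id):
        return None
    # Storing the rate lets downstream queries multiply counts back up.
    event["sample_rate"] = SAMPLE_RATE
    return event
```

Because the decision is a pure function of the trace ID, services that sample independently still agree on which traces survive.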


Observability Trends in 2020 and Beyond: Announcing the DevOps Pulse 2019 Results

2020 is here and it looks like it’ll be a truly exciting and impactful year for the DevOps community. As you know, the landscape is changing rapidly, and as a result, new technologies and methodologies are emerging to solve challenges you’re experiencing on the job. Observability is one such concept, and achieving it is a huge challenge for software engineers across the globe.


Instrumenting Lambda with Traces: A Complete Example in Python

We’re big fans of AWS Lambda at Honeycomb. As you may have read, we recently made some major improvements to our storage engine by leveraging Lambda to process more data in less time. Making a change to a complex system like our storage engine is daunting, but can be made less so with good instrumentation and tracing. For this project, that meant getting instrumentation out of Lambda and into Honeycomb.
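To give a flavor of what per-invocation instrumentation can look like, here is a small sketch of a decorator that emits one structured event per Lambda invocation. This is not the post's actual code: `send_event` is a stand-in for a real transport (such as the libhoney SDK), and the field names are illustrative.

```python
import functools
import time
import uuid

EVENTS = []  # stand-in sink; a real setup would ship these to Honeycomb

def send_event(event):
    """Placeholder transport, e.g. the libhoney SDK in a real deployment."""
    EVENTS.append(event)

def instrumented(handler):
    """Wrap a Lambda handler so every invocation emits one structured event."""
    @functools.wraps(handler)
    def wrapper(event, context):
        start = time.perf_counter()
        fields = {
            "trace.trace_id": str(uuid.uuid4()),
            "name": handler.__name__,
            "error": None,
        }
        try:
            return handler(event, context)
        except Exception as exc:
            fields["error"] = repr(exc)
            raise
        finally:
            # Emit the event whether the handler succeeded or raised.
            fields["duration_ms"] = (time.perf_counter() - start) * 1000
            send_event(fields)
    return wrapper

@instrumented
def my_handler(event, context):
    # Hypothetical handler body
    return {"statusCode": 200}
```

Emitting the event in `finally` guarantees visibility into failed invocations, which is exactly when you need the telemetry most.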


OTel Me More: OpenTelemetry Project News - Vol 10

With the beta release expected later this year, 2020 has a lot in store for the OpenTelemetry project! OpenTelemetry is the reference implementation for the W3C Trace Context specification, which has just moved on to the next maturity level. With the goal of standardizing distributed tracing context propagation between services, this is a great step towards minimizing adoption hurdles for vendors, platforms, and languages considering OpenTelemetry.
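For context on what is being standardized: Trace Context propagates trace state between services via a `traceparent` HTTP header of the form `version-traceid-parentid-flags`. The sketch below parses the version 00 format; it is a simplified illustration, not a full spec-compliant implementation.

```python
import re

# W3C `traceparent` header, version 00:
# version(2 hex) - trace-id(32 hex) - parent-id(16 hex) - trace-flags(2 hex)
TRACEPARENT_RE = re.compile(
    r"^(?P<version>[0-9a-f]{2})-"
    r"(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<parent_id>[0-9a-f]{16})-"
    r"(?P<flags>[0-9a-f]{2})$"
)

def parse_traceparent(header):
    """Return the parsed header fields, or None if the header is malformed."""
    m = TRACEPARENT_RE.match(header)
    if m is None:
        return None
    fields = m.groupdict()
    # All-zero trace or parent IDs are invalid per the spec.
    if fields["trace_id"] == "0" * 32 or fields["parent_id"] == "0" * 16:
        return None
    # The low bit of trace-flags is the "sampled" flag.
    fields["sampled"] = bool(int(fields["flags"], 16) & 0x01)
    return fields
```

Because the header format is vendor-neutral, a service instrumented with one tracing library can still continue a trace started by another.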


Honeycomb SLO Now Generally Available: Success, Defined.

Honeycomb now offers SLOs, aka Service Level Objectives. This is the second in a series of essays on creating SLOs from first principles. Previously in this series, we created a derived column to show how a back-end service was doing. That column categorized every incoming event as passing, failing, or irrelevant. We then counted up the column over time to see how many events passed and failed. But we had a problem: we were doing far too much math ourselves.
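The pass/fail/irrelevant counting described above can be sketched in a few lines of Python. The event fields, thresholds, and the `/healthz` exclusion here are assumptions for illustration, not the post's actual derived-column definition.

```python
def categorize(event):
    """Mirror the derived-column idea: True (pass), False (fail), or None (irrelevant)."""
    if event.get("path") == "/healthz":  # assumption: health checks don't count
        return None
    # Assumed success criteria: no server error and latency under 300 ms.
    return event["status"] < 500 and event["duration_ms"] < 300

def slo_compliance(events, target=0.999):
    """Count eligible events and compare the pass ratio against the target."""
    eligible = [c for c in map(categorize, events) if c is not None]
    if not eligible:
        return None
    ratio = sum(eligible) / len(eligible)
    return {"compliance": ratio, "target": target, "met": ratio >= target}
```

This is the math the post argues you shouldn't have to do by hand; a native SLO feature tracks it continuously instead.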


What we've learned about observability in 2019

2019 was a banner year for observability. To increase the security and quality of their products, both SecOps and DevOps sought out observability of their systems in growing numbers. But gaining observability is like trying to hit a moving target. Even since the beginning of the year, there have been changes to how we understand it.


Observability and Alerts come together to enable real-time Incident Response and DevOps monitoring

What happens if you aren’t immediately alerted to issues that may impact service levels? Do you want your customers telling you something is wrong before you even know there’s a problem? Lacking the instant visibility and alerting needed to quickly respond to a possible disruption can cost your company a lot of money, and can even tarnish your reputation.