Operations | Monitoring | ITSM | DevOps | Cloud

Honeycomb

One-Click Insights with Board Templates

Whether you’re a new Honeycomb user or a seasoned expert looking to uncover fresh insights, chances are you’ve sent tremendous amounts of data into Honeycomb already. The question is, now what? We have the answer: Board templates. Teams can now create Boards based on pre-built templates that generate visualizations with a single click.

OpenTelemetry Gotchas: Phantom Spans

This guest post is written by Ian Duncan, Staff Engineer - Stability Team at Mercury. To view the original post, go to Ian's website. At work, we use OpenTelemetry extensively to trace execution of our Haskell codebase. We struggled for several months with a mysterious tracing issue in our production environment wherein unrelated web requests were being linked together in the same trace, but we could never see the root trace span.

Streamlining Incident Investigation

Honeycomb Customer Success Manager Josh Levin explains how to troubleshoot production incidents using Honeycomb's telemetry data: metrics, traces, and logs. While these data forms have separate interfaces, you can investigate seamlessly within Honeycomb. Josh highlights the key role of the "retriever" service in data ingestion and querying and demonstrates cross-validating tracing data with metrics to spot anomalies in pod deployments and resource usage, presented in a separate dataset. He also uses effective log filtering and searching for keywords like "update status.".

Deploying the OpenTelemetry Collector to Kubernetes with Helm

The OpenTelemetry Collector is a useful application to have in your stack. However, deploying it has always felt a little time consuming: working out how to host the config, building the deployments, etc. The good news is the OpenTelemetry team also produces Helm charts for the Collector, and I’ve started leveraging them. There are a few things to think about when using them though, so I thought I’d go through them here.

Incident Review: What Comes Up Must First Go Down

On July 25th, 2023, we experienced a total Honeycomb outage. It impacted all user-facing components from 1:40 p.m. UTC to 2:48 p.m. UTC, during which no data could be processed or accessed. This outage is the most severe we’ve had since we had paying customers. In this review, we will cover the incident itself, and then we’ll zoom back out for an analysis of multiple contributing elements, our response, and the aftermath.

Calculating Sampling's Impact on SLOs and More

What do mall food courts and Honeycomb have in common? We both love sampling! Not only do we recommend it to many of our customers, we do it ourselves. But once Refinery (our tail-based sampling proxy) is set up, what comes next? Since sampling is inherently lossy, it’s good to be sure the organization’s most important measurements aren’t negatively affected.

Honeycomb + Tracetest: Observability-Driven Development

Our friends at Tracetest recently released an integration with Honeycomb that allows you to build end-to-end and integration tests, powered by your existing distributed traces. You only need to point Tracetest to your existing trace data source—in this case, Honeycomb. This guest post from Adnan Rahić walks you through how the integration works.

Observability and the DORA metrics

The Accelerate State of Devops Report highlights four key metrics (known as the DORA metrics, for DevOps Research & Assessment) that distinguish high-performing software organizations: deployment frequency, lead time for changes, time-to-restore, and change fail rate. Observability can kickstart a virtuous cycle that improves all the DORA metrics.

Infinite Retention with OpenTelemetry and Honeycomb

The needs of observability workloads can sometimes be orthogonal to the needs of compliance workloads. Honeycomb is designed for software developers to quickly fix problems in production, where reducing 100% data completeness to 99.99% is acceptable to receive immediate answers. Compliance and audit workloads require 100% data completeness over much longer (or "infinite") time spans, and are content to give up query performance in return.

Reducing Mean Time to Diagnosis: How Salary Finance Uses Honeycomb to Ask the Right Questions

Salary Finance is a UK-based financial well-being employee benefit program. Over the last seven years, the company grew from a startup to a scaleup, earning rave reviews along the way from its more than 4,000 customers. However, with fast growth also comes natural growing pains. As their customer base expanded, so did the number of incidents they experienced, which also became harder to diagnose due to lack of visibility into their increasingly complex environment.