
Honeycomb

HoneyByte: Using Application Metrics With Prometheus Clients

Have you ever dived deep into the sea of your tracing data, but wanted additional context about your underlying system? For instance, it may be easy to see when and where certain users are experiencing latency, but what if you need to know whether garbage collection is mucking up the place or which allocated memory is taking a beating? Imagine having a complete picture of how an application is performing when you need it, without manually digging through logs and multiple UI screens.
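
To make that concrete, here is a minimal sketch (not from the post) of how a Node.js service could expose runtime metrics such as garbage collection pauses and heap usage with the prom-client library; the metric prefix, port, and /metrics route are illustrative assumptions, not Honeycomb's configuration.

```typescript
// Hypothetical sketch: expose Node.js runtime metrics (GC, heap, event loop)
// with prom-client so they can be scraped and viewed alongside trace data.
import express from "express";
import { collectDefaultMetrics, register } from "prom-client";

// Collects default runtime metrics: GC duration, heap usage, event loop lag, etc.
// The prefix is an illustrative assumption.
collectDefaultMetrics({ prefix: "myapp_" });

const app = express();

// Prometheus scrapes this endpoint on its own schedule.
app.get("/metrics", async (_req, res) => {
  res.set("Content-Type", register.contentType);
  res.end(await register.metrics());
});

app.listen(3000, () => console.log("metrics exposed on :3000/metrics"));
```

With something like this in place, GC and memory behavior show up as queryable metrics instead of another log file to dig through.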

Tracing makes a bug easy to spot

Today, I found a bug before I noticed it. Like, it was subtle, and so I wasn’t quite sure I saw it—maybe I hadn’t hit refresh yet? Later, I looked at the trace of my function and, boom, there was a clear bug. Here’s the function with the bug. It responds to a request to /win by saving a record of the win and returning the total of my winnings so far. Can you spot the problem in the TypeScript? It’s subtle. Now here’s a trace in Honeycomb: Now do you see the bug?
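
The post's actual function isn't reproduced here, but a hypothetical handler with that shape, and one plausible flavor of such a subtle bug, might look like the following Express-style sketch; the in-memory store, helper names, and amounts are invented for illustration.

```typescript
// Hypothetical sketch (not the post's code): a /win handler with a subtle bug
// that a trace makes obvious because the two spans overlap instead of nesting.
import express from "express";

const wins: number[] = [];

// Stand-ins for real persistence calls, kept async to mimic database latency.
async function saveWin(amount: number): Promise<void> {
  await new Promise((resolve) => setTimeout(resolve, 10)); // simulated write
  wins.push(amount);
}

async function totalWinnings(): Promise<number> {
  return wins.reduce((sum, w) => sum + w, 0);
}

const app = express();

app.post("/win", async (_req, res) => {
  // Bug: the save is not awaited, so totalWinnings() can run before the new
  // win lands -- the returned total silently omits the win that just happened.
  saveWin(10);
  const total = await totalWinnings();
  res.json({ total });
});

app.listen(3000);
```

In the browser everything looks fine most of the time, which is exactly why this kind of race is easier to spot in a trace than in the code.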

OpenTelemetry Browser Instrumentation

One of the most common questions we get at Honeycomb is “What insights can you get in the browser?” Browser-based code has become orders of magnitude more complex than it used to be. There are many different patterns, and, with the rise of Single Page App frameworks, a lot of the code that traditionally lived in a backend or middle layer is now being pushed up to the browser. Instead, the question should be: what insights do frontend engineers want?
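
As a rough idea of what browser-side tracing setup can involve, here is a minimal OpenTelemetry-for-web sketch, assuming the 1.x JavaScript SDK packages; the exporter URL and the choice of instrumentations are assumptions for illustration, not a prescribed configuration.

```typescript
// Hypothetical sketch: wiring OpenTelemetry tracing in the browser.
// The endpoint URL below is an illustrative assumption.
import { WebTracerProvider } from "@opentelemetry/sdk-trace-web";
import { BatchSpanProcessor } from "@opentelemetry/sdk-trace-base";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";
import { registerInstrumentations } from "@opentelemetry/instrumentation";
import { DocumentLoadInstrumentation } from "@opentelemetry/instrumentation-document-load";
import { FetchInstrumentation } from "@opentelemetry/instrumentation-fetch";

const provider = new WebTracerProvider();

// Batch spans and send them to an OTLP/HTTP endpoint (a collector or a vendor).
provider.addSpanProcessor(
  new BatchSpanProcessor(
    new OTLPTraceExporter({ url: "https://collector.example.com/v1/traces" })
  )
);
provider.register();

// Auto-instrument page loads and fetch() calls made by the single-page app.
registerInstrumentations({
  instrumentations: [
    new DocumentLoadInstrumentation(),
    new FetchInstrumentation(),
  ],
});
```

From there, page loads and API calls arrive as spans, which is the raw material for answering what frontend engineers actually want to know.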

Ask Miss O11y: Mapping Out Your Observability Journey

Dear Trapped, Thanks for asking the question! Approaching observability as an all-or-nothing problem often leads to the project feeling daunting. But that’s not specific to observability—any project can be overwhelming if you think it needs to be done all at once, perfectly. Such as, erm, writing an entire book on observability! *looks around worriedly*

Ask Miss O11y: I Don't Want to Be On Call Anymore. Am I a Monster?

First, I’d like to say that pager duty isn’t something we should treat like chronic pain or diabetes, where you just constantly manage symptoms and tend to flare-ups day and night. Being paged out of hours is as serious as a fucking heart attack. It should be RARE and taken SERIOUSLY. Resources should be mustered, product cycles should be reassigned, until the problem is fixed.

Incident Resolution: Do You Remember, the Twenty Fires of September?

From September to early October, Honeycomb declared five public incidents. Internally, those were part of a broader operational burden: over 20 different issues interrupted normal work during the month. A fraction of them had noticeable public impact, but most of the operational work was invisible. Because we’re all about helping everyone learn from our experiences, we decided to share a behind-the-scenes look at what happened.

Game Launches Should Be Exciting for Your Players, Not for Your LiveOps Team

Launching something new at a game studio (a title, an experience, a feature, a subscription) is a blockbuster moment, and everything hangs in the balance. The architecture—distributed and complex, designed by a multitude of teams, to be played across a variety of devices in every corner of the world—is about to meet a frenzy of audience anticipation, along with the sky-high expectations of players, executives, and investors.