Latest Posts

Negotiating Priorities Around Incident Investigations

There are countless challenges around incident investigations and reports. Aside from sensitive situations revolving around blame and corrections, tricky problems come up when having discussions with multiple stakeholders. The problems I’ll explore in this post, from the SRE perspective, are time pressure (when to ship the investigation) and the type of report people expect.

Much Ado About OpenTelemetry

OpenTelemetry has done so much good work in the software industry over the last five years, specifically in the domain of observability. Bringing users and vendors together to define the future of telemetry? Check! Unifying logs, traces, and metrics under a completely vendor-neutral API? Check! Deprecating other standards by bringing their collaborators to the table to ensure their use cases are met? CHECK!

APM From a Developer's Perspective

In twenty years of software development, I did not have the privilege of being on call, of tending to my software in production. I’ve never understood what “APM” means. Anybody can tell me what it stands for—Application Performance Monitoring (or sometimes, the M means Management)—but what does it mean? What do people use APM for?

Flight to Success: Birdie's DevOps Evolution Fueled by Observability Insights

Birdie wanted to uplevel its observability by moving to a platform that would provide meaningful insights for application performance and debugging. Ensuring that customers can provide seamless and timely care to in-home patients is a top priority for Birdie, and the development team takes pride in building and maintaining a high-quality platform distinguished by its reliability and responsiveness.

Three Properties of Data to Make LLMs Awesome

This post first appeared on Phillip's personal blog. Back in May 2023, I helped launch my first bona fide feature that uses LLMs in production. It was difficult in lots of different ways, but one thing I didn’t elaborate on in several blog posts was how lucky I was to have a coherent way to get the data I needed to make the feature useful for users.

What Is Application Performance Monitoring?

Every business is a software business. And by software, we don’t mean code—we mean running software serving customers in production. Those customers may be internal to the company, they may pay you money, or they may represent attention that increases ad revenue—either way, making them happy is your business. And your fast, reliable software makes them happy. Application performance monitoring, also known as APM, represents the difference between code and running software.

Safer Client-Side Instrumentation with Honeycomb's Ingest-Only API Keys

We're delighted to introduce our new Ingest API Keys, a significant step toward enabling all Honeycomb customers to manage their observability complexity simply, efficiently, and securely. Ingest Keys are currently available for Environment & Services customers, with Classic support and programmatic key management capabilities under development and coming soon!

Data Sovereignty and OpenTelemetry

In today’s economic and regulatory environment, data sovereignty is increasingly top of mind for observability teams. The rules and regulations surrounding telemetry data can often be challenging to interpret, leaving many teams in the dark about what kind of data they can capture, how long it can be stored, and where it has to reside. In the past, addressing these issues at scale was a costly endeavor.

Where Does Honeycomb Fit in the Software Development Lifecycle?

“Mommy, where does software come from?” “Software grows in a circle, just like this!” The software development lifecycle (SDLC) is always drawn as a circle. In many places I’ve worked, there’s no discernible connection between “5. Operate” and “1. Plan.” However, at Honeycomb, there is. More on that later.

Avoid Stubbing Your Toe on Telemetry Changes

When you have questions about your software, telemetry data is there for you. Over time, you make friends with your data, learning what queries take you right to the error you want to see, and what graphs reassure you that your software is serving users well. You build up alerts based on those errors. You set business goals as SLOs around those graphs.