When In Doubt, Add More Spans: A Tale of Tracing and Testing In Production

Recently, Toshok was telling a story about the kind of thing he talks about a lot—improving the performance of some endpoint or page or other. Obviously, we spend a lot of time thinking about how to improve the experience of our users, but what caught my attention this time was that what he was describing sounded like a new kind of testing in production—so I asked him to go into a bit more detail.

Incident Review: Caches are Good, Except When They Are Bad

Between Wednesday, April 17th and Friday, April 26th, Honeycomb had four separate periods of downtime affecting the Honeycomb API, resulting in approximately 38 minutes of total downtime. At Honeycomb, we believe that visibility into production services is important, especially when service outages are making your users unhappy. We take the impact of outages on our customers seriously, and believe that transparency is key to you trusting in and using our service.

Metrics vs Events: A Conversation About Controlling Volume

If I’m used to metrics, how should I think about events in Honeycomb? This question cuts to the heart of how Honeycomb is different from other vendors in the APM and metrics space who claim to provide tools that help teams achieve observability, and we hear variations on it fairly often.

A New Bee's First Oncall

I’m Honeycomb’s newest engineer, now in my eighth week at Honeycomb. Excitingly, I did my first week of oncall two weeks ago! Almost every engineer at Honeycomb participates in oncall, and I chose to join in the tradition. This may seem unconventional for a Developer Advocate; surely my time might be better spent holding more meetings with customers and giving more talks? Yet, I found that being oncall was the right decision for me.

Illuminating the under-loved with Honeycomb

Most modern web apps end up sprouting some subset of tasks that happen in the “background”, i.e., when a user is not directly waiting on a request to the server to finish. These tasks cover all kinds of use cases: processing media, generating aggregate statistics for later viewing in the front end, and syncing data to third-party providers are just a few of many examples.

Observations from SRECon Americas, Brooklyn, NY - March 2019

“We can know one thing: shit’s gonna fail.” Last week we participated in SRECon Americas, three days of intensive learning with practitioner talks, hands-on workshops, socializing, and of course vendor booths. Now in its third year, the conference drew 650 attendees, plus an additional ~300 from sponsors and organizers.

Budget Planning for Next-generation APM and Observability

If you’re trying to evaluate and understand the ROI of building an observability practice and carve out a budget for it, you’re not alone. You’ve probably got some monitoring and metrics capability already, but that’s proving not to be enough. How can you empower your teams as your environment becomes too complex for the basics? And how much will that cost?

Support Your Customers More Effectively with Honeycomb

Customer success can be a serious differentiator and competitive advantage for companies today. Everyone wants to ship quality products to their customers faster, and the rise of subscription-based pricing and SaaS applications in the last decade means that ensuring customer success is a more critical part of the business than ever.

BubbleUp Meets Tracing (and Other Odd-shaped Data)

A few weeks ago, BubbleUp came out of Beta. We’ve been getting fantastic user feedback on how BubbleUp helps users speed through the Core Analysis Loop and lets people find things they never could have found before. We’ve also been learning more about how BubbleUp works with Tracing, which unearthed some difficult issues. Today, we’re taking those head on.

How To Learn Systems Debugging by People-watching

When I first joined this startup that makes an observability platform, I was a front-end JavaScript developer who had never ssh’ed into production. I didn’t even know what tracing or monitoring or metrics were, let alone what it meant for logs to be structured or how they could be useful to me. But within a couple of months I joined the on-call rotation, and I now share responsibility for our services with the rest of my team.