Automating Collection of Troubleshooting Data with Triggers: a How-To Guide

Everyone wants to be more efficient — to spend less time on the tedious things, and more time on the things that move the needle. As much as possible, if you can automate those tedious things, you should. With Honeycomb, we enable you to understand how your application behaves in production through the ability to iteratively ask questions of the system instrumentation data, no matter how granular. Honeycomb triggers enable you to be notified when specific things happen in your system.

Observability Through The Development Lifecycle

In this interview with Honeycomb Software Engineer, Ben Hartshorne, we get a to see and hear valuable insights on why observability, distributed tracing and Honeycomb help engineers gain great understanding on how software behaves in all stages of development. Ben will tell you how he builds software, instruments his code and uses Honeycomb to constantly update the “mental model” of how software really works.

Dynamic Sampling by Example

Last week, Rachel published a guide describing the advantages of dynamic sampling. In it, we discussed varying sample rates to achieve a target collection rate overall, and having different sample rates for distinct kinds of keys. We also teased the idea of combining the two techniques to preserve the most important events and traces for debugging without drowning them out in a sea of noise.


Stop Your Database From Hating You With This One Weird Trick

Let’s not bury the lede here: we use Observability-Driven Development at Honeycomb to identify and prevent DB load issues. Like every online service, we experience this familiar cycle. This is not a bad thing! It’s a normal thing. Databases are easy to start with and do an excellent job of holding important data. There are many variations of this story and many ways to manage and deal with the growth, from caching to read replicas to sharding to separation of different types of data and so on.


The New Rules of Sampling

One of the most common questions we get at Honeycomb is about how to control costs while still achieving the level of observability needed to debug, troubleshoot, and understand what is happening in production. Historically, the answer from most vendors has been to aggregate your data–to offer you calculated medians, means, and averages rather than the deep context you gain from having access to the actual events coming from your production environment.


When In Doubt, Add More Spans: A Tale of Tracing and Testing In Production

Recently, Toshok was telling a story about the kind of thing he talks about a lot—improving the performance of some endpoint or page or other. Obviously, we spend a lot of time thinking about how to improve the experience of our users, but what caught my attention this time was that what he was describing sounded like a new kind of testing in production—so I asked him to go into a bit more detail.


Incident Review: Caches are Good, Except When They Are Bad

Between Wednesday, April 17th and Friday, April 26th, Honeycomb had four separate periods of downtime affecting the Honeycomb API, resulting in approximately 38 minutes of total downtime. At Honeycomb, we believe that visibility into production services is important, especially when service outages are making your users unhappy. We take the impact of outages on our customers seriously, and believe that transparency is key to you trusting in and using our service.


A New Bee’s First Oncall

I’m Honeycomb’s newest engineer, now on my eighth week at Honeycomb. Excitingly, I did my first week of oncall two weeks ago! Almost every engineer at Honeycomb participates in oncall, and I chose to join in the tradition. This may seem unconventional for a Developer Advocate — surely my time might be better spent holding more meetings with customers and giving more talks? Yet, I found that being oncall was the right decision for me.