At Sensu Summit 2018, Box.com Sr. Infrastructure SRE Trent Baker told the story of how they migrated over 350,000 Nagios checks to Sensu. In this post, I’ll recap that talk, sharing some info about the infrastructure at Box.com, how they migrated a legacy monitoring system, and what’s next.
By now you’ve learned about reducing the sheer number of alerts you’re getting, as well as automated triage and remediation. In this post, I’ll go into some extra steps you can take to further fine-tune Sensu and cut down on alert fatigue.
As a monitoring company, it’s only natural that we’d always seek more data to inform our product decisions. With that in mind, we created Tessen, a hosted Sensu call-home service. Tessen is opt-in for the current version of Sensu, but will be opt-out in Sensu Go (ICYMI, here’s our product roadmap, including the GA release date). Here’s what you need to know, including what we’re collecting and how that data benefits the Sensu Community.
I’m excited to share the official roadmap and Beta releases leading up to the General Availability (GA) release date for Sensu Go, the latest and greatest version of Sensu. Here are some key upcoming dates.
This year at Sensu Summit, Fletcher Nichol and I gave a talk on systems architecture entitled “Pull, don’t push: Architectures for monitoring and configuration in a microservices era.” In this post, I’d like to reiterate and expand on some of the concepts in that presentation and make some more concrete recommendations for systems design in an era of complex distributed systems.
So far, we’ve covered alert reduction with Sensu filters and token substitution; automating triage; and remediation with check hooks and handlers (links above). In this post, I’ll cover alert consolidation via round robin subscriptions and JIT/proxy clients; aggregates; and check dependencies. These are all designed to help you cut through the “white noise” and focus on what’s important (especially in the middle of a major incident).