As the post-pandemic world finds its footing again, a resilient spirit drives the revival, propelling businesses to embrace a new era of technological innovation. Notably, IT teams are swiftly adopting the digital transformation of their processes, particularly in incident response. From virtual collaboration tools and remote IT support to automated incident management, teams have found innovative ways to ensure seamless business continuity while delivering IT services with minimum downtimes.
One of the most talked about topics in observability today is centered around the question of how to get more value out of the ever-increasing amount of data collected by agents, collectors, scrapers, and the like. Back in May, we announced Adaptive Metrics, a new feature in Grafana Cloud that allows you to reduce the cardinality of Prometheus metrics and the overall volume and costs of your metrics.
Any organization that’s keeping up with today’s sharp rise in business demands (or better yet, getting ahead of the game) is doing so by getting innovative and jumping at the chance to do things differently. They’re not relying on the old ways or trying to use their existing toolbox. Instead, organizations are looking to the newest technologies and means of adding efficiency to as many day-to-day functions as possible.
Ensuring software availability is essential for any SaaS company—including Gremlin. To do that, our teams need to identify the reliability risks hiding in our systems. That’s why our development, platform, and SRE teams use Gremlin regularly to perform Chaos Engineering experiments, run reliability tests, and track the reliability of our systems against our standards. Along the way they’ve picked up a thing or two about how to find and fix reliability risks with Gremlin.