Top 12 Site Reliability Engineering (SRE) Tools

Ben Treynor Sloss, then VP of Engineering at Google, coined the term “Site Reliability Engineering” in 2003. Site Reliability Engineering, or SRE, aims to build and run scalable, highly available systems. The philosophy behind SRE is that developers should treat errors as opportunities to learn and improve. SRE teams continually experiment with new approaches to strengthen the systems they support.

DEJ's 2022 IT Performance Management Study: Key Takeaways

DEJ's 2022 IT performance management study shines a light on 24 areas affecting IT teams today. The pain points keeping IT teams up at night are all here, including the war for talent, managing complexity, and data management and analytics at scale. Delve deeper, however, and a pattern emerges: it all comes down to business outcomes.

Datasets, Traces, and Spans, Oh My!

If you've stumbled (or purposefully landed) on this blog post, chances are you are new to the observability space, o11y for short, or diving deeper into it. Suffice it to say, you’re not in Kansas anymore. In many ways, Honeycomb can serve as a yellow brick road into o11y, and this article introduces how Honeycomb helps you implement o11y in applications and distributed services.

A Data Lake Is Not Enough to Keep Your Observability Ambitions Afloat

Recently I heard one of our prospects mention a competitor that was promoting its data lake and ask how we are different. His question got me thinking about why a data lake alone does not provide the depth of observability you really need. The goal of observability is to help SREs, IT Ops and DevOps teams run their IT systems with close-to-zero downtime. Consolidating data from across your environment into a data lake is certainly a good step.

MetricFire: A Great Instrumental Monitoring Alternative

Instrumental has decided to shut down its platform, including its application, servers, and all related APIs, starting in August 2022. Users will need to migrate to another solution or risk having all their data permanently deleted. But Instrumental users need not fret!

How we improved Grafana Mimir query performance by up to 10x

Earlier this year we introduced the world to Grafana Mimir, a highly scalable open source time series database for Prometheus. One of Mimir’s guarantees is 100% compatibility with PromQL, which it achieves by reusing the Prometheus PromQL engine. However, the Prometheus PromQL engine executes each query in a single thread, so no matter how many CPU cores you throw at it, a single query will only ever use one core.