Who should define Reliability - Engineering, or Product?
Whoever owns Reliability should define its parameters. But who owns the Reliability of a Product? Engineering? Product Management? Or the Customer success team?
From Robocars to Reliability — SRE with self-driving cars; mapping out where the Observability space is in conjunction with self-driving cars.
The Reliability industry needs a managed answer, free of vendor lock-in, to spiraling costs, high cardinality, and the toil of managing a TSDB.
JMX metrics give solid insights into the workings of your application. Integrating them with Levitate (our time series data warehouse) required us to jump through some hoops with vmagent.
High cardinality in time series data is challenging to manage. But it is necessary to unlock meaningful answers. Learn how streaming aggregations can rein in high cardinality using Levitate.
This article covers questions such as what are MTTF, MTBF, MTTD, and MTTR, their differences, how to adopt them, and their use cases.
The current tech winter has a number of glaring stories. Cyclical as they may be, one truth stands out more than the rest: the money spent on internal software tools to support tech infrastructure is bloated. And there’s nothing cyclical about this infrastructure spending.
A Recap of SRECon 2023 Americas by guest author Sebastian Vietz.
Everything you need to know about Mean Time Between Incidents (MTBI) and how it can help Site Reliability Engineers.
What's the difference between SLAs, SLOs, and SLIs? Understanding these little nuances is critical for DevOps folks. Here's a simple reckoner on what each of these means.