Operations | Monitoring | ITSM | DevOps | Cloud

Latest Posts

Incident Management Tools - Do I Even Need Them?

Software is hard… Maintaining software reliability is harder than it used to be. Software systems have grown dramatically in complexity, as they’re applied in a wider range of applications and environments. Many of which have become fundamental to the everyday function of our society. On the other hand, the pace of software development and release is also faster than ever. Innovating new features faster than competitors has become the key to success in a rapidly-changing market.

Failure Analysis: Engineering incidents are a bigger problem than you think

Engineering incidents can be quite harmful for companies, both in terms of financial costs and reputational damage. In some cases, engineering incidents can even put people's lives at risk, which can have serious legal and moral implications for the company involved.

SRE Maturity Model: How Do You Assess Your Team?

How do you evaluate your SRE team’s progress in implementing SRE? We discuss the key SRE indicators for evaluating your team’s progress in the SRE maturity model. ‍ What is the SRE maturity model? ‍ The SRE maturity model is a way of judging how far you are in implementing SRE principles. It is a method used by teams to understand where they ought to implement more SRE best practices to reach greater SRE maturity.

Tag You're It: Organized, Configurable Tagging is a Must-do for Great Incident Analytics.

Wouldn’t it be nice to learn which parts of your service see the most incidents, or why one service experiences more Sev1 incidents than the others? It’s not always easy to see the full disruptive impact of an engineering incident. Even harder to see trends across incidents and over time. Developing incident insights that you can use to help guide and shape the way your team designs and operates your product takes time, careful consideration, team engagement and the right tooling.

Blameless culture drives incident learning and other key insights from Catchpoint's 2022 SRE Report

SRE is a constantly evolving field, responding to the challenges of increasing reliance on tech and the opportunities of its evolving abilities. Reliability has to remain a step ahead of the cutting edge, whether it’s navigating remote work, implementing AI assistance, or optimizing internal processes. But how do we know that SRE is keeping up? ‍ We’re proud and excited to announce the results of the SRE Survey we ran in partnership with Catchpoint.

For incident management, should you build or buy?

Is your incident response held together by a thread? Are you manually recording incident updates in a shared doc? Do you struggle to juggle the incident management workload with your other responsibilities? Does everyone on-call report data the same way? These are all common problems faced by DevOps teams still relying on homegrown incident management tooling.