The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Postmortem Pitfalls

Last week, we talked to Gergely Orosz about what happens when an incident is over and you're looking back on how things went. If you haven't read it already, grab a coffee, get comfortable, and read Gergely's full post, Postmortem Best Practices. But before you do that, here's some bonus material on a few of our points.

A developer's guide to programmatically overcome fear of failure

People are more than happy to talk about their successes, but ask them about their failures and they can be much more hesitant to share. Failure is a subject that, interestingly enough, is entangled with the emotion of shame. Yet it’s integral to achieving anything novel, and the lessons that come from failure are unparalleled. So, let’s figure out why people fear failing, and find ways to get more comfortable with it.

Incident Management Metrics That Matter - 2021

What are the key incident management metrics and KPIs? How important is it to track your team’s performance? If you are not doing so already, now is the time to get your finger on the pulse by better understanding and managing your organization’s key incident management metrics. How a company manages IT incidents matters and, most importantly, the process has the power to impact sales – recent studies indicate 52% of U.S.

Uptime/SLA calculator: what is an SLA and how to calculate it?

A Service Level Agreement (SLA) is a document that details the level of service a vendor guarantees for a product. It generally sets out metrics such as uptime expectations and any compensation owed if these levels are not met. For example, if a provider advertises an uptime of 99.9% and exceeds 43 minutes and 50 seconds of service downtime in a month, technically the SLA has been breached and the customer may be entitled to some type of remuneration depending on the agreement.
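The downtime figure in the example follows from simple arithmetic: the allowed downtime is the measurement period multiplied by the fraction of time the service is permitted to be down. A minimal Python sketch of that calculation is below. It assumes an average month of 365.25 / 12 days, which is one common convention and not something every agreement uses; the names `AVERAGE_MONTH`, `allowed_downtime`, and `is_breached` are illustrative, not part of any particular calculator.

```python
from datetime import timedelta

# Assumption: an "average month" of 365.25 / 12 days. A real SLA defines its
# own measurement period (calendar month, 30 days, quarter, ...).
AVERAGE_MONTH = timedelta(days=365.25 / 12)


def allowed_downtime(uptime_percent: float,
                     period: timedelta = AVERAGE_MONTH) -> timedelta:
    """Return the maximum downtime permitted in `period` for a given uptime target."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    # The fraction of the period the service may be unavailable.
    return period * (1 - uptime_percent / 100)


def is_breached(uptime_percent: float,
                observed_downtime: timedelta,
                period: timedelta = AVERAGE_MONTH) -> bool:
    """Return True when the observed downtime exceeds the allowance."""
    return observed_downtime > allowed_downtime(uptime_percent, period)


if __name__ == "__main__":
    budget = allowed_downtime(99.9)
    print(f"99.9% uptime allows about {budget.total_seconds() / 60:.2f} minutes of downtime per month")
    print("45 minutes of downtime breaches it:", is_breached(99.9, timedelta(minutes=45)))
```

With these assumptions, a 99.9% target works out to roughly 43 minutes and 50 seconds per month, matching the figure above. Passing a different `period` (for example `timedelta(days=365)`) gives the allowance for another measurement window.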

Intelligent Alert Grouping: What It Is and How To Use It

It’s 2 AM and you’re paged when you’re still awake – how well can you find what you need to fix the latest mistake? When the incident begins, it might be impacting only a single service. But as time progresses, your brain boots, the coffee is poured, and the docs are read, all while the incident is escalating to other services and teams whose alerts you might not see if they’re not in your scope of ownership.

What Operational Maturity Looks Like Today With PagerDuty's Kyle Duffy

Companies that underwent accelerated digital transformations during the past 18 months are looking to understand how they can improve their operational maturity to handle the increase in complexity. This is paramount to an organization’s future success.

4 Pressures at Tech Companies xMatters Can Help Relieve

Technology companies are at the forefront of innovation, changing the way consumers and the general public interact with their everyday lives. As the late Stan Lee so wisely stated, “with great power comes great responsibility,” and this heightened pressure often leaves little room for error when an issue arises—which happens more often than you’d think.

OnPage for Clinical Communication and Collaboration

Modern healthcare teams require a modern solution to streamline clinical communications and medical workflows. In life and death situations, it’s critical that physicians receive immediate alerts and messages to provide patient care promptly. OnPage is the industry’s most trusted clinical communications platform. OnPage is more reliable and secure than traditional pagers. The system enables care teams to easily communicate and achieve maximum patient satisfaction.