Incident Management

The latest news and information on incident management, on-call, incident response, and related technologies.

This Is the Most Underappreciated Skill for SREs

Delivering great software and sustainable systems is a team sport. Without the support of all stakeholders, adoption initiatives often fail. In successful initiatives, SREs are responsible for bringing together all the resources and team members needed to resolve reliability-related issues. But bringing these resources together takes much more effort than people think. SREs do a lot of glue work to make sure these collaborative efforts happen.

Building and Scaling Your SRE Team

Building Site Reliability Engineering (SRE) teams is hard! There are so many articles and explanations of what SRE means that it’s easy to get lost. Moving beyond understanding the individual SRE role to building and scaling a team of SREs is an even bigger challenge. It’s important to find the right information to help you take your SRE team to the next level.

5 Steps to Building a Robust Incident Response Plan for your MSP

Today’s organizations face ransomware, malware, and other cyber attacks, and managed service providers (MSPs) need an incident response plan (IRP) to mitigate these threats. In a recent survey of 200 MSPs, 74% of respondents said they had suffered a cyber attack, and 83% noted that their small and medium-sized business (SMB) customers had experienced one as well. With an IRP, MSPs can protect themselves and their customers against cyber attacks.

Seamless CMDB Provisioning Gives Responders the Data They Need to Respond Faster

We knew that the most loved feature in our ServiceNow 7.0 release would be the CMDB features. In our ServiceNow 7.5 release (available now), we’ve expanded our CMDB capabilities even further, based on your feedback about the importance of reducing the effort it takes to re-create the same services within PagerDuty.

2020 Year in Review: OnPage Continues to Grow Despite the Pandemic

2020 was an unpredictable year that presented several challenges, such as the outbreak of the coronavirus (COVID-19) pandemic. As part of the “new normal,” the world has adopted infection prevention procedures. The 2020 calendar year was defined by face coverings, constant sanitization and physical distancing. At its core, the year was an exhausting, surreal 12-month period for many.

Better incident management while working remotely: The Squadcast way

With the onset of remote work due to COVID-19, remote incident management has become the norm for businesses worldwide. Here we share some best practices that helped us handle incidents remotely and make on-call less stressful. Organisations that were previously used to war rooms now find themselves having to coordinate teams through Slack, MS Teams or other collaboration tools.