Incident Management

The latest news and information on incident management, on-call, incident response and related technologies.

Toil: Still Plaguing Engineering Teams

Our industry has always had localized expressions for work that was necessary but didn’t move the company forward. The SRE movement calls this type of work “toil.” The concept of toil is a unifying force because it provides an impartial framework for identifying — then containing — the work that takes up our time, blocks people from fulfilling their engineering potential, and doesn’t move the company forward.

Cyber, incident, downtime: Three words that chill the board, and how to tame them

There are three words that every member around a boardroom table fears hearing strung together: "Cyber... incident... downtime". They are never the precursor to a good meeting! Technology incidents can leave the business in the dark and bring the wheels of industry grinding to a halt. A Gartner report found that companies left without operational systems can lose up to half a million dollars per hour from severe incidents, counting both losses and remediation.

DERDACK SIGNL4 for Microsoft Sentinel, Defender for Cloud and more

Doreen talks us through the value-add of SIGNL4 for MSPs and enterprise customers of Microsoft Security products and how SIGNL4 facilitates an automated and seamless 24/7 on-call management experience. Derdack SIGNL4 is a member of the Microsoft Intelligent Security Association (MISA).

How to Help Teams Create Optimal Infrastructure for Availability

Teams are locked into a cycle of suffering characterized by the feeling that they are sprinting just to stay still. This morale- and productivity-destroying state is caused by an inability to find time to save time. Our new research, The State of Availability Report 2022, discovered that teams know what they want to do (harness cloud and DevOps practices and tools to advance digital transformation), but something’s getting in the way.

Improving Incident Management with Automation

Incident management is your organization’s first line of defense. When incidents occur, internal teams must be ready to respond quickly. While incidents can happen anytime, it’s unrealistic to expect incident managers to be prepared to perform manual root cause analysis. Manually monitoring and analyzing applications on multiple servers is extremely difficult, which is why human reaction times have traditionally limited the speed of incident management.

What's New: Updates to Incident Response, PagerDuty Process Automation Software & PagerDuty Runbook Automation, Mobile App Experience, and More!

We’re excited to announce a new set of updates and enhancements to the PagerDuty Operations Cloud in addition to the November Product Launch announcements made earlier this month. Recent development and app updates from the product team include Incident Response, PagerDuty® Process Automation, the PagerDuty Mobile App, Integrations, as well as Community & Advocacy Events updates.

7 Incident Management Best Practices to Improve Business Efficiency

Think about the last time your IT systems had an outage: How did your team react to it? Were they organized with a clear idea of how best to resolve the issue? Or was it chaotic, with people firing questions from all directions and customer service channels ablaze with requests for help? Digital technology disruptions are typical (and even expected) in the workplace, but they don’t have to be chaotic, with teams rushing around to extinguish the metaphorical fire.