Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Key considerations before signing up for cyber insurance

With 2021 seeing 5.1 billion records breached and an annual increase in attacks at 11%, the risk of security incidents is only getting greater every year. And when an attack hits, the cost to recover, which includes fines, penalties, legal fees, and much more, are also great. To help minimize the scope of financial damage, many organizations turn to cyber insurance. Albeit a relatively new branch of insurance, demand is already huge and ever increasing.

Sponsored Post

Classifying Severity Levels for Your Organization

Major outages are bound to occur in even the most well-maintained infrastructure and systems. Being able to quickly classify the severity level also allows your on-call team to respond more effectively. Imagine a scenario where your on-call team is getting critical alerts every 15 minutes, user complaints are piling up on social media, and since your platform is inoperative revenue losses are mounting every minute. How do you go about getting your application back on track? This is where understanding incident severity and priority can be invaluable. In this blog we look at severity levels and how they can improve your incident response process.

Setting up Runbooks in Squadcast | SRE Best Practices | Squadcast

A Runbook is a compilation of routine procedures and operations that are documented for reference while working on a critical incident. Sometimes, it can also be referred to as a Playbook. From this video, learn to create, attach, reference and mark progress for incident resolution using Runbooks.

What's New: Updates to Incident Response, PagerDuty Process Automation, Integrations, and More!

Following another successful PagerDuty Summit, development continues across several areas of the product. We’re excited to announce a new set of updates and enhancements to the PagerDuty Operations Cloud. Recent updates from the product team include Incident Response, PagerDuty® Process Automation and PagerDuty® Runbook Automation, Partner Integrations & Ecosystem, as well as Community & Advocacy Events updates.

Release Notes: Process Automation and Rundeck OSS 4.4.0

Product managers Forrest Evans and Jake Cohen show off new features and enhancements in PagerDuty Process Automation and Rundeck Open Source version 4.4.0. Version 4.4.0 features two new plugins for #AWS:#Lambda Custom (ephemeral) scripts#ECS/#Fargate Commands For more details on other improvements in this release, see the full Release Notes.

3 common pitfalls of post-mortems

Small confession: we currently use the term 'post-mortem' in incident.io despite preferring the term 'incident debrief'. Unless you have particularly serious incidents, the link to death here really isn’t helping anyone. However, we're optimising for familiarity, so we're sticking to the term 'post-mortem' here. Ask any engineer and they’ll tell you that a post-mortem is a positive thing (despite the scary name).

Zero Trust Security: Key Concepts and 7 Critical Best Practices

Zero trust is a security model to help secure IT systems and environments. The core principle of this model is to never trust and always verify. It means never trusting devices by default, even those connected to a managed network or previously verified devices. Modern enterprise environments include networks consisting of numerous interconnected segments, services, and infrastructure, with connections to and from remote cloud environments, mobile devices, and Internet of Things (IoT) devices.

Automating Common Diagnostics for Kubernetes, Linux, and other Common Components

This is the second piece in a series about automated diagnostics, a common use case for the PagerDuty Process Automation portfolio. In the last piece, we talked about the basics around automated diagnostics and how teams can use the solution to reduce escalations to specialists and empower responders to take action faster. In this blog, we’re going to talk about some basic diagnostics examples for components that are most relevant to our users.