Incident Response

Words matter: incident management versus incident response

I recently published a couple of blog posts about what happens when you invest in a thoughtful incident management strategy, and about three first steps to take to do so. What I'm getting at in these posts is that we need a shift toward proactivity in the software operations community. I'd wager most of the world is responding to incidents as they happen, and nothing more.

Developing a Data Breach Incident Response Plan

With cybersecurity boundaries extending beyond the traditional walls of an office and attack surfaces constantly expanding, data breaches are inevitable. Managing the risks from data breaches requires organizations to develop a comprehensive incident response plan: an established set of guidelines that facilitates incident detection, response, and containment, and empowers cybersecurity analysts to secure a company's digital assets.

How to Standardize Service Ownership at Scale for Improved Incident Response

Service ownership is a DevOps best practice where team members take responsibility for supporting the software they deliver at every stage of the development lifecycle. This level of ownership brings development teams much closer to their customers, the business, and the value being delivered. Service owners are the subject matter experts (SMEs) for their services – and in a service ownership model, they are also responsible for responding to any production issues.

Introducing Grafana OnCall OSS, on-call management for the open source community

Last November, we announced the launch of Grafana OnCall, an easy-to-use on-call management tool that helps reduce toil through simpler workflows and interfaces tailored for developers. Born out of Grafana Labs' acquisition of Amixr Inc., Grafana OnCall began as a cloud-only solution that became generally available to all Grafana Cloud users, on both paid and free plans, in February.

Crossing "The Last Mile" with an Incident Response System

Delivering dependable and high-performing IT services in 2022 requires coordination and collaboration across different workflows, areas of expertise, and even time zones. Whether serving in-house colleagues or external clients, there is immense pressure on IT management to create seamless experiences 24/7/365. Seconds matter when critical systems break down, and slow incident resolution can have costly ramifications on customer experience and employee productivity.

The Future of Incident Response is Automated, Flexible, and Proactive

We know our customers rely on PagerDuty as the backbone of critical real-time operations, so we want to make sure each and every enhancement helps streamline incident response. How can we help our customers spend less time firefighting and more time innovating? One of PagerDuty’s values is Champion the Customer – and we take this very seriously. When building and improving features, we aim to keep a pulse on what’s going on with our customers: what’s keeping them up at night?

What's New: Updates to Incident Response, AIOps, PagerDuty Process Automation, and More!

Summit's right around the corner (have you registered yet?) but the shipping doesn't stop! We're excited to announce a new set of updates and enhancements to PagerDuty's Digital Operations Platform. Recent updates from the product team range from On-Call Management, Incident Response, Process Automation, and Integrations to PagerDuty Community & Advocacy Events. New capabilities enable users and customers to resolve incidents faster, among other improvements.

When incident response requires business response, who should you notify?

From a single on-call engineer hopping online to resolve a problem, to a massive cross-team effort that brings in even the most senior technical leadership (CTO, CISO, or CIO), incident response teams are lucky when they’re able to resolve issues before a customer is aware. But in the cases where there is customer impact, other stakeholders like sales and customer service need to be informed and updated as well.

How to empower your team to own incident response

Responding to and managing incidents feels fairly straightforward when you're in a small team. As your team grows, it becomes harder to figure out who owns which services, especially during critical times. In those moments, you need everyone to know exactly what their role is in order to recover fast. Moving to incident.io as the seventh engineer, after working at a scaleup of around 70 engineers, has given me a new perspective on what it means to own your code.

Lightstep Incident Response | Integration: Grafana

Learn how to integrate Grafana into Lightstep Incident Response. The all-in-one web application performance tool provides performance insight from the end-user experience.