Incident Response

How to Standardize Service Ownership at Scale for Improved Incident Response

Jun 22, 2022 By Hannah Culver In PagerDuty

Service ownership is a DevOps best practice where team members take responsibility for supporting the software they deliver at every stage of the development lifecycle. This level of ownership brings development teams much closer to their customers, the business, and the value being delivered. Service owners are the subject matter experts (SMEs) for their services – and in a service ownership model, they are also responsible for responding to any production issues.

Introducing Grafana OnCall OSS, on-call management for the open source community

Jun 14, 2022 By Matvey Kukuy In Grafana

Last November, we announced the launch of Grafana OnCall, an easy-to-use on-call management tool that helps reduce toil through simpler workflows and interfaces tailored for developers. Born out of Grafana Labs' acquisition of Amixr Inc., Grafana OnCall began as a cloud-only solution that became generally available to all Grafana Cloud users, on both paid and free plans, in February.

Crossing "The Last Mile" with an Incident Response System

Jun 10, 2022 By James Truslow In OnPage

Delivering dependable and high-performing IT services in 2022 requires coordination and collaboration across different workflows, areas of expertise, and even time zones. Whether serving in-house colleagues or external clients, there is immense pressure on IT management to create seamless experiences 24/7/365. Seconds matter when critical systems break down, and slow incident resolution can have costly ramifications on customer experience and employee productivity.

The Future of Incident Response is Automated, Flexible, and Proactive

Jun 7, 2022 By Vivian Chan In PagerDuty

We know our customers rely on PagerDuty as the backbone of critical real-time operations, so we want to make sure each and every enhancement helps streamline incident response. How can we help our customers spend less time firefighting and more time innovating? One of PagerDuty’s values is Champion the Customer – and we take this very seriously. When building and improving features, we aim to keep a pulse on what’s going on with our customers: what’s keeping them up at night?

What's New: Updates to Incident Response, AIOps, Pagerduty Process Automation, and More!

May 31, 2022 By Vera Chan In PagerDuty

Summit’s right around the corner (have you registered yet?) but the shipping doesn’t stop! We’re excited to announce a new set of updates and enhancements to PagerDuty’s Digital Operations Platform. Recent updates from the product team range from On-Call Management, Incident Response, Process Automation, and Integrations to PagerDuty Community & Advocacy Events. New capabilities enable users and customers to resolve incidents faster, and more.

When incident response requires business response, who should you notify?

May 25, 2022 By Hannah Culver In PagerDuty

From a single on-call engineer hopping online to resolve a problem, to a massive cross-team effort that brings in even the most senior technical leadership (CTO, CISO, or CIO), incident response teams are lucky when they’re able to resolve issues before a customer is aware. But in the cases where there is customer impact, other stakeholders like sales and customer service need to be informed and updated as well.

How to empower your team to own incident response

May 16, 2022 By Martha Lambert In Incident.io

Responding to and managing incidents feels fairly straightforward when you’re in a small team. As your team grows, it becomes harder to figure out the ownership of your services, especially during critical times. In those moments, you need everyone to know exactly what their role is in order to recover fast. Moving to incident.io as the 7th engineer, from a scaleup of around 70 engineers, has given me a new perspective on what it means to own your code.

Incident response: leadership & psychological safety with Jeli founder Nora Jones

May 13, 2022 By CircleCI In CircleCI

Incident response can be tricky. How do you create a culture prepared to handle it in a healthy and efficient way? What should and shouldn't leaders be involved in? How do you create a psychologically safe environment? Find out in this episode as Rob interviews Jeli founder Nora Jones on the do's and don'ts of incident response.

How to Make Your Incident Response Plan with Mattermost

May 11, 2022 By Andrew Zigler In Mattermost

For teams who deploy software to users around the world, every second counts when responding to outages and other incidents. It’s important that you have tools in your arsenal that are up to the challenge. Service monitoring, alerting, collaboration, and visibility are all essential components of a well-implemented incident response plan.

5 Ways Automated Incident Response Reduces Toil

May 9, 2022 By Faith Kilonzi In Torq

Toil — endless, exhausting work that yields little value in DevOps and site reliability engineering (SRE) — is the scourge of security engineers everywhere. You end up with mountains of toil if you rely on manual effort to maintain cloud security. Your engineers spend a lot of time doing mundane jobs that don’t actually move the needle. Toil is detrimental to team morale because most technicians will become bored if they spend their days repeatedly solving the same problems.
