The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

RetroDuty: How We Scale Continuous Improvement Beyond Engineering at PagerDuty

Nov 5, 2019 By Derek Ralston In PagerDuty

If you’ve worked on a team that has adopted Agile techniques, you’ve probably heard of a retrospective. If not, here’s the TL;DR: A retrospective is a meeting in which a team connects regularly to reflect on what happens throughout a project and continuously improve how they work moving forward.

Meet Root Cause Changes from BigPanda - IT Ops, NOC and DevOps Teams' Best friend For Supporting Fast-Moving IT Stacks

Nov 5, 2019 By Mohan Kompella In BigPanda

TL;DR: Fast-moving IT stacks see frequent, long and painful outages. Thousands of changes – planned, unplanned and shadow changes – are one of the main reasons behind this. Until now, IT Ops, NOC & DevOps teams didn’t have an easy way to get a real-time answer to the “What Changed?” question – the answer that can help reduce the duration of outages and incidents in these fast-moving IT stacks. Now, with BigPanda Root Cause Changes, they do.

What Is MTTR? Mean Time to Repair, Explained In Detail

Nov 5, 2019 By Ben Munat In XpoLog

Whether you’re slinging code, managing developers, wrangling servers, or filling most other roles in the modern tech firm, you care about keeping your software running while bringing home the bacon. If your website or application is down, you’re not making money. (Or, if you aren’t in this for profit, your message isn’t getting to the people who need it.) Therefore, it’s everyone’s job to keep things running smoothly.

Join the alpha program for Mattermost's new Incident Response Workflow app

Nov 4, 2019 By Jason Blais In Mattermost

Is your InfoSec or DevSecOps team ready to resolve issues as quickly as possible? To help accelerate response times, we’re happy to announce the alpha release of the Mattermost Incident Response Workflow application for Enterprise Edition, supported in Mattermost 5.12 and later. The app is designed specifically for incident response and enables you to connect all your workflows, automate repetitive tasks, and collaborate on incidents—all without leaving Mattermost.

Rise of the Digital Operations Ecosystem

Nov 4, 2019 By Jukka Alanen In PagerDuty

Many organizations today are dealing with a lot of complexity and disconnected tools. Teams and departments run in parallel but are siloed from each other. People are burned out from a lot of manual work, and everyone is crunched for time. This is not a happy ecosystem to live in. If this digital ecosystem doesn’t work together, your teams don’t know what’s going on and they lack the right information.

PagerDuty ServiceNow Integration How-To Video (Complete w/Parts 1-3)

Nov 1, 2019 By PagerDuty In PagerDuty

Learn how to install, configure, and test the PagerDuty ServiceNow Integration (PagerDuty Integration Version 6.0).

It Came From Below

Oct 31, 2019 By Kelsey Shannahan In PagerDuty

I’m going to assume most people who read this blog are familiar with PagerDuty. But just in case anyone isn’t, PagerDuty is a tool we use in IT to notify us if some predefined check has failed. Maybe a key process has died or maybe we’re not seeing our expected traffic volume or maybe our server has stopped responding to ping. Whatever it is, PagerDuty will relentlessly, remorselessly, and loudly notify whoever is on call that something needs attention.

Achieve Better Accountability With Full-Service Ownership

Oct 30, 2019 By Julie Gunderson In PagerDuty

Software teams seeking to provide better products and services must focus on faster release cycles. But running reliable systems at ever-increasing speeds presents a big challenge. Software teams can have both quality and speed by adjusting the policies around ongoing service ownership. While on-call plays a large part in this model, advancement in knowledge, more resilient code, increased collaboration, and practice also mean engineers don’t have to wake up to a nightmare.

What is BigPanda?

Oct 29, 2019 By BigPanda In BigPanda

BigPanda helps IT Ops, NOC and DevOps teams detect, investigate, and resolve IT incidents & outages in fast-moving IT.

BigPanda Root Cause Changes

Oct 29, 2019 By BigPanda In BigPanda

Changes are responsible for more than 85% of incidents and outages. BigPanda automatically analyzes information from your CI/CD and change tools, and matches it to your monitoring alerts, to quickly identify the root cause changes.
