Incident Management

The latest news and information on incident management, on-call, incident response, and related technologies.

What the Big Brother Approach to IT Monitoring and Incident Management May Be Missing

Jan 28, 2021 By Olaf Schouws In StackState

We asked in a recent poll which popular TV show your IT team resembles the most. Big Brother came out on top, with almost 40% of respondents saying that their incident resolution process most resembled this show. Would you compare your incident management process to an episode of Big Brother? If so, it's likely that your IT environment is highly monitored, but incidents still seem to slip through the cracks.

SLA vs SLI vs SLO: Know the differences between them.

Jan 28, 2021 By Neil Haran In OneUptime

SLA stands for Service Level Agreement. It is a formal agreement between you and your customer that describes the reliability of your product or service. For example, an SLA might state that the product will be online 99 percent of the time annually and that, if that target is missed, 30% of the customer's annual license fee will be refunded. SLAs also spell out penalties like this in the contract.

How We Use Blameless Incident Retrospectives for Remote Work

Jan 28, 2021 By Blameless In Blameless

With Blameless incident retrospectives, our team can create comprehensive timelines, custom tags and questions, and collaborate to learn more from each incident.

View Video

SLA vs SLO vs SLI

Jan 27, 2021 By OneUptime In OneUptime

Service Level Agreement (SLA): a formal agreement between you and your customer. Service Level Objective (SLO): the reliability goal for a resource in your organization (e.g., API uptime should be 99.0% annually). Service Level Indicator (SLI): the current measured reliability (i.e., what your monitoring tool tells you).
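To make the relationship between the three terms concrete, here is a minimal Python sketch that treats the SLI as measured availability and checks it against an SLO. The function names, the downtime figures, and the choice of a 365-day year are assumptions for illustration only; they do not come from the video or from any OneUptime API.

```python
# Illustrative sketch only: all names and figures are hypothetical.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def sli_percent(downtime_minutes: float,
                period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Measured availability (the SLI) as a percentage of the period."""
    return 100.0 * (period_minutes - downtime_minutes) / period_minutes


def allowed_downtime_minutes(slo_percent: float,
                             period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Downtime the SLO tolerates over the period, in minutes."""
    return period_minutes * (100.0 - slo_percent) / 100.0


def meets_slo(downtime_minutes: float, slo_percent: float,
              period_minutes: float = MINUTES_PER_YEAR) -> bool:
    """True when the measured SLI is at or above the SLO target."""
    return sli_percent(downtime_minutes, period_minutes) >= slo_percent


if __name__ == "__main__":
    slo = 99.0  # the example target from the text: 99.0% annual uptime
    print(f"Allowed downtime per year: {allowed_downtime_minutes(slo):.0f} minutes")
    for downtime in (3000, 6000):  # hypothetical measured downtime
        print(downtime, f"{sli_percent(downtime):.3f}%", meets_slo(downtime, slo))
```

Under these assumptions, a 99.0% annual target leaves 5,256 minutes (about 87.6 hours) of downtime per year. An SLA would then attach contractual consequences to missing a target of this kind.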

View Video

PagerDuty + AWS Outposts Integration Workflow Demo

Jan 27, 2021 By PagerDuty In PagerDuty

PagerDuty for AWS Outposts empowers teams to manage incidents in real-time for AWS infrastructure used in a private data center, co-location space, or on-premises facility.

View Video

PagerDuty + Amazon EventBridge Quick Start Integration Workflow Demo

Jan 27, 2021 By PagerDuty In PagerDuty

View Video

Does your MSP Have a Backup and Disaster Recovery Plan?

Jan 27, 2021 By AlertOps In AlertOps

Data loss can cause big problems for managed service providers (MSPs) and their customers. With an MSP backup and disaster recovery (BDR) solution in place, MSPs can guard against data loss following a cyber attack, hardware failure, or any other IT incident.

The U.S. COVID Vaccine Distribution Plan: Challenges and Solutions

Jan 27, 2021 By Ritika Bramhe In OnPage

As coronavirus (COVID-19) continues to spread and new virus strains emerge, the public is frantically looking for answers regarding the U.S. government’s vaccine distribution plan. A sound vaccine distribution plan is especially crucial in times like these. States across the U.S., from coast to coast, are experiencing a vast number of COVID-related deaths and hospitalizations. The dire situation underscores the importance of having an effective, accelerated vaccine delivery process.

New Feature: Incident types

Jan 27, 2021 By Robert Ross In FireHydrant

Incidents are inevitable, and the reality is that some of them are going to repeat themselves. FireHydrant has always strived to make the entire incident response lifecycle smooth, but up until today, declaring common types of incidents was slightly burdensome for our customers. We decided it was time to make declaring incidents easier with easy-to-use templates, which we’re calling Incident types.

Who Else Wants to Increase Development Velocity?

Jan 26, 2021 By Emily Arnott In Blameless

Implementing SRE is fundamentally about shifting culture, but it often means adding new tooling and processes to your team's workflows to support that cultural change. Teams add new steps and checks to incident response procedures. Incident responders write retrospectives and create new meetings to review them. Engineers consult new tools like monitoring dashboards and SLOs. In other words, SRE creates another layer of consideration in development and operations.
