Operations | Monitoring | ITSM | DevOps | Cloud

Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

How to build a successful on-call team - incident.fm

In this podcast, our panellists discuss what it means to build a successful on-call team. Drawing on their experiences at fast growing start-ups and scale-ups, incident.io co-founders Pete and Chris cover everything from who should be on the rota and how to build a compassionate on-call culture, to compensation structures and tips for operationalising on-call.

PagerDuty and DataOps: Enabling Organizations to Improve Decision Making with Better Data

Many organizations have been digitally transforming their operations and the majority of them are moving to the cloud. With this transformation, data teams have to analyze ever larger and more complex data sets to allow downstream teams to make faster and more accurate decisions on a daily basis. Consequently, most organizations need to work with: customer data, product data, usage data, advertising data, and financial data.

Do You Understand Your Essential Business Processes?

Before you can choose the proper tools for your organization, you have to understand its essential business processes. Once you know an essential business process, you can review software applications that will help make your organization more efficient and accurate. Unfortunately, many organizations do not understand their essential business processes. This makes it nearly impossible for them to streamline their organizations, which puts them at a disadvantage in the marketplace.

A deep-dive into event correlation

Event correlation is a powerful capability that can help reduce IT noise, detect incidents in real-time, and improve the performance of critical applications and services. Read on for a deep dive into event correlation as we explore everything from its origins to its current state-of-the-art techniques. We’ll also discuss how event correlation fits into the bigger picture of integrated service management.

The Roblox Outage

Just before Halloween 2021, Roblox engineers experienced a horror story: a service outage that also took down critical monitoring systems. It seemed like the issue was a hardware problem, but it wasn’t. Users were frustrated, and the clock was ticking. After three full days of downtime, service was finally restored on Halloween day. While the incident itself was an IT nightmare, Roblox’s detailed technical post-mortem several months later was an excellent way to bounce back.
Sponsored Post

Introduction to Automation Testing Strategies For Microservices

Microservices are distributed applications deployed in different environments and could be developed in different programming languages having different databases with too many internal and external communications. A microservice architecture is dependent on multiple interdependent applications for its end-to-end functionalities. This complex microservices architecture requires a systematic testing strategy to ensure end-to-end (E2E) testing for any given use case. In this blog, we will discuss some of the most adopted automation testing strategies for microservices and to do that we will use the testing triangle approach.

From checklist to playbook: Creating structure for your processes

Playbooks aim to be a super-powered checklist for repetitive tasks. Before you can get to the “super-powered checklist,” though, you need to identify the process that you’ll use to build your first playbook and create a structured process as a Playbook checklist. Let’s go on that journey today.

Enterprise Alert 9.4 Update introduces Remote Actions for hybrid scenarios and proxy support for MS Teams

We have released another update for Enterprise Alert 9 (version 9.4) which enhances the cloud bridge and MS Teams integrations. This will help you to setup scenarios where you wish to active your Enterprise Alert remote actions from with the Signl4 app as well as allowing for using a proxy to configure the MS Teams integration. Read all details in this article.

Webinar: AIOps in healthcare

Healthcare around the world is constantly evolving. The amount of data being generated daily from every appointment and interaction, no matter how small or large, needs to be processed and analyzed in order to improve patient outcomes. The data must be accurate, stored, accessible and secure. Without a core infrastructure of smart IT, any outcomes are extremely challenging to generate, and data must be available in seconds for doctors to make life-saving decisions. The bottom line?