The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

How to Standardize Service Ownership at Scale for Improved Incident Response

Service ownership is a DevOps best practice where team members take responsibility for supporting the software they deliver at every stage of the development lifecycle. This level of ownership brings development teams much closer to their customers, the business, and the value being delivered. Service owners are the subject matter experts (SMEs) for their services – and in a service ownership model, they are also responsible for responding to any production issues.

A Day in the Life with PagerDuty (2022)

See a day in the life with the PagerDuty Operations Cloud, built to keep you ready for anything in a world of digital everything. Watch as an organization facing increasing digital complexity and dependencies uses PagerDuty to transform its operations from manual, rigid, and ticket-queue-based to a continuously improving system that focuses on outcomes and customer experience, delivers operational speed and resilience, and is heavily automated and augmented by machine learning and AI.

How IT Operations can demonstrate business value with Unified Analytics

As an IT Ops exec, imagine your jubilation upon learning that after a year of hard work across your NOC, DevOps and SRE teams, you are able to automate incident response by 25%. You’re elated as you enter your CTO’s office to share this information, and their response is...

3 Effective Ways to Enhance Patient Safety with EHR Alerts

Hospitals that adopt electronic health records (EHR) to optimize clinical workflows face the decision of how to integrate EHR alerts into their workflows. The rationale is to surface actionable data from EHR systems and present healthcare providers with this information to supplement their day-to-day clinical decisions.

Cloudflare outage? The Domino Effect!

The day started a bit abruptly, with several services experiencing outages caused by a Cloudflare outage that began at approximately 06:34 UTC. Check the official announcement. What came next was a domino effect across many popular internet services, including major ones like GitLab, Notion, HubSpot, DigitalOcean, Monday, Recurly, and many more. We registered incidents from 230 services between the time the outage was published and the time it was marked as resolved.

Choosing the Right Incident Notification Tool for Your Incident Response Plan

Is your IT team ready to respond to an increasing volume of data security incidents? According to the 2021 Annual Data Breach report from the Identity Theft Resource Center, 2021 saw a record number of data breaches, representing a 68% increase from the year prior. The most recent Cost of a Data Breach report from IBM shares the Ponemon Institute’s finding that the average data breach is a $4.24 million expense, up 9.8% from the previous year.

Mattermost Playbooks How-to: Release Management

Releasing software to users has become a sophisticated and intricate process that requires high levels of consistency and coordination. A release has to be built, brought together, documented, tested and deployed, which requires coordination of at least four separate teams and a generous handful of pipelines and other tools. Without a well-documented process, things can get messy very quickly, causing stress for everyone involved.