The latest news and information on incident management, on-call, incident response, and related technologies.

Transforming the Incident Lifecycle With AI Agents

We’re in the midst of a fundamental shift in how organizations run operations: 51% of companies have already deployed AI agents. What was once reactive and manual is becoming intelligent, automated, and AI-driven. The organizations that embrace this shift gain more than operational efficiency; they develop a strategic competitive advantage that directly affects business outcomes.

Operational excellence in the age of AI and automation

The future of operations is here with PagerDuty's groundbreaking AI and automation innovations. Learn how PagerDuty AI agents, powered by PagerDuty Advance, and new use cases like security incident management and LLMOps can help your organization achieve operational excellence to reduce cost, mitigate the risk of outages, and accelerate innovation.

xMatters Zaxxon Release

Incident management can sometimes feel like piloting a spaceship through enemy fortresses while trying to hit as many targets as possible without, you know... game over. But even if your response processes don't quite involve pixelated robots and laser beams like the video game Zaxxon, our latest release is here to make sure your feet stay firmly on the ground whatever incidents may appear in your stratosphere! Let’s take a look...

How to Combat MSP Alert Fatigue

Managed service providers (MSPs) are responsible for monitoring hundreds or even thousands of devices, meaning that they must have a practical way of identifying incidents, vulnerabilities, and outages. The obvious choice is employing an incident alerting tool that can deliver alerts to the on-call engineers responsible for maintaining system health and performance.

From AI-pocalypse to AI-driven Resilience: 4 Lessons from The Last of Us

The critically acclaimed TV show The Last of Us is back. As a huge fan, I see striking parallels between the series’ post-apocalyptic setting and modern digital operations. Just as the world of Ellie and Joel, the main characters, was fundamentally changed by an unstoppable force of nature, today’s operations are being radically transformed by increasingly complex, interconnected systems and by the power of AI and automation.

Reduce the impact of hybrid cloud incidents with AI-powered ITSM

Hybrid and multicloud IT environments have become standard for enterprises, and with good reason. These environments offer greater flexibility, improved resilience, and optimized performance by allowing organizations to leverage the best features of multiple cloud providers while maintaining the security of on-premises infrastructure.

Incident Alerting and On-Call Management for MSP (Managed IT Services) Explainer

Managing incidents, on-call, and mass notifications as an MSP just got easier. OnPage helps managed service providers cut MTTR, hit SLAs, and make sure critical alerts from tools like Jira, ConnectWise, Autotask, and ServiceNow reach the right people fast. And when urgent updates need to go out to your entire business ecosystem, BlastIT delivers instant mass notifications.

Incident management tool integration

Picture the scene: a high‑severity alert fires, Slack lights up, and dashboards scream red. You’re juggling Datadog, PagerDuty, Jira, and status pages while trying to coordinate fixes. The problem isn’t a lack of tools; it’s that they aren’t talking to each other. This guide explains why incident management tool integration matters, how it cuts response times, and where to start.

AT&T Email-to-Text Service Ended: Why SIGNL4 Is the Best Alternative

In a move that caught many businesses and IT teams off guard, U.S. mobile carrier AT&T officially discontinued its email-to-text gateway service on June 17, 2025. As a result, sending SMS messages and mobile text alerts to AT&T subscribers through the number@txt.att.net or number@mms.att.net address formats no longer works.

Why Reliability Starts with the Network, even in the AI era, with Marino Wijay

In this episode, we explore how networking has shaped reliability as we know it. Marino Wijay, cloud networking expert and Staff Solutions Architect at Kong, shares how his journey began not as an SRE but with cables, routers, and switches. Marino traces how the fabric that holds systems together evolved through virtualization and software-defined networking, which is now a key element of resilient applications.