The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

How Do I Route Alerts by Location to the Right On-Call Team?

When your company has multiple offices or operational sites – whether that’s across the U.S. or around the world – getting alerts to the right team isn’t as easy as just checking who’s on duty. Events can come from a wide range of sources tied to different physical locations, time zones, or even separate departments, and not every alert is meant for every team. Let’s say your company has operations in New York, Dallas, and San Francisco.
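The excerpt stops before describing a mechanism, so the following is only a minimal sketch of the general idea, not how any particular incident management product implements it. It assumes each alert carries a location tag, uses a lookup table to map each site (New York, Dallas, San Francisco) to an on-call team, and sends anything untagged or unrecognized to a catch-all team. The team names, the `Alert` type, and the `route` function are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical routing table: maps a site/location tag on an alert
# to the on-call team responsible for that site.
LOCATION_ROUTES = {
    "new-york": "nyc-oncall",
    "dallas": "dallas-oncall",
    "san-francisco": "sf-oncall",
}

# Hypothetical team that receives alerts with a missing or unknown location.
DEFAULT_TEAM = "central-noc"


@dataclass
class Alert:
    source: str
    summary: str
    location: Optional[str] = None  # e.g. taken from a host tag or event field


def normalize(location: str) -> str:
    """Lower-case and hyphenate so 'New York' and 'new-york' match the same key."""
    return "-".join(location.strip().lower().split())


def route(alert: Alert) -> str:
    """Return the team that should be paged for this alert."""
    if alert.location is None:
        return DEFAULT_TEAM
    return LOCATION_ROUTES.get(normalize(alert.location), DEFAULT_TEAM)


if __name__ == "__main__":
    print(route(Alert("monitoring", "disk full", "New York")))  # nyc-oncall
```

Keeping the fallback team explicit means a new site that nobody has added to the table still pages someone rather than being silently dropped.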

PagerDuty Joins AWS QuickSuite: Connect Your Incident Management with 1,000+ Applications

Today, we’re announcing that PagerDuty is now available in AWS QuickSuite through the Model Context Protocol (MCP). This means PagerDuty’s incident management capabilities can now connect with the 1,000+ applications and data sources that QuickSuite integrates with, from AWS services to enterprise SaaS platforms, all accessible through natural language.

AWS Outage: How do you prepare for the failure of your own safety net?

When AWS’s massive outage struck, it didn’t just take down cloud services, apps, and enterprise platforms. It also knocked out many of the monitoring systems organizations depend on for real-time answers. Observability companies, including Datadog, New Relic, Checkly, Dynatrace, SpeedCurve, and Splunk Observability, lost visibility or functionality precisely when organizations needed them most.

A Launch Day in the Life with AI Teammates

Alex, an SRE at Greenagonia, starts the day knowing there’s a big launch coming. Pre-orders suggest a 5-10x increase over normal traffic, which means coffee needs to be extra strong this morning. As Alex scans through overnight alerts, he realizes he’s completely forgotten about a dentist appointment that overlaps with his upcoming on-call shift. Six months ago, this would have meant frantic Slack messages or at least one phone call. Today? Alex’s AI teammate has it covered.

7 Ways Your Incident Management Just Got a Boost (New Feature Rundown)

All the things you may have missed that will make your incident management smarter, faster, and simply easier. We ship updates every week because we want you to get the most out of FireHydrant. But we also know it's hard to stay up to date and read every week's changelog (even though we know reading changelogs is the highlight of your week).

Experimenting With Different Scripts

It all began when I spun up an AWS t4g.small burstable instance for a side project. Nothing unusual, just another day in the cloud. But the moment I connected through SSH, something caught my eye. The system greeted me with a temperature reading of -273.5°C. Wait… what? That’s roughly 0 Kelvin (absolute zero is -273.15°C), the point where atomic motion completely stops. In other words, absolute zero, a state that’s theoretically impossible for anything to operate in.
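The arithmetic behind the "Wait… what?" is simply K = °C + 273.15, so -273.5°C works out to about -0.35 K, slightly *below* absolute zero and therefore not a real temperature at all. The sketch below shows that conversion plus a plausibility check. It also assumes, as an illustration only, that such a reading could come from a Linux sysfs thermal zone (`/sys/class/thermal/thermal_zone*/temp`, which reports millidegrees Celsius); the excerpt does not say where the login banner gets its value, and the helper names here are hypothetical.

```python
import glob

# 0 K corresponds to -273.15 °C.
ABSOLUTE_ZERO_C = -273.15


def celsius_to_kelvin(celsius: float) -> float:
    """Convert Celsius to Kelvin (K = °C + 273.15)."""
    return celsius - ABSOLUTE_ZERO_C


def is_physically_possible(celsius: float) -> bool:
    """A reading below absolute zero cannot be a real temperature."""
    return celsius >= ABSOLUTE_ZERO_C


def read_thermal_zones(pattern: str = "/sys/class/thermal/thermal_zone*/temp") -> dict:
    """Read Linux sysfs thermal zones, whose 'temp' files hold millidegrees Celsius.

    Returns {path: degrees_celsius}; an empty dict if no zones are exposed,
    which is common on virtual machines.
    """
    readings = {}
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                readings[path] = int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # zone unreadable or not reporting an integer
    return readings


if __name__ == "__main__":
    reading = -273.5  # the value shown in the SSH greeting
    print(f"{reading} °C = {celsius_to_kelvin(reading):.2f} K")  # -273.5 °C = -0.35 K
    print("plausible:", is_physically_possible(reading))          # plausible: False
    for path, temp in read_thermal_zones().items():
        status = "OK" if is_physically_possible(temp) else "IMPOSSIBLE"
        print(path, temp, "°C", status)
```

Any value that fails `is_physically_possible` is best treated as a placeholder or missing-sensor value rather than a measurement.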

Understand the ROI of BigPanda: Top quantitative and qualitative findings

We published the first report showcasing the business value of the BigPanda platform, based on both quantitative and qualitative feedback from more than 20 enterprise customers. The Business Value of the BigPanda Platform report provides tangible insights into our platform’s impact on business outcomes.

Agentic ITOps: The evolution of AIOps

Enterprise IT departments are struggling to keep up with the dramatic increases in complexity, fragmentation, and chaos in their IT environments. Legacy tools and processes designed for monolithic systems and static infrastructures cannot meet these challenges. Enterprise ITOps requires a more agile and intelligent approach that leverages advances in AI and automation to remain scalable, effective, and sustainable.