
The latest news and information on incident management, on-call, incident response, and related technologies.

Introducing the BigPanda Triage Agent and the future of agentic L1 operations

If you’ve been following the development of BigPanda AI Detection and Response (ADR), you’re aware of our mission to automate Level 1 (L1) operations and eliminate the need for manual, time-consuming investigations. In our last update, we highlighted the manual, complex, and time-consuming processes that hinder modern IT teams. Enterprises spend billions on observability tools based on the false belief that more coverage equals total visibility.

PagerDuty Becomes Newest AWS Software Partner to Earn Resilience Competency

As enterprise system failures cost businesses an estimated $400 billion annually in lost revenue and productivity, PagerDuty announced it has achieved the Amazon Web Services (AWS) Resilience Services Competency in the software category, becoming one of the first AWS Software Partners to earn the designation. This achievement validates PagerDuty's ability to help enterprises architect, deploy and maintain mission-critical systems that can withstand failures and recover rapidly with minimal business disruption.

From Noise to Notified: Making Azure Sentinel Alerts Actionable

Modern security operations are overflowing with data, and organizations rely heavily on Microsoft Sentinel (formerly Azure Sentinel) alerts to maintain visibility across hybrid environments. From firewalls and endpoints to cloud workloads and identity systems, thousands of signals compete for attention every second. For most security teams, the challenge isn’t detection anymore – it’s action.

Turning Incidents Into Insight: The Continuous AI Operations Loop Explained

Modern systems generate enormous volumes of operational data. Yet most incident workflows still treat every outage like a one-off fire drill: an alert fires, responders scramble, the issue is resolved, the status page goes green, and the organization learns almost nothing from the experience. Meanwhile, the same patterns quietly repeat in code releases, logs, traces, and support tickets until they erupt into the next ‘unexpected’ incident.

Shopify Cyber Monday outage - December 1, 2025

On December 1, 2025, Cyber Monday, the biggest online shopping day of the year, Shopify suffered a widespread outage that left many merchants unable to access their stores or process orders. At a time when every minute of uptime translates directly into revenue, the disruption caused immediate concern across the ecommerce community. StatusGator detected the issue within minutes, sending an Early Warning Signal 10 minutes before Shopify published its official acknowledgement.

Introducing a More Flexible On-Call Schedule

Today, we are introducing several new on-call features: Add Gaps to on-call, Scheduled Layers, Handoff Days, and more. Flexibility in on-call schedules is the single focus of this release. These features give you much finer control over when people are on call, how handoffs work, and what your schedule looks like around holidays and time off.