Introducing Shift-Based Schedules: Smarter, Faster, and Easier for Any Team

This blog post is part of PagerDuty’s ongoing series on how we’re helping customers navigate their journey toward autonomous operations. Read on to learn how PagerDuty’s Shift-Based Schedules (planned GA in May) builds toward this vision. PagerDuty has long been the gold standard for on-call management, helping thousands of teams build the foundations of digital reliability.

Activate Your Continuous Learning Flywheel With Post-Incident Reviews in PagerDuty UI

Earlier this year at our H1 2026 launch, we announced PagerDuty’s vision for autonomous operations: a future where AI agents learn from every incident, prevent failures before they happen, and progressively automate so teams can focus on innovation instead of firefighting.

Why Dedicated Incident Channels are the Modern Standard for Slack-Based Incident Response

Where do your teams go during a critical incident? For distributed teams, that war room is a channel in Slack or Microsoft Teams. The question is: are you creating a dedicated space for each incident, or are responders scrambling across DMs, email threads, and general channels trying to piece together what happened? The answer matters. Using dedicated incident channels has become the industry standard for high-performing incident response teams.

Prevent outages with PagerDuty incident retrospectives

Recurring incidents are a symptom of a broken process. Your teams are working hard to get services back online, but constantly battling the same problems is frustrating and not a sustainable approach. What’s reflected here is not a failure in engineering abilities, but a deficiency in the learning that should follow an incident. When incident analysis focuses on finding a single person or team to blame, it creates a culture of fear.

How to use an SRE agent to reduce downtime

An alert in the middle of the night warns of a potential business failure. Manual incident response grows more complex as distributed, dynamic digital services generate overwhelming amounts of data. With an SRE agent, your engineering team can cut through alert clutter and sort through signals more quickly, reducing burnout and reaching faster, less costly resolutions. Agentic AI is the next evolution of operational resilience.

Why Service Architecture Matters: A Practical Guide

It’s 2 a.m. An alert fires. You acknowledge it, pull up the monitoring dashboard, and immediately hit a wall: Which team owns this? What services does it impact? Worse: this is the third time this month you’ve been paged for the same issue, and you still don’t have a clear path to fix it. What should take minutes stretches into hours of Slack threads, escalation guesswork, and frantic context gathering.

Future-Proof your services with agentic AI Operations Cloud

Digital services are the engine of your modern business, but keeping them running feels like a constant battle. Expanding architectures and increasingly complex workloads are driving a rapid increase in the volume and speed of operational data. Alert fatigue leaves your teams slow and reactive when incidents hit, a surefire path to burnout. The pace of this new reality is beyond what traditional, human-led processes can match.

Automate your critical workflows with AI agents in 5 steps

Many teams remain bogged down by operational chaos and manual drudgery, even with access to a variety of automation solutions. These tools often operate in silos, creating disconnected islands of automation that require significant human effort to bridge. Agentic AI offers a path forward, creating a cohesive system that can intelligently and autonomously handle complex operational workflows.

SRE agent vs. traditional engineer: 7 key differences

The role of a Site Reliability Engineer (SRE) is evolving. The focus is shifting away from simply working harder during an outage, and a new kind of teammate is here to help: the SRE agent. But what are the key differences between an SRE agent and a traditional site reliability engineer? This isn’t a superficial change. It signals a fundamental shift in how teams build and maintain reliable services.

PagerDuty Invests in the AI-First Operations and Resilience of Healthcare and Crisis Response Organizations

At PagerDuty, we believe operational excellence and social impact are inseparable. As AI rapidly transforms how nonprofits operate, our AI and agentic technology empower mission-driven teams to automate complexity and focus their limited resources on what matters most: delivering reliable services that create meaningful impact at scale.