The latest news and information on incident management, on-call, incident response, and related technologies.

How PagerDuty's Ecosystem Partners Are Helping People During the COVID-19 Crisis

For many of us, “working” is incredibly difficult right now. That’s true at the organizational level, where maintaining business continuity and accounting for changes in customer needs are even more critical. But it’s also true at the individual level, where the sudden shift to working from home has jolted us all into working in new ways, and made virtual collaboration an essential part of each workday.

April 2020 Update: Goodbye "I never got that alert" and emergency alerting - the new Signl Center

Our April update is BIG. It introduces emergency alerting to reach your entire team. We hope this will be a bit of humble support to your organization during the COVID-19 crisis. The core of this release is the new Signl Center, the place to track alerts and their delivery in real time and to see incoming events and how they are processed. You can now send an emergency alert to your entire team with a single click.

Real-Time Retail in Asia Pacific: Ensuring Exceptional Customer Experience in an Always-On World

Across Asia Pacific, “Real-Time Retail” and e-commerce have never been more essential to meeting the expectations of always-on, connected customers than they are today. To be successful, e-retailers need to ensure high availability and functionality across complex, interconnected services such as payment gateways, inventory and order management, and website and mobile apps.

How AI Helps IT Ops Pros Work Remotely

While the COVID-19 pandemic reshapes work processes, digitalization is allowing businesses to adjust to the fluid situation. The deployment of AI in IT operations is a good case study of this. Human beings’ social dimension needs cultivation. Otherwise, people become unhappy and perform ineffectively. Beyond that, many tasks require social interaction to be executed successfully, including in IT operations.

Q&A with Alex Hidalgo on SLOs

Alex Hidalgo is a Site Reliability Engineer at Squarespace, and he’s currently writing a book called Implementing Service Level Objectives for O’Reilly Media. The first three chapters of the book are available now through O’Reilly’s early access program. I had a chance to read those chapters and ask Alex some questions about service level objectives and reliability. Thanks, Alex, for sharing your knowledge.

Modernizing and Consolidating Your Monitoring Without Losing It...

The current days of remote work and “IT Ops from home” may or may not be here to stay, but they definitely reinforce the need to consolidate and modernize our monitoring. The challenges that multiple siloed tools create for understanding the big picture are only exacerbated by having just one screen to look at when monitoring our IT from the kitchen table.