The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Generative AI for the PagerDuty Operations Cloud

When it comes to keeping your business’s lights on, you need to manage and orchestrate your operational activities, prioritize high-impact and urgent work, and maintain day-to-day precision. Trust is paramount during mission-critical, time-sensitive crisis response, and the narrow margin for error means there is little tolerance for generative AI hallucinations or false positives.

Using PostgreSQL advisory locks to avoid race conditions

The first moments of incident response can be among the most crucial, which in turn can also make them among the most stressful. There are many ways to ensure incidents are kicked off smoothly, but a recent focus of ours was to ensure they could be kicked off quickly. After all, the faster you're able to start mitigating your incident, the more successful you'll be!

The 5 Incident Severity Levels - And a Free Matrix

Just as a red flag warns of imminent danger, incident severity levels in IT Service Management (ITSM) act as crucial indicators that alert organizations to potential problems. By understanding and leveraging them, businesses can swiftly and effectively respond to incidents, minimizing their impact on operations. In the dynamic business operations landscape, unexpected disruptions are an unavoidable reality.

The PagerDuty Operations Cloud | Strategic Overview

In this two-minute video, learn more about the PagerDuty Operations Cloud - the platform used by modern digital enterprises to automate and accelerate mission-critical operations work. The PagerDuty Operations Cloud is essential infrastructure that detects and diagnoses disruptive events, mobilizes the right team members to respond, and automates workflows across your digital operations - so that your business moves forward, faster.

Callable Flows - xMatters Support

In xMatters Flow Designer, you can use callable flows to initiate a major incident process in any workflow. Instead of including the same sequence of steps in each workflow, such as posting to a status page or opening a help desk ticket, you can build the sequence once as a separate workflow and then include that as a step in any of your workflows.

Reduce MTTR and Address the Talent Gap with Logz.io Alert Recommendations

When our CEO and co-founder Tomer Levy delivered his “Observability is Broken” presentation at last year’s AWS re:Invent, he highlighted numerous challenges faced by today’s organizations as they seek to advance their observability practices. Of the six individual points that he noted, two specifically dealt with the current shortage of available engineering expertise, with another two focused on data overload.

Use incident cycle time to optimize your incident response process

Although the causes and solutions for incidents vary widely, most incidents follow a similar timeline from declaration to resolution. The time it takes to move from one phase or milestone of an incident to the next is what we call "cycle time."
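
As a rough illustration of that definition, the sketch below computes the cycle time between consecutive milestones of a single incident. The milestone names (declared, acknowledged, mitigated, resolved), the timestamps, and the `cycle_times` helper are all assumptions made for this example; they are not taken from the article or from any particular product.

```python
from datetime import datetime, timedelta


def cycle_times(milestones: dict[str, datetime]) -> dict[str, timedelta]:
    """Return the elapsed time between each pair of consecutive milestones.

    Milestones are ordered by timestamp, so the input order does not matter.
    """
    ordered = sorted(milestones.items(), key=lambda item: item[1])
    return {
        f"{prev_name} -> {next_name}": next_time - prev_time
        for (prev_name, prev_time), (next_name, next_time) in zip(ordered, ordered[1:])
    }


# Hypothetical incident timeline (illustrative values only).
incident = {
    "declared": datetime(2024, 1, 1, 9, 0),
    "acknowledged": datetime(2024, 1, 1, 9, 4),
    "mitigated": datetime(2024, 1, 1, 9, 40),
    "resolved": datetime(2024, 1, 1, 11, 0),
}

for phase, elapsed in cycle_times(incident).items():
    print(f"{phase}: {elapsed}")
```

Collecting these per-phase durations across many incidents would be one way to see which phase tends to take longest.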