The latest news and information on incident management, on-call, incident response, and related technologies.

IT Service Management (ITSM): A Complete Guide

As digital transformation accelerates, organizations face increasing complexity, tighter budgets, and relentless pressure to provide exceptional service. The result is a constant struggle to balance cost, stability, and service. IT Service Management (ITSM) is the practice of strategically designing, delivering, managing, and improving IT services so that they align with business goals and optimize service delivery.

Best incident management tools in 2025 [45 analyzed, top 3 picks]

PagerDuty, Splunk, ServiceNow: with dozens of incident management tools on the market, how do you know which one to choose? The reality is that downtime costs organizations an average of $9,000 per minute. That's why companies are increasingly investing in incident management tools to reduce disruption and improve their incident response. But with the market evolving rapidly and new players emerging constantly, selecting the right tool has become more challenging than ever.

I Want My Shoes Fast! Observability, SRE Burnout, and OTel with Dynatrace's Adriana Villela

In this episode, we sit down with Adriana Villela, Principal DevRel at Dynatrace and OpenTelemetry contributor, to break down how observability impacts reliability. We dive into what contributes to SRE burnout and how managers can create psychologically safer spaces for responders. Adriana also shares her perspective on AI as an observability buddy for navigating incidents.

ITSM vs. ITOM: What are the key differences?

IT service management (ITSM) and IT operations management (ITOM) are both charged with ensuring your organization’s IT systems and infrastructure run smoothly and efficiently. These two frameworks are essential for any modern IT environment, but their roles are often confused or misunderstood. Simply put, ITSM focuses on the user-facing side of IT, streamlining services and aligning IT processes with business objectives.

Shorten your MTTR with Checkly Traces

We all know that Checkly is a ‘secret weapon’ for engineering teams who want to shorten their mean time to detection (MTTD). With Checkly, you can know within minutes if your service is unavailable for users or behaving unexpectedly. In this article, we’ll talk about how Checkly Traces builds on those benefits, adding insights that help you diagnose root causes and further reduce your mean time to resolution (MTTR) for outages and other incidents.

Your New Retrospective Experience: More Collaborative, Customizable, and Powerful

Run smarter, more effective retros. Customize retros, collaborate in real time, and surface key insights faster with AI. The new experience empowers you to spend less time documenting and more time working together as a team to uncover the insights that lead to real improvements in your process, roles, and technology.

Weaving AI into SIGNL4

Over the past two years, artificial intelligence (AI) has experienced remarkable growth, significantly influencing various sectors and daily life. In 2023, the release of advanced large language models (LLMs), such as OpenAI’s GPT-4 and Google DeepMind’s Gemini, marked a pivotal shift by enabling AI systems to process and generate diverse data types, including text, images, and audio.

PagerDuty Operations Cloud Spring 25 Release: Reimagining Operations in the Age of AI and Automation

Operational excellence isn’t just a goal; it’s critical to every company’s survival. And when powered by AI and automation, it’s a strategic competitive differentiator. With over a decade of AI and ML experience in our platform, PagerDuty pioneered the Incident Response space. And now, PagerDuty is redefining what modern operations can look like in the era of AI and automation.