The latest news and information on incident management, on-call, incident response, and related technologies.

Multi-Agent Architectures - What we shipped, what broke, and what we'd do differently

At LLMday Lisbon, our Software Engineer, Viktor Vasylkovskyi, highlights the realities of building production AI agents with LangGraph - sometimes getting it right, often learning the hard way. This talk is about what was actually shipped, including a distributed multi-agent setup at PagerDuty. Viktor breaks down the real tradeoffs between LLM-driven and deterministic orchestration, what broke, and how he’d approach it differently now.

6 use cases for agentic AI in major IT incident management

Enterprise IT operations leaders are realizing that legacy incident management processes cannot keep pace with today’s sprawling, hybrid-cloud enterprise environments. Enterprise IT doesn’t look anything like it did even five years ago. Hybrid cloud architectures, distributed microservices, and increasingly rapid CI/CD cycles have increased the speed and complexity of IT operations by orders of magnitude, leaving ITOps teams struggling to keep up.

Making Critical Incidents Impossible to Ignore - Derdack SIGNL4 - The Alerting Experts

In this episode, Doreen Jacobi talks with Henri-Paul Bourassa, IT Administrator at exo, the public transit organization serving the Greater Montréal area. Like many IT teams responsible for around-the-clock operations, Henri-Paul's team already had monitoring in place. The challenge wasn't finding issues - it was making sure the right people were alerted quickly enough to respond.

Incident Management Teams: Ready for Critical Situations

A malfunction in the baggage handling system at Berlin Brandenburg Airport disrupts the conveyor network that transports luggage across the airport. With more than 70,000 passengers traveling through BER every day and flight schedules timed down to the minute, even a small disruption can quickly lead to delays, missed connections, cancellations, and high costs. Fortunately, the Incident Management team receives the alert in real time and responds immediately.

From firefighting to forward planning: a practical route to operational innovation

Operational innovation is often treated as a back-office efficiency exercise, but in practice, it is becoming a strategic discipline. As AI moves deeper into day-to-day operations, technical leaders need a clearer way to cut toil, reduce risk and build the capacity to innovate. For many operations teams, it starts with incident management. When responders are trapped in noisy alert streams, manual escalations and fragmented workflows, innovation is pushed aside by the urgent work of keeping services available.

We redesigned Spike

Last Christmas, after everyone had gone quiet for the holidays, I sat down with a pen and some paper and started drawing Spike. Not the Spike we actually had, but the Spike I wanted, the one I had been carrying around in my head for a long time without ever really putting it down anywhere. A little while later I brought a few of those screens into Figma and showed them to the team over coffee one afternoon.

Top Mobile Incident Notification Systems for IT Teams 2026

Modern IT incidents don’t stick to a 9-to-5 schedule. System failures, security breaches, and performance degradations can happen at any time, and today’s distributed teams must respond instantly, wherever they are. The ability to receive, acknowledge, and manage incidents directly from a smartphone is no longer a luxury—it’s a core requirement for effective incident response in 2026.

How to Reduce On-Call Burnout in IT Teams

On-call duty is a high-stakes reality in modern IT and digital ops teams. It is essential for system reliability, but the chronic stress it creates is not inevitable. On-call burnout is a serious threat to your team’s well-being and your organization’s performance, and it is a systemic problem, not a personal failing.