Incident Management

The latest news and information on incident management, on-call, incident response, and related technologies.

Twelve Key Learnings from PagerDuty People Team's Generative AI HackWeek

Sometimes innovation requires ideas unconstrained by traditional structures and removed from day-to-day responsibilities. It was in this spirit that PagerDuty’s People HackWeek, a friendly competition to explore how generative AI might impact the future of HR, was born.

The balancing act of reliability and availability

As consumers, we expect the products and software we buy to work 100% of the time. Unfortunately, that’s impossible. Even the most reliable products and services experience some disruption in service. Crashes, bugs, timeouts. There are a ton of contributing factors, so it's impossible to distill disruptions down to a single cause. That said, technology is becoming more and more sophisticated, and so is the infrastructure that supports it.

The Unplanned Show, Episode 13: Jake Cohen and Generative AI for Automation

On the heels of the public beta opening for AI-generated runbooks in Runbook Automation, we asked Jake Cohen from product management how this differs from generating code with something like ChatGPT or the various AI-powered code completion tools available. We get into prompt engineering, managing output quality, and privacy and security concerns.

A better Grafana OnCall: Delivering on features for users at scale

Enterprise IT is just a different animal. Whether it’s operating at scale, undertaking massive migrations, working across scores of teams, or addressing tight security requirements, engineers at these organizations can face different obstacles than their counterparts at smaller organizations and startups.

Transformation in Travel: Our Q&A with TUI's Head of Technology

The travel industry is experiencing an unprecedented surge in demand from people seeking adventure and eager to explore new destinations. Given an abundance of choice and the desire to have a personalized experience, customers are turning to tour operators to remove complexity from planning so they can focus on the holiday and not on the process of planning it.

TUI Powers Outstanding Digital Experience for Customers with the PagerDuty Operations Cloud

PagerDuty Operations Cloud is essential infrastructure for TUI, enabling agility and cost efficiency to deliver outstanding digital experiences for customers. With PagerDuty’s AI and automation capabilities, TUI has streamlined incident management—reducing downtime and boosting customer bookings. Hear more in this video from Yasin Quareshy, Head of Technology at TUI.

Implementing Zero Trust: A Practical Guide

According to the Harvard Business Review, more than 83% of businesses experienced multiple data breaches in 2022. Ransomware attacks, in particular, were up 13%. With cybersecurity being such a hot topic for business owners, it’s no surprise that implementing a zero trust policy has become so important. In this guide, we’ll cover how to implement zero trust and why it’s important for your business to do so. Let’s get started.

Mastering Incident Resolution: Process and Best Practices

For DevOps and IT teams, incident resolution is an important aspect of predicting, resolving, and documenting service disruptions. It refers to the part of the incident management process where responders restore the service to functioning. Modern technology has come a long way, but it’s not without flaws. When businesses suffer from cyber-attacks, system crashes, and network outages, it impacts the organization on many levels.

The connection between incident management and problem management

Sometimes, two concepts overlap so much that it’s hard to view them in isolation. Today, incident management and problem management fit this description to a tee. This wasn’t always the case. For a long time, these two ITIL concepts were seen as distinct—with specialized roles overseeing each. Incident management existed in one corner and problem management in the other. Then came the DevOps movement and the lines suddenly became blurred. So where do they stand today?