
Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Best Practices to Avoid Website Outages on Black Friday

The most frenzied shopping day of the year – Black Friday – is fast approaching, and businesses around the globe are bracing themselves. However, imagine this – a massive number of eager shoppers ready to snag the hottest deal, and just when your website should be working at its best, it crashes, leaving behind frustrated customers and potential revenue slipping through your virtual fingers. This scenario is not entirely fictional.

Resilience Engineering in 2024: Challenges, Trends, & Priorities

Is your organization ready to fortify, expand, and cultivate a robust resilience engineering culture in 2024? In this webinar, Chris Evans (Co-founder & Chief Product Officer, incident.io) and Courtney Nash (Internet Incident Librarian, The VOID) will delve into crucial considerations and top priorities for improving your organization’s ability to build safer and more reliable complex systems, while unlocking insights for shaping your plans for 2024 and beyond.

Quick start guide to Unified Analytics dashboards

When it comes to observability, we’ve found that most organizations have ~20 tools installed in their IT environments. With so many tools, it’s difficult for IT leaders to gain insight into how their tools are performing and determine how much value ITOps is bringing to the organization.

Weathering Black Friday and Other Storms Reliably

If you work in eCommerce, you can see the storm on the horizon. Black Friday, the biggest shopping day of the year both online and off, is only a few days away. Your services are going to hit usage spikes you may never have seen before. And every aspect of your services will be pushed to its limit: people won’t just be searching, or just buying, or signing up for programs; they’ll be doing all of these at once. Most crucially, everyone else is offering deals too.

Should data teams consider incident management tools to respond to pipeline issues?

Data teams are adopting more processes and tools that align with software engineering, and talks at the dbt Coalesce conference in 2023 showed a clear push towards adopting software engineering practices at enterprise-scale companies. At the moment, there are a lot of tools in the data space for identifying errors in data pipelines, but no tools for responding to these errors, such as coordinating fixes. This is exactly where implementing an incident management platform makes sense.

Guide To Best Incident Management Software

Avoiding downtime is imperative. Incident Management tools help you withstand unplanned disruptions by ensuring quick response, efficient resolution, and minimal impact on operations. This blog aims to be your go-to guide for navigating the diverse landscape of Incident Management platforms.

Captain's Log: How we are leveraging CEL for Signals

As engineers, we didn't want to make Signals only a replacement for what the existing incumbents do today. We've had our own gripes for years about the information architecture many old companies still force you to implement today. You should be able to send us any signal from any data source and create an alert based on some conditions. We're no strangers to building features that include conditional logic, but we upped the ante when it came to Signals.

IAG Relies on PagerDuty Operations Cloud for Sustainable Growth

Part of the International Airlines Group (IAG), IAG Loyalty operates the loyalty programs for IAG’s airlines (British Airways, Iberia, Vueling and Aer Lingus) and 125+ global brand partners in travel, retail, and financial services. With the PagerDuty Operations Cloud, IAG Loyalty has built a framework that allows engineers to build products and services quickly and safely. This has laid the foundation for sustainable growth as a company. Hear more in this video from Colin Lewis, Head of Core Engineering at IAG Loyalty, and James Headon, Cloud Operations Manager at IAG Loyalty.