Operations | Monitoring | ITSM | DevOps | Cloud

Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

The alert fatigue dilemma: A call for change in how we manage on-call

Once the unsung heroes of the digital realm, engineers are now caught in a cycle of perpetual interruptions thanks to alerting systems that haven't kept pace with evolving needs. A constant stream of notifications has turned on-call duty into a source of frustration, stress, and poor work-life balance. In 2021, 83% percent of software engineers surveyed reported feelings of burnout from high workloads, inefficient processes, and unclear goals and targets.

Lessons learned from building our first AI product

Since the advent of ChatGPT, companies have been racing to build AI features into their product. Previously, if you wanted AI features you needed to hire a team of specialists to build machine learning models in-house. But now that OpenAI’s models are an API call away, the investment required to build shiny AI has never been lower. We were one of those companies. Here’s our journey to building our first AI feature, and some practical advice if you’ll be doing the same.

Does Every Incident Need a Retrospective? Here's What the Experts Have to Say

Every quarter, we host a roundtable discussion centered around the challenges encountered by incident responders at the world’s leading organizations. These discussions are lightly facilitated and vendor-agnostic, with a carefully curated group of experts. Everyone brings their own unique perspective and experience to the group as we dive deep into the real-world challenges incident responders are facing today.

Never miss machines malfunctioning with ilert integration for Tulip

Downtime costs money. That's why an effective incident management system is crucial. We're excited to announce our new partnership with Tulip to help manufacturers manage incidents better. This integration is an important advancement for complex production processes that require an in-depth operational strategy.

Mastering incident resolution through Root Cause Changes

Discover a new way to handle incident resolution with our Root Cause Changes (RCC) feature. This tool optimizes incident management by linking incidents with relevant changes, resulting in a significant reduction in resolution time and an overall improvement in operational efficiency. Explore the world of incident resolution with our advanced RCC feature and unlock its benefits.

StatusCast : Conquer the Storm

Embark on a journey to conquer the storm with StatusCast! Watch our latest video to discover how our powerful incident communication and status page solutions empower you to navigate through challenges seamlessly. Unleash the potential to communicate effectively during disruptions and emerge stronger. Don't miss out—watch now and revolutionize your incident management game!

From Amazon to Apple: Key Strategies for Operational Excellence in Tech

Jim Gochee, CEO of Blameless with a history at New Relic and Apple, Ken Gavranovic, COO of Blameless and an Amazon Best Selling Author with experiences at Cox, Web.Com, and Unqork, and Lee Atchison, Chief Reliability Officer at Blameless, noted for his work on Amazon BeanStalk and as the author of "Architecting for Scale," with roles at AWS, HP, and New Relic, will guide this session.

Incident Response Plans: The Complete Guide To Creating & Maintaining IRPs

Speedily minimizing the negative impact of an information security incident is a fundamental element of information security management. The risks — loss of credibility in the eyes of users and other stakeholders, loss of business revenue and critical data, potential regulatory penalties — can significantly jeopardize your organization’s mission and objectives.

8 Strategies for Reducing Alert Fatigue

Site Reliability Engineers (SREs) and DevOps teams often deal with alert fatigue. It's like when you get too alert that it's hard to keep up, making it tougher to respond quickly and adding extra stress to the current responsibilities. According to a study, 62% of participants noted that alert fatigue played a role in employee turnover, while 60% reported that it resulted in internal conflicts within their organization.