Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Episode 23: Zero-Downtime Updates with Todd Whitney

With limited error budgets and low user tolerance for maintenance windows, the ability to execute routine updates without one is an increasingly important socio-technical capability. Hear from Todd Whitney, who recently spoke at HashiConf about how PagerDuty performs updates while upholding its promise to customers of taking zero maintenance windows.

How MSPs and MSSPs can reduce risk and liability for their clients

For 83% of companies, a cyber incident is just a matter of time (IBM). And when it does happen, it will cost the organization millions, coming in at a global average of $4.35 million per breach. Add to that stringent data protection laws and the growing frequency and reach of ransomware and other sophisticated attacks.

Impressions from Gartner IOCS 2023

Gartner’s IT Infrastructure, Operations & Cloud Strategies Conference (IOCS) is an annual event that attracts ITOps, SRE, and DevOps leaders from around the world. As Gartner explains, IOCS “brings the world’s technology leaders together to hear top trends, find objective answers, and explore topic coverage in addition to best practices. Gain the insights and guidance to create an effective pathway to the future and network with your peers.”

Why monitoring your application is important

Effective monitoring and observability tools are critical for modern enterprises. Daily operations, digital transformation, moving to a cloud-native architecture, and an ever-evolving tech stack all require ITOps, DevOps, and SRE teams to monitor increasingly complex systems. So what happens if your applications suddenly cease to function? Every moment of downtime translates to lost income, decreased customer satisfaction, and harm to your company’s reputation.

APAC Retrospective: Learnings from a Year of Tech Turbulence

Throughout 2023, one thing has become abundantly clear: regardless of an organization’s size or industry, incidents are inevitable. Recently across the APAC region, we’ve seen numerous regulatory bodies clamp down on large companies that fail to provide acceptable service, with some handing out severe penalties. For many, the cost of an incident is no longer just lost revenue and customer trust, but also financial penalties and business restrictions.

The Debrief: A year in review-2023 at incident.io

What a year 2023 was at incident.io! While it's hard to summarize 365 days in just a few sentences, a handful of moments stood out from this transformative year. As we close the curtain on a momentous 2023, we sat down with the three co-founders of incident.io (Chris, Stephen, and Pete) to reflect on the wild ride that was this year.

All I want for Christmas... from Slack

When declaring and responding to an incident with incident.io, most of your interactions with our product will go via Slack. You might configure your forms in our web dashboard, but the responder using them to declare an incident is most likely doing so from a Slack modal, and the incident announcement will be posted as a Slack message. This means a lot of our product design falls within the constraints of what we can build using Slack’s Block Kit.

Understanding ServiceNow Incident Management: A comprehensive guide

You’re focused on swiftly identifying, analyzing, and resolving disruptions in IT services. And you know all too well that correctly deploying and adopting incident management holds the key to delivering a more reliable and responsive IT environment for your applications and services. That’s why you’re using or are considering using ServiceNow’s incident management to ensure a structured and efficient approach to handling your IT service incidents.