Operations | Monitoring | ITSM | DevOps | Cloud

Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

How Well Does Your Infrastructure Support Major Incident Management?

Effective major incident management depends on many things, including planning, precise execution, effective communication, and applying learnings from previous incidents to update those plans. Traditional major incident management wisdom addresses the importance of the remediation process, but it doesn’t speak on the issue of configuring your IT infrastructure.

SRE Adoption | A 2-Year Retrospective (From A Business Point-Of-View)

This month I hit my 2-year anniversary with Blameless and as our industry progresses and matures, I thought it would be a good opportunity to look back and review how far we have come and also ruminate on where we’re headed. Our shared vision at Blameless is to help engineering teams adopt reliability practices with ease and advance to a resilient culture.
Featured Post

The State of Incidents and Site Reliability: Q&A with Blameless SRE Architect Kurt Andersen

In the latest of an occasional series, today we hear from Kurt Andersen, SRE Architect at Blameless, discussing the evolution of incident management, current trends in site reliability affecting engineering teams, as well as an update on how Blameless is addressing the needs of SRE and DevOps.

Podcast: Break Things on Purpose | JJ Tang: People, Process, Culture, Tools

For this episode we’re continuing to “Build Things on Purpose” with JJ Tang, co-founder of Rootly, who joins us to talk about incident response, the tool he’s built, and his many lessons learned from incidents. Rootly is aiming to automate some of the more tedious work around incidents, and keeping that consistency. JJ chats about why he and his co-founder built Rootly, and the problems they’re trying to fix and eliminate when it comes to reliability.

Service level objectives: How SLOs have changed the business of observability

Forget the latest tech gadgets and the newest products. One of the most talked about trends in observability right now? “SLOs have really become a buzzword, and everyone wants them,” said Grafana Labs principal software engineer Björn “Beorn” Rabenstein on a recent episode of “Grafana’s Big Tent,” our new podcast about people, community, tech, and tools around observability.

Improve IT Operations with Response Analytics

Your IT team just finished resolving a complex incident, customer service finished their last call about the issue, and your business is back to being fully operational. Now that the storm has passed, you should be planning a postmortem to determine the cause of the incident and lessons learned. Postmortems require specific data that can highlight where your team is succeeding and where they can improve.

What's behind BigPanda's customers' success?

As the Regional VP of Customer Success for the West and Central Region at BigPanda, Chris LaPierre gets a unique opportunity to see first-hand how BigPanda customers use their AIOps platform. Charged with ensuring every BigPanda customer derives high value and return on investment from the solution, BigPanda’s customer success teams make certain customers leverage the AIOps platform to increase their bottom line.

Outage Alert: Top 5 Outages of Q1 2022

By now it’s no secret that system outages and website downtime are more widespread and frequent than ever. In fact, the frequency of outages jumped 9% in just the first week of 2022. This can be attributed to a rapid increase in traffic and reliance on tech infrastructures – resulting in connectivity, server, and other technical issues that are alternately unforeseen and unavoidable.

Managing Burnout | Tips To Minimize The Impact

Burnout is real. Today, the source of burnout can be anything from pandemic fatigue, to the onslaught of political divisiveness, or simply the pace of life worldwide. Whatever the culprit, we’re living in a stressful time. People working in cloud native environments definitely feel burnt out. Silicon Valley investor Marc Andreessen famously said, “Software is eating the world,” and that seems to be quite true. High demand is fueling churn. System and cloud operators feel pressure.