Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Unplanned, Episode 1: Damon Edwards Rages Against the Ticket Machine

In this, the inaugural episode of “Unplanned”, Dormain Drewitz talks to Damon Edwards about the “capacity conundrum” where everyone is working so hard, but everything takes too long and costs too much. We talk about the “coordination overhead” costs of getting unplanned work done, how generative AI is both adding complexity and offers to accelerate automating as much as you can, and four steps to creating capacity.

Proactive IT: Disaster Recovery Testing

In today's business environment, the continuity of IT systems is crucial to the success of an organization. Unforeseen disasters, such as infrastructure failures or cyber attacks, can severely impact the productivity of your organization. To mitigate these risks, IT departments must develop and implement robust disaster recovery (DR) plans. But, how can you ensure that these plans work effectively in times of crisis?

What Is Root Cause Analysis?

Root Cause Analysis (RCA) is a systematic process designed to uncover the fundamental, underlying issues that lead to IT incidents. These 'root causes' are often masked by surface-level symptoms, making them challenging to identify without a systematic approach. Root Cause Analysis serves as a metaphorical excavation, drilling past the initial problems to discover deeper, hidden issues.

Introducing powerful APIs and webhooks for Grafana Incident

Grafana Incident, Grafana’s powerful incident response tool, comes with a range of integrations out of the box, including Zoom and Google Meet spaces, GitHub and JIRA issues, and even a Google Doc template for post-incident review documents. However, every team has unique needs and workflows, and you may need to integrate with other systems not currently on our roadmap or even use your own in-house tools.