Incident Management

The latest news and information on incident management, on-call, incident response and related technologies.

10 Incident Management Best Practices

Before we dive into the nitty-gritty of incident management, let’s look a bit closer at the actual meaning of ‘incident.’ In the world of IT service management, the official definition for ‘incident’ is an “unplanned interruption to an IT service or reduction in the quality of an IT service.” Whether that means a slowdown in response time or a total system crash, you’re looking at an incident.

The Swedbank Outage shows that Change Controls don't work

This week I’ve been reading through the recent judgment from the Swedish FSA on the Swedbank outage. If you’re unfamiliar with this story, Swedbank had a major outage in April 2022 that was caused by an unapproved change to their IT systems. It temporarily left nearly a million customers with incorrect balances, many of whom were unable to meet payments.

Hello World

It feels great writing this. It's hard to believe that we have been working on Spike.sh full-time for 3 years now. It's been the most rewarding experience of my life. A big thank you to all of our users for your constant feedback, which has only made Spike.sh better month on month. Over the years, we have always kept our heads down and built. During this entire process, we have learnt a great deal about incidents and how they are managed.

Debug State Capture for Traditional Infrastructure & Apps

In our previous blogs on Capturing Application State and using Ephemeral Containers for Debugging Kubernetes, we discussed the value of deploying specific tools to gather diagnostics for later analysis, while also giving the incident responder the means to resolve infrastructure or application issues.

5 Immediate Business Benefits of Leveraging Domain-Agnostic AIOps

Legacy systems and point solutions are part of any business. And while they have their history and benefits, it’s critical to find a balance for your organization. IT teams have been acclimated to disparate event management and monitoring tools. Now, with massive and rapidly increasing data flow, this disconnect is slowing and paralyzing IT teams.

The Ultimate Guide to Automating and Mobilizing Your SecOps Processes with Derdack SIGNL4 and Microsoft Sentinel

The threat and security landscape is becoming increasingly cluttered. As incidents increase, so do alerts and notifications, leaving too many alerts and too few hours to address them. Many businesses work remotely, and with ever-present smartphones, we are always on the go. It is essential that security teams receive and prioritize meaningful threats, but that task is easier said than done.

Updating Your Tools for API Scopes

The PagerDuty REST API provides 200+ endpoints for users to programmatically access objects and workflows in the PagerDuty platform. Teams leverage these APIs to streamline creating and managing users, teams, services and other components for their environment. Up until now, access to the REST API has been authorized and authenticated via API Keys.
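As a rough illustration of the API key authentication described above, the sketch below builds a request for the PagerDuty REST API without sending it. The helper name `build_request` and the `scoped` flag are hypothetical, and the `Bearer` form for scoped access is an assumption based on common OAuth practice rather than something stated in this excerpt. The `Token token=` header and the versioned `Accept` header follow PagerDuty's documented REST API conventions.

```python
import urllib.request

API_BASE = "https://api.pagerduty.com"


def build_request(path, credential, scoped=False):
    """Build (but do not send) a GET request for the PagerDuty REST API.

    `build_request` and `scoped` are illustrative names, not part of any
    PagerDuty client library.
    """
    if scoped:
        # Assumption: scoped access uses an OAuth-style bearer token.
        auth = f"Bearer {credential}"
    else:
        # Classic API key authentication.
        auth = f"Token token={credential}"
    return urllib.request.Request(
        API_BASE + path,
        headers={
            "Authorization": auth,
            "Accept": "application/vnd.pagerduty+json;version=2",
        },
    )


# Usage: inspect the request before deciding to send it.
req = build_request("/users", "YOUR_API_KEY")
print(req.full_url, req.get_header("Authorization"))
```

Keeping the credential type behind a single flag means a tool can switch from API keys to scoped tokens without changing the code that makes the calls.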

Sponsored Post

How Runbook Automation can Simplify CloudOps Use

Organizations in every industry continue their transition to cloud services, and while this may be a step forward in general, it brings its own set of challenges. Cloud use, and in particular CloudOps, relies on a complex and intricate infrastructure that is difficult to manage and maintain, and it's a critical part of keeping a business's networks functioning. This makes finding a way to simplify the use of CloudOps a top priority for many businesses, but does a solution exist?