Operations | Monitoring | ITSM | DevOps | Cloud

Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

RIA Vendor Selection Matrix for AIOps 2022

In July, the research firm Research In Action (RIA), published the 2022 edition of their annual Vendor Selection Matrix™. Despite AIOps being a well established technology (Moogsoft has customers who have been reaping the benefits of AIOps for many years) selecting a vendor can still be quite difficult, given the plethora of vendors who quickly re-branded their solutions as AIOps. So a vendor selection guide is a valuable resource.

Honeycomb Announces Major Updates to PagerDuty Integration

Today, we’re announcing major new updates to Honeycomb’s PagerDuty integration. These updates put more of the information you need into PagerDuty notifications and allow for greater configurability. These enhancements are available to all users who leverage Honeycomb Triggers and Burn Alerts to send notifications via PagerDuty.

New Feature: New Component Status Types

What’s just as important as resolving an impacted service? Providing detailed yet digestible updates to your communities and stakeholders. A recent update to StatusCast, involves the addition of three new status types that can be assigned to your components. Detailed communications is an essential component of incident response and management, and additional status types provide your users with a more granular view of incident activity.

SignalFlows to SLOs

How are you tracking the long-term operation and health indicators for your micro and macro services? Service Level Indicators (SLIs) and Service Level Objectives (SLOs) are prized (but sometimes “aspirational”) metrics for DevOps teams and ITOps analysts. Today we’ll see how we can leverage SignalFlow to put some SLOs Error Budget tracking together (or easily spin up same with Terraform)!

What you need to know & do to be a world-class cyber incident responder

World-class incident responders are a strategic asset in today’s world where the frequency and sophistication of cyber security attacks continue to increase every year, as do the associated financial damages: As such, more and more organizations are looking to grow their cyber incident response expertise, both with inhouse staff as well as by engaging with third-party experts.

We're making our on-call calculator free

We've all done it: "that'll be simple, I'll just write a quick script and..." In the case of calculating on-call pay, we really have done it before: our team have built the on-call pay scripts for several companies, and each attempt was a painful, error prone process. While we believe everyone on-call should be paid for their inconvenience, relying on someones side-project or back-of-napkin maths to calculate pay leads to mistakes, frustration, and wasted time.

What is a Security Operation Center and how do SOC teams work?

With the growing complexity of IT environments, it is essential to have robust security processes that can safeguard IT environments from cyber threats. In this blog, we will explore how security operation centers (SOCs), help you monitor, identify and prevent cyber threats to safeguard your IT environments. This blog covers the following pointers.

Why you need an incident timeline

We get it – incidents happen. What differentiates resilient teams from others is how they learn from them: using them as an opportunity to find the biggest improvements in how they work. Incident timelines are one of the most simple and effective tools available to you when it comes to learning from an incident. It’s vital that you ensure they’re accurate and useful, in order to make the biggest improvements after an incident.

RESOLVE '22: Measuring what matters

Companies can take big strides toward “preventing preventable” incidents by minding what they measure. What’s in a name? In Measuring what matters, one of the panels at our RESOLVE ‘22 event, the three words in the title reflect a plan successful IT Ops teams have embraced to reduce the complexity of their reporting systems—resulting in a faster path for companies to make more effective use of all the IT resources at their disposal.