
Incident Ready: How to Chaos Engineer Your Incident Response Process | FireHydrant

We’re pretty sure using a real incident to test a new response process is not the best idea. So how do you test your process ahead of time? In this video, FireHydrant CEO Robert Ross will share how FireHydrant customers leverage best practices to break, mitigate, resolve, and fireproof incident processes. We’ll show you how to use chaos engineering philosophies to stress test three critical parts of a great process.

Automating incident response with Relay and PagerDuty

DevOps and SRE teams are under intense pressure to reduce mean time to recovery (MTTR). The latest integration between Relay and PagerDuty eliminates the “digital duct tape” by letting teams build reusable, event-driven workflows in Relay that close the loop on incidents faster.

Adaptable Incident Response With Splunk Phantom Modular Workbooks

Splunk Phantom is a security orchestration, automation, and response (SOAR) technology that lets customers automate repetitive security tasks, accelerate alert triage, and improve SOC efficiency. Case management features are also built into Phantom, including “workbooks” that allow you to codify your security standard operating procedures into reusable templates.

Datadog and Relay for Incident Response

Datadog is an awesome tool for aggregating and visualizing the metrics that matter to you. Recently, Datadog launched a new Incident Management feature, which allows you to coordinate the activities around a problem that affected your service. In this example, I’ll walk through using Relay to roll back a Kubernetes deployment that caused a service impact, and show how the Datadog Incident timeline can keep everyone working on the incident in sync.

Keeping PagerDuty Always On With Remote Incident Response

Earlier this month, many areas of the internet experienced a major incident caused by a router misconfiguration at a widely used service provider. This led to cascading service failures, causing widespread outages and disruptions for several well-known SaaS organizations. When the outage occurred, our teams at PagerDuty immediately noticed a global spike in events and incidents.

Key Fortinet and Flowmon Integrations: Automated Incident Detection and Response

Flowmon has recently joined Fortinet’s Open Fabric Ecosystem by integrating with FortiGate and FortiSIEM. This cooperation brings an automated system for threat detection and response that blocks security risks in their infancy and gives administrators time to carry out forensics.