
September 2020

Automating incident response with Relay and PagerDuty

DevOps and SRE teams are under intense pressure to reduce Mean Time to Recovery (MTTR). The latest integration between Relay and PagerDuty replaces the “digital duct tape” with reusable, event-driven workflows that close the loop on incidents faster.

Adaptable Incident Response With Splunk Phantom Modular Workbooks

Splunk Phantom is a security orchestration, automation and response (SOAR) technology that lets customers automate repetitive security tasks, accelerate alert triage, and improve SOC efficiency. Phantom also has built-in case management features, including “workbooks” that let you codify your security standard operating procedures into reusable templates.

Datadog and Relay for Incident Response

Datadog is an awesome tool for aggregating and visualizing the metrics that matter to you. Datadog recently launched a new Incident Management feature, which lets you coordinate the activities around a problem affecting your service. In this example, I’ll walk through using Relay to roll back a Kubernetes deployment that impacted a service, and show how the Datadog Incident timeline can keep everyone working on the incident in sync.