Incident Response

Datadog and Relay for Incident Response

Datadog is an awesome tool for aggregating and visualizing the metrics that matter to you. Recently, Datadog launched a new Incident Management feature, which allows you to coordinate the activities around a problem that affected your service. In this example, I’ll walk through using Relay to roll back a Kubernetes deployment that caused a service impact, and show how the Datadog Incident timeline can keep everyone working on the incident in sync.

Keeping PagerDuty Always On With Remote Incident Response

Earlier this month, many areas of the internet experienced a major incident caused by a router misconfiguration at a widely used service provider. This led to cascading service failures, causing widespread outages and disruptions for several well-known SaaS organizations. When the outage occurred, our teams at PagerDuty immediately noticed a global spike in events and incidents.

Sending Nagios alerts to Microsoft Teams and rapid incident response with Zenduty

Nagios is one of the most widely used open-source network monitoring tools, relied on by thousands of NOC teams globally to monitor the health of a vast array of hosts and services. Most teams rely on email as their primary Nagios alert notification channel, which means it may take a few minutes for your NOC team to respond.

Key Fortinet and Flowmon Integrations: Automated Incident Detection and Response

Flowmon has recently joined Fortinet’s Open Fabric Ecosystem by integrating with FortiGate and FortiSIEM. This cooperation brings an automated system for threat detection and response, blocking security risks in their infancy and giving administrators time to carry out forensics.