SecOps for the Cloud: PagerDuty and AWS Security Hub

This week at re:Inforce in Boston, the AWS team showed off Security Hub, a service that gives SecOps teams a comprehensive view of their high-priority security alerts and compliance status across their AWS accounts. We’re excited to join AWS at the event as a Security Hub partner, where we’ll show users how PagerDuty and AWS Security Hub work together to provide real-time SecOps to any team using AWS.

Glitch List: June 2019

To keep you up-to-date with what’s going on in anomaly detection, we keep an ongoing list of the biggest glitches happening in the business world. Here is what made waves in June.

June 25, 2019

When Dutch telco KPN suffered a major outage on the evening of Tuesday, June 25, the 112 emergency number was also knocked out across the country. “We have no reason to think it was (a hack) and we monitor our systems 24/7,” the company spokesperson told Reuters.

Amazon Quicksight ML Anomaly Detection vs. Anodot Autonomous Analytics

Companies invest in anomaly detection in order to proactively identify risks, such as revenue loss, customer churn and operational performance issues. Anomaly detection essentially enhances traditional BI and visualization tools, venturing beyond a summary view of your data. It constantly scans every metric, at a granular level, to find abnormalities. But in order for this technology to have an impact, you must be able to trust it.

Listen to a Recorded Incident Response Call

The PagerDuty Incident Response Process is a detailed document that provides a framework for how to structure your incident response process. But sometimes it helps to understand how these seemingly abstract concepts play out during real-world scenarios. You can now hear an incident call recording that’s based on a real PagerDuty incident. Due to the nature of incident response practices, the process guide we publish is filled with very explicit details regarding a variety of situations.

How Does Google Handle Critical Incidents?

There are some very good sources out there on how to manage a critical incident. One of them is the chapter on incident management in Google’s book, “Site Reliability Engineering”. In this chapter, the folks at Google present their approach to a well-designed critical incident management process.

Russ Savage [InfluxData] | Monitoring, Alerting, and Tasks as Code | InfluxDays London 2019

In this talk, Russ will explore how to build tasks, alerting rules, and triggering events inside InfluxDB 2.0 with the new Flux language. He will then showcase how to work these into a regular development flow by using command-line tools for testing, source control as the source of truth, and testing against production data.