
Alerting

How to use CloudWatch to generate alerts from logs

More than a million people use Amazon Web Services (AWS) products, so it's no surprise that many customers use an AWS integration with their Opsgenie instance. One common use case is creating Opsgenie alerts from CloudWatch Logs to help teams stay ahead of issues and prevent incidents. CloudWatch Logs is an AWS log storage and monitoring feature that collects logs from your systems, applications, and AWS services in a single place.

The State of Unplanned Work: Key Findings

It’s a new world order: Skynet has taken over. Just kidding. But it sometimes feels that way, doesn’t it? In the words of Marc Andreessen, software is eating the world, and technology problems are now business problems. Developers have become the architects of the digital experience and, by extension, the customer experience. When those developers can't innovate quickly, their companies become more exposed to competitive threats.

Why Escalations are Important to Clinical Communications

Unexpected events make healthcare one of the most challenging industries to navigate and plan for. Sudden changes in a patient's condition are common, and each one adds to the workload of healthcare providers. Similarly, process efficiency and productivity reflect the care team’s ability to communicate. When teams are on the same page, patient wait times drop significantly and outcomes improve.

RetroDuty: How We Scale Continuous Improvement Beyond Engineering at PagerDuty

If you’ve worked on a team that has adopted Agile techniques, you’ve probably heard of a retrospective. If not, here’s the TL;DR: a retrospective is a recurring meeting in which a team reflects on what happened during a project and looks for ways to improve how it works going forward.

Meet Root Cause Changes from BigPanda: IT Ops, NOC and DevOps Teams' Best Friend for Supporting Fast-Moving IT Stacks

TL;DR: Fast-moving IT stacks see frequent, long and painful outages. Thousands of changes – planned, unplanned and shadow changes – are one of the main reasons behind this. Until now, IT Ops, NOC & DevOps teams didn’t have an easy way to get a real-time answer to the “What Changed?” question – the answer that can help reduce the duration of outages and incidents in these fast-moving IT stacks. Now, with BigPanda Root Cause Changes, they do.

Rise of the Digital Operations Ecosystem

Many organizations today are dealing with a lot of complexity and disconnected tools. Teams and departments run in parallel but stay siloed from each other. People are burned out from manual work, and everyone is crunched for time. This is not a happy ecosystem to live in. When the parts of a digital ecosystem don’t work together, teams don’t know what’s going on and lack the information they need.

Enable SSO and MFA by adding SIGNL4 as an enterprise app in Azure Active Directory

This article describes how to authorize SIGNL4 as an enterprise app for all Azure AD users in your organization (Marketplace Link). This matters if you want to roll out SIGNL4 in your company using the existing user accounts in Azure AD.

Drive continuous improvement with shareable postmortems in Opsgenie

It’s a given that customers expect software and IT services to be high-performing and always on. And because incidents and downtime will always be a thing, we believe that how you respond can make or break the customer experience. We’ve learned this lesson firsthand while refining our own incident management process over the last decade.