Alerting

opsramp

OpsRamp August 2019: Alert Management, AIOps, Cloud Monitoring, Integrations, APIs

The Summer 2019 release introduced OpsQ Observed Mode, Learning-Based Auto-Alert Suppression, and many more updates to the OpsRamp Platform. This week all OpsRamp customers are being updated to our August 2019 release. Customers and partners should review all the details in our release notes. Here’s a high-level summary of what’s new this month: Observed Mode is now available for Alert Escalation and First Response policies.

victorops

The Complete Template for On-Call Incident Response

Modern Agile practices and DevOps methodologies are leading to faster feature releases even though systems are becoming more complex. With high velocity comes more change and more change leads to more alerts and incidents in applications and infrastructure. So, the only surefire way for DevOps and IT teams to build reliable services is through proactive testing and an efficient on-call incident response plan.

pagerduty

Understanding Systemic Issues: The PagerDuty Health Check Process

Continuous improvement is one of the fundamental tenets of Agile methodology that PagerDuty’s product development teams emphasize. This already works fairly well at the individual team level via retrospective meetings and postmortems, but sometimes we don’t notice larger or systemic issues that are outside the control of a single team. This blog will share the process that we use at PagerDuty to uncover those issues, the outcomes we have seen, and how we have evolved that process.

victorops

The Template for Humane Root Cause Analysis

In the traditional IT Infrastructure Library (ITIL) approach to IT service management (ITSM) and IT operations, root cause analysis is required for effective incident management. But over time, DevOps and IT teams are learning that there’s rarely one single root cause. Sure, one singular action (e.g., a new deployment) can result in one short-lived incident. But what about all the other actions leading up to that action?

pagerduty

Optimizing Business Response When Technical Incidents Happen

Most technical incident response plans account for stakeholder communications, both for internal teams and for external customers. But at PagerDuty, what we’ve learned from our customers is that there’s still a painful and expensive gap in alignment between IT and business teams. To close that gap, we need to focus on what incident response means for business teams.

victorops

Crafting a Comprehensive IT Monitoring Plan

Sysadmins, database admins and other IT professionals are constantly tweaking monitoring tools and trying to create more reliable systems. But IT infrastructure and applications are constantly shifting underneath the people maintaining them, making it hard to maintain robust services. And, to top it all off, microservices, containerized applications, hybrid cloud infrastructure and faster deployment lifecycles are leading to more complex systems.

pagerduty

PagerDuty Pulse Aug 2019

Catch up on all the exciting things we’ve released over the past several months. In this edition of PagerDuty Pulse, you’ll get insight into our most recent releases, which help teams across the enterprise effectively take action during the most critical moments with the power of data, intelligence, and automation at scale. We’re excited to release and share new enhancements to the core platform, as well as across many of our products (Event Intelligence, Modern Incident Response, Analytics, and Visibility).

anodot

Performance Marketing Company Replaces In-House Business Monitoring System With Anodot

In performance marketing, KPIs are directly proportional to revenue. Avantis is a performance marketing company that, prior to Anodot, had used a variety of in-house monitoring solutions and SAS tools to monitor the KPIs that drove their business. But they found their top metrics, such as impressions, clicks, click-through rate (CTR) and postbacks, were often much different than their publishers’.