The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Root Cause Changes: are they the "Elephant in the NOC?" Here's the CTO Perspective

Ask any IT Ops practitioner which question they ask first when joining an emergency bridge call, and you’ll get the same answer: “What changed?” Our customers report that changes in their IT environments cause 60% to 90% of the incidents they see. Yet enterprises still find it difficult to manage changes and correlate them with the IT incidents they may have caused.

New Integration: Create Zoom incident bridges automatically

Incident response doesn’t only happen in Slack, so today we’re happy to announce our integration with Zoom to create incident bridges automatically. Using the power of FireHydrant Runbooks, a Zoom meeting can be added with fully customizable titles and agendas based on your incident details. Let’s dive into how it works.

Evan Niedojadlo from Peddle shares his thoughts on being an SRE

Evan Niedojadlo is an SRE at Peddle based in Austin, TX. He currently works on a small team covering the SRE, Ops, and Security areas of the organization. In his free time, he enjoys building communities, reading, music, helping others learn, and being outside.

OpenEMR-OnPage Integration

The real value of OnPage-OpenEMR integration goes beyond sending information back and forth between a mobile device and OpenEMR. It includes the ability to send detailed, contextual, high-priority intelligent alerts to the care team within seconds, so that when a time-sensitive event, such as a STAT order or a lab result, is detected on the system, the care team is instantly notified on mobile. No more logging into EHR systems multiple times a day to keep track of important patient-related updates!

Rein in Your Incidents: Incidents and Alerts Foundations

Solving incidents is hard. Depending on your current situation, you may also be losing a lot of time figuring out which notifications constitute an incident. The lost time adds up, because every notification must be triaged as a potential incident before you can either resolve it or disregard it as a non-incident. All this may sound cumbersome, but the fastest way to improve is to learn and define what incidents are. And you’re in luck!

On-Call Scheduling: Building a Winning On-Call Schedule for Your Team

On-call scheduling enables 24/7/365 availability of service providers for critical issues like system downtime, technician response for critical systems, and patient care. Learn about the importance of on-call schedules for your organization and its customers, how to design an on-call schedule, and multiple ways you can build an on-call scheduling program that will improve customer response and make staff happier.

OnPage Mentioned in Gartner's Hype Cycle for Clinical Communication and Collaboration

Clinical communication and collaboration (CC&C) systems enhance care coordination to improve the patient experience. The systems are equipped with secure mobile messaging, allowing care teams to ditch their insecure pagers for HIPAA-compliant smartphone applications. Gartner, the global leader in tech research, has released its Hype Cycle for Real-Time Health System (RTHS) Technologies, 2020.

Summit EMEA: How Vodafone Is Enabling Immutable Telemetry

In June, we were delighted to host our first ever virtual PagerDuty Summit EMEA! Llywelyn Griffith-Swain, SRE Manager, and David Jambor, Head of Systems Engineering at Vodafone, were among our speakers. They outlined Vodafone’s approach to achieving immutable telemetry. David opened the session by defining Vodafone’s strategic goals. “Our vision is to create an engineering-driven culture,” he explained. “We want to empower development teams to be self-sufficient.

Why Observability Matters to Site Reliability Engineers

This is the first in a three-post series themed around Ops-led DevOps, where I’ll explore the relationship between observability and a set of software delivery lifecycle practices that support the adoption of DevOps and the transition from project-centric to product-centric ways of working. I’ll start with Site Reliability Engineering, move on to Value Stream Management, and finish with Continuous Delivery.