PD Summit21: The Netflix Reliability Story: A Brief History of How We Evolved Resilience to Failure

In Netflix engineering, we’re driven by ensuring Netflix is there when you need it to be. We strive to provide a service that people love and can enjoy anytime, anywhere. An important foundation for bringing our customers joy is a strong focus on reliability that ensures Netflix will be available when they need it. In this talk, I’ll tell the story of how we've grown our reliability practices over time to meet the changing demands of microservices and distributed computing.

PD Summit21: Adopting and Maturing to Service Ownership with PagerDuty and Rundeck

Among the common goals of today's engineering and operations teams is to adopt a culture of service ownership: "You build it, you own it." As with many ancillary objectives to driving DevOps across an organization, this is easier said than done. Sometimes this is in small part due to the technology stack/architecture of a given company. But more often than not, this is because teams lack the human-to-technology mechanisms that allow for a culture of service ownership.

PD Summit21: Migrating L1 Support to PagerDuty

Learn how Maersk transitioned from operating with an L1 support team to using PagerDuty to drive an efficient operational support model. In this talk, you will learn how implementing PagerDuty within the platform SRE team was part of a major re-org with the goal of driving a new operations model for a highly available (99.999%) platform that led to outstanding results. At Maersk, we saw increased efficiencies and reduced TTR along with other significant advantages of using PagerDuty from both on-call and management perspectives.

PD Summit21: AWS and PagerDuty: Better Together -- A Digital Transformation Journey

PagerDuty’s platform for real-time operations helps teams manage a complex transition from siloed and centralized approaches to multiple, distributed teams supporting a hybrid cloud infrastructure. To make this journey successful, one thing is clear: your people, technology, and operational processes need to be aligned in real time. That’s why we’re continuing to invest in our partnership with AWS. The integrations we’re bringing to market have always been centered on unlocking AWS’s unprecedented scale and agility for our joint customers.

PD Summit21: Responding to Chaos with Gremlin and PagerDuty

Incident response is something you hope to never need, but when you do, you want it to go smoothly and seamlessly. Normally the knowledge of how to handle incidents within your company will be built up over time, getting better with each incident. While tools such as PagerDuty's Major Incidents Application can help you recover quickly, the process you follow is just as important. This documentation lets you learn from the start something that has taken us years to build up, giving you a head start on handling a major incident in a way that leads to the fastest possible recovery.

Evolving in CloudOps Maturity? Investing in People and Teams Pays Off

CloudOps is on the up. This is in part due to the rapid acceleration of the shift to cloud that was caused by the pandemic. The shift allowed companies to innovate faster, enjoy greater flexibility and scalability, and become more cost-efficient. Many organizations that rapidly adopted cloud or increased their usage now realize that they need to better manage their cloud investments in order to fully embrace these benefits.

HUG Relies on PagerDuty When Healthcare Incidents Arise

The Geneva University Hospital (HUG) is one of the five university hospitals in Switzerland and one of the largest hospitals in Europe. Pierryves Fournier, SRE Team Lead at HUG, explains how PagerDuty and Rundeck help automate his team's incident response process, empowering the right action when seconds matter.