Chaos Engineering With Ana Medina

Recently, I sat down with Ana Medina of Gremlin for a PagerDuty Community AMA! Ana is currently working as a Chaos Engineer at Gremlin, helping companies avoid outages by running proactive chaos engineering experiments. Previously, she worked at Uber as an engineer on the SRE and Infrastructure teams, where she specifically focused on chaos engineering and cloud computing. Catch her tweeting at @Ana_M_Medina about traveling, diversity in tech, and mental health.

The Power Of Operational Reviews

Last fall, we introduced PagerDuty Analytics, a product that combines machine and human response data to provide operational insights that enable organizations to drive process maturity and improved business outcomes. Today, we’re excited to announce that it’s generally available! As part of our expanded Analytics product offering, we’re rolling out a set of prescriptive operational performance scorecards.

Postmortems Part 2: How to Adopt a Learning Culture

Culture is the way we do things together. It’s the secret sauce that results in happy, healthy teams that consistently meet their goals. It’s also the hardest thing to define, cultivate, and change in an organization. True cultural change requires more than creating and communicating policies. It takes collaboration, persistence, and experimentation.

Introducing The PagerDuty Postmortem Guide

Your team had been fighting this major incident for hours, but your investigation was hitting one dead end after another. Finally, you managed to isolate the problem and your graphs started to improve. When all systems went back to normal, everyone let out a collective sigh of relief, shut down the response call, and went back to bed, never to think of this incident again. Or so you thought.

January 2019 Product Update: New Integrations & APIs

To kick off the year, we’re launching a monthly blog series to share new product announcements on an ongoing basis. This month, we’re excited to announce several new integrations, as well as the new global events rule API that empowers admins and developers to easily manage event rules at scale. (Be sure to also check out our platform release notes to stay up-to-date on what’s new.)

The Competitive Advantage Of Teamwork

Have you ever worked on a team where it was a challenge to give constructive feedback or confidently share ideas? At PagerDuty Summit 2018, Patrick Lencioni, author of The Five Dysfunctions of a Team, spoke about the importance of encouraging a culture of teamwork, and the role trust and vulnerability play in creating that culture.

The Cost of Operational Immaturity

Digital operational maturity is defined as an organization’s effectiveness at real-time work and ability to focus on performance metrics that improve as the organization becomes more adept at responding to incidents. Based on extensive research and nine years of industry data, in conjunction with a survey of 600+ respondents from across industries, PagerDuty developed a model that identified the following four levels of operational maturity.

SecOps Is Getting Real (Time)

Companies migrating to the cloud need to ensure they have a strong security posture and can meet compliance requirements. Along with ensuring compliance, companies also face the challenge of tying together multiple security tools that generate a high volume of event data across disparate interfaces and platforms. To help address this challenge, a new security service was introduced at AWS re:Invent 2018: AWS Security Hub.