After an almost 20-year career in social services, which included work at The University of Chicago, the United States Peace Corps, Chicago public schools, Child Protective Services, the Catholic Church, and the U.S. federal government's probation and parole programs, my leap into Silicon Valley was as much a culture shock as the 2.5 years I lived in the Andes Mountains of Ecuador.
If you’ve ever been on call, you know that the incidents don’t stop because you have the flu. Or when you’re attending your child’s high school graduation. Or, as I found out firsthand, even when you’re at your own wedding. Confucius once said, “If you have never had a major occasion happen while you are on call, then you may not have ever lived.” (Okay, I totally made that one up.)
IT Operations, DevOps, and Developer teams count on PagerDuty’s 300+ integrations to power their end-to-end real-time digital operations, no matter which tool stack they use. Because PagerDuty’s customers span all sizes, industries, and digital maturity levels, our product team is constantly talking to customers about which tools they use for needs like communications, APM, and IT Service Management (ITSM).
Do any of these sound familiar? One of your best engineers just gave notice that they are taking a job elsewhere because the on-call load while working for you is destroying their personal life, but you honestly thought things were fine.
“I need to be notified if there’s a significant event ongoing with SignalFx.” This is what I tell my team. However, despite being the CTO of a monitoring company, creating the right set of alerts for me to stay informed of incidents in progress or potential issues was harder than it seemed at first glance. Why?
Have you ever gotten that dreaded text from your boss: “The site is down”? Maybe you were meeting with a customer. Or having dinner with your family. Maybe you were presenting at a conference. Doesn’t matter. Whatever else you were doing, now you’re doing emergency incident communication too. You check in with your team leads and confirm there is a problem. You let your boss know the response is under way.
We’re back with another employee spotlight! Last month, we spoke to Senior Front-End Developer Mark Smith, who works out of our San Francisco office. This month, we (virtually) crossed over to the opposite coast and spent some time getting to know Marguerite des Trois Maisons, who works out of our new Toronto office as the product owner on our Site Reliability Engineering (SRE) team.