October 2019

It Came From Below

I’m going to assume most people who read this blog are familiar with PagerDuty. But just in case anyone isn’t, PagerDuty is a tool we use in IT to notify us if some predefined check has failed. Maybe a key process has died or maybe we’re not seeing our expected traffic volume or maybe our server has stopped responding to ping. Whatever it is, PagerDuty will relentlessly, remorselessly, and loudly notify whoever is on call that something needs attention.

Achieve Better Accountability With Full-Service Ownership

Software teams seeking to provide better products and services must focus on faster release cycles. But running reliable systems at ever-increasing speeds presents a big challenge. Software teams can have both quality and speed by adjusting the policies around ongoing service ownership. While on-call plays a large part in this model, advancement in knowledge, more resilient code, increased collaboration, and practice also mean engineers don’t have to wake up to a nightmare.

Blameless Culture Key to Addressing Outage Outrage in Australia

After the unfortunate Commonwealth Bank of Australia outage last week, the powerful Payment Systems Board, whose members include the chairs of the RBA and APRA, announced it would make all outage data public to prevent banks, payment schemes, and telecommunications carriers from “hiding behind” the performance statistics shared by each institution.

Service Monitoring and You

Monitoring is an art form. That sounds cheesy and lazy, but the right kind of monitoring is very context-dependent and rarely does the same practice work across multiple pieces of software or people. This gets even harder when you think about modern software architectures. Microservices? Container schedulers? Autoscaling groups? Serverless? ${New-technology-that-will-solve-all-of-my-problems-but-probably-creates-other-problems}?

Happy National Cybersecurity Awareness Month!

October is the month of spooky scares, so it makes sense that National Cybersecurity Awareness Month is also recognized at this time. After all, what’s scarier than having someone phish for your personal information and use it to ruin your credit, or losing your password to hackers who then gain access to your bank account?

Modernizing Your Digital Operations with Sumo Logic and PagerDuty

As digital transformation continues to be central to an organization’s growth mandate, it’s critical to ensure that customer-facing, revenue-generating, mission-critical applications are operationally reliable and secure. That’s where Sumo Logic comes in. For almost 10 years, we have been providing a Continuous Intelligence platform for DevSecOps that’s used by more than 2,000 customers in almost every vertical.

Making the Most of PagerDuty + Datadog

For your team to effectively respond to incidents, you need a shared, unambiguous incident definition so you can recognize when an incident has occurred and assign the appropriate severity. Definitions of an incident differ across teams, but whatever definition you use, identifying and monitoring key service level indicators (SLIs) can help you understand when your service is operating normally—and when its performance has degraded to the point where you need to trigger an incident.
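As a rough illustration of tying an incident definition to an SLI, the sketch below computes a request error rate and maps it to a severity label. The threshold values, the severity names, and the function names are all hypothetical, chosen only for this example; they are not taken from PagerDuty or Datadog.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical thresholds, ordered most severe first:
# (minimum error rate, severity label). Real values depend on your service.
DEFAULT_THRESHOLDS: Sequence[Tuple[float, str]] = (
    (0.05, "SEV-1"),
    (0.01, "SEV-2"),
)


def error_rate(failed: int, total: int) -> float:
    """SLI: fraction of requests that failed in the measurement window."""
    if total <= 0:
        # Assumption for this sketch: no traffic is treated as no errors.
        return 0.0
    return failed / total


def severity_for(
    rate: float,
    thresholds: Sequence[Tuple[float, str]] = DEFAULT_THRESHOLDS,
) -> Optional[str]:
    """Return the first (most severe) label whose threshold is met, else None."""
    for minimum, label in thresholds:
        if rate >= minimum:
            return label
    return None


if __name__ == "__main__":
    # 20 failures out of 1,000 requests is a 2% error rate.
    print(severity_for(error_rate(20, 1000)))
```

Writing the definition down this way is what makes it shared and unambiguous: everyone on the team can read the same thresholds and agree on when normal operation ends and an incident begins.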

Vodafone Utilizes PagerDuty to Better Understand Their Real-Time Operations

Vodafone is a telecommunications company providing 4G network coverage for 18 million customers and 99% of the United Kingdom’s population. Ben Connolly, Head of Digital Engineering at Vodafone, details the challenges that his engineering teams were facing and why PagerDuty was the perfect fix. PagerDuty helps Vodafone deliver a better customer experience by allowing their teams to see the impact that they’re having in real time.