
March 2022

Getting started with DNS attacks

Whenever an online service goes down, you're likely to hear three words: "it was DNS!" Blaming DNS might be a running joke among network admins and engineers, but it's one rooted in experience. DNS problems are known for causing massive, Internet-wide outages such as the 2021 Akamai outage that temporarily made the websites for Delta Air Lines, American Express, Airbnb, and others unreachable.

Getting Started with Gremlin Attacks

Gremlin provides a variety of ways to test the resilience of your systems, which we call "attacks". Running different attacks lets you uncover unexpected behaviors, validate resilience mechanisms, and improve the overall reliability of your systems and services. This ebook explains each of Gremlin's attacks in detail, including what each attack does, how it impacts your systems, and the technical and business objectives the attack helps achieve.

Podcast: Break Things on Purpose | Chris Martello: Day of Darkness

Dad jokes lead the way in this episode as we interview Chris Martello, manager of application performance at Cengage. Chris is a wearer of many testing hats, but his passion is chaos and breaking things on purpose. With his background as a middle school science teacher, chaos engineering was a natural fit for Chris when he made the jump to tech.

Getting started with Packet Loss attacks

Imagine this: you're in the middle of an important presentation when all of a sudden your video feed starts to stutter. You hear other people speaking, but their words are choppy. A message comes through Slack from one of your co-workers: "I think your connection cut out." You scramble to try different solutions—restarting your videoconferencing application, checking your Internet connection, switching to your phone—but ultimately, your presentation gets cut short.

The Dual Approach in Scaling: Chaos Engineering and Performance Engineering

Most enterprises are all too familiar with the struggles and complexities of scaling their environments and applications. Whether these applications live on premises, in a cloud environment, or somewhere in between in a hybrid state, an age-old question engineering teams ponder is, “Can my application and environment scale?”

Podcast: Break Things on Purpose | Alex Solomon & Kolton Andrus: Break it to the Limit

Time for a crossover! Today, Page it to the Limit host Mandi Walls, DevOps Advocate at PagerDuty, joins Julie for a special episode. In this two-part episode, Julie and Mandi interview Kolton Andrus, co-founder of Gremlin, and Alex Solomon, co-founder of PagerDuty. Each of them shares the origins of their respective companies, how they build amazing cultures, and some of the fun anecdotes along the way.

Getting started with Latency attacks

As the world becomes more dependent on cloud-native systems, the tolerance for slow services is decreasing. Users expect instantaneous access to services, whether it's for work, entertainment, or even cloud infrastructure. Even small amounts of latency can significantly decrease user satisfaction: nearly half of all users expect web pages to load in under two seconds, and as many as 28% of users will permanently abandon a slow site.