Chaos Engineering

Why Reliability Engineering Matters: an Analysis of Amazon's Dec 2021 US-East-1 Region Outage

In the field of Chaos Theory, there’s a concept called the Synchronization of Chaos—disparate systems filled with randomness will influence the disorder in other systems when coupled together. From a theoretical perspective, these influences can be surprising. It’s difficult to understand exactly how a butterfly flapping its wings could lead to a devastating tornado. But we often see the influences of seemingly unconnected systems play out in real life.

Podcast: Break Things on Purpose | Carissa Morrow: Learning to be Resilient

Being new in tech can be intimidating! Thankfully, folks like Carissa Morrow are shining examples of how to come into tech from the ground up. Carissa began with a career shift and just started coding, went through the Boise Codeworks bootcamp, and made the jump to tech. Carissa talks about the resilience it took in her early days, and how those experiences reinforced her attitude toward continual learning.

Podcast: Break Things on Purpose | Gunnar Grosch: From user to hero to advocate

Reliability and serverless are at the forefront of today’s conversation. For this episode Gunnar Grosch, Senior Developer Advocate at AWS, is here to talk about Chaos Engineering, AWS Serverless, and the work that AWS is doing when it comes to reliability.

If you're adopting Kubernetes, you need Chaos Engineering

When Ticketmaster started their Kubernetes migration, they had to address a huge problem: whenever ticket sales opened for a popular event, as many as 150 million visitors flooded their website, effectively subjecting it to a distributed denial of service (DDoS) attack. With new events happening every 20 minutes and $7.6 billion in revenue at stake, outages could mean hundreds of thousands of dollars in lost sales.

Getting started with Time Travel attacks

It's the middle of the night when your phone goes off. You rub your eyes and unlock the screen to see a SEV 1 alert from your incident management tool. The application is down, multiple cloud server instances are offline, and the remaining instances are being overwhelmed by the sudden increase in demand. You jump out of bed and start trying to troubleshoot. You log into your cloud provider and try to provision systems manually, only to find out you can't.

Podcast: Break Things on Purpose | Unpopular Opinions

Time for a bit of a review! Join Jason as he looks back on previous guests who have shared opinions that range from the idiosyncratic to the downright unpopular. Pulling from a handful of “Breaking Things” interviews, Jason covers everything from death to VPNs to the validity of “AI Ops.” Check out the litany!

Chaos & Order: Breaking and Fixing Things in K8s Environments With Komodor & Gremlin

You can’t build a CI/CD pipeline and support fast-paced development cycles without considering continuous reliability. On the one hand, this means being rehearsed and prepared for every scenario. On the other, it calls for a contingency plan for when something (inevitably) goes wrong. Join this live event and see how DevOps tools can help you plan for the best and prepare for the worst, as Julie from Gremlin injects chaos into the Bank of Anthos system and Rona from Komodor troubleshoots things back into order.

Podcast: Break Things on Purpose | 2021 Year In Review

For this episode, your hosts, Jason Yee and Julie Gunderson, are sitting down for a year in review! With the new year just around the corner, let's take a glance back at a year of chaos... engineering, that is. The rest of the chaos we will leave out of the conversation. Julie and Jason talk about their favorite outages of the year. From Fastly to texts from Julie’s mom, we’ve definitely got a heck of a year to consider!