Operations | Monitoring | ITSM | DevOps | Cloud

August 2023

Why Resilience Engineering Needs To Be A C-Level Strategy & How To Get There

The consequences of downtime and data breaches can be devastating to organizations, leading to substantial financial losses and irreparable damage to a business’s reputation. If last week's outage by the Bank of England is anything to go by, after losing trillions of £’s per day due to downtime, resilience shouldn’t just be an afterthought for organizations.

Impact of Kubernetes cluster maintenance on application availability

#kubernetes #eks #chaosengineering
In this video, we will be exploring an interesting scenario that might happen in real life. Let's imagine we have an application running in a Kubernetes cluster inside EKS. If for any reason, two of our three nodes are cordoned and can't be scheduled anymore, what would happen to our users should the last node be cordoned as well? And what if we need to reschedule something?

Checking your observability and communication platforms with Reliably

#reliably #chaosengineering #honeycomb #slack #resilience
In this video, we will use a chaos engineering experiment, that we expect to fail, to verify our open tracing and communication platforms are correctly set up. Using the Honeycomb and Slack integrations provided by Reliably, we will send traces and messages and observe if they are triggered as expected.