Kubernetes Master Class - Disaster Recovery with Rancher and Kubernetes

Jul 8, 2021

Everything breaks at some point; whether it is infrastructure (DNS, network, storage, etc.) or Kubernetes itself, something will fail eventually.

In this session, we will walk through some common failure scenarios, covering how to identify failures and how to respond to them as quickly as possible, using the same troubleshooting steps, scripts, and tools Rancher Support uses when supporting our Enterprise customers.

We will also review how to recover from these types of failures, either in place or from scratch. Documentation and scripts for reproducing all of these failures (based on actual events) in a lab environment will also be shared following this Master Class.

Agenda:

  • Rebuilding an RKE cluster from only an etcd backup
  • How to recover from an etcd split-brain or even data corruption
  • Restoring service after multiple node failures
  • How to detect failures before your application teams do