Monthly Archive

Kubernetes Operators for Automated SRE

May 27, 2020 By Squadcast In Squadcast

It can be quite challenging for an SRE team to maintain the well-being of a large-scale Kubernetes-based system with hundreds or thousands of services. In this blog post, Gigi Sayfan, author of “Mastering Kubernetes”, outlines the SRE challenge and how we can achieve the ultimate goal of automated SRE with Kubernetes operators.

On-call On-boarding Checklist

May 20, 2020 By Squadcast In Squadcast

A good on-call experience starts with the company culture. Irrespective of how small or large your team is, it’s wise to invest some time in creating a good on-call onboarding plan. A humane on-call process is the mark of a good engineering culture. Being on-call means that you’re expected to be reachable for any issues that may occur during your shift. It’s easy to lose all motivation just by anxiously anticipating that mid-dinner ping.

Best Practices in Incident Management

May 7, 2020 By Squadcast In Squadcast

In an always-on world, companies look to systems and processes to keep their services up and running at all times. The most important part of maintaining this uptime is having an Incident Management process in place to restore your services in the event of an interruption or unplanned downtime. Incident Management processes are typically used by SRE, DevOps, NOC and other IT teams to respond to incidents that affect services and to restore those services.
