Squadcast

Pavlos Ratis shares his experience on being an SRE

Pavlos is a Site Reliability Engineer based in Munich, Germany. He likes building software and expanding his knowledge of service and infrastructure reliability. He has created several open-source SRE projects, including awesome-sre, Wheel of Misfortune, Availability Calculator, and awesome-chaos-engineering, to help teams and individuals get on board with the SRE culture.

Managing technical risk effectively with Error Budgets

Tradeoffs are hard. Think about the time when you had to choose between two equally compelling options: (a) addressing technical debt or (b) pushing out that long-awaited feature release and risking breaking production. Or when your team couldn’t agree on where to draw the line between improving request latency and shipping a major new update.

Intent-based Capacity Planning and Autoscaling with Kubernetes

Intent-based Capacity Planning is Google's approach to declaring the reliability intent for a service and then dynamically solving for the most efficient resource allocation plan. Learn how you can start using this approach to manage the reliability of services running on your Kubernetes cluster.

Mark Henderson from Stack Overflow shares his experience on being an SRE

Mark Henderson has been a Site Reliability Engineer at Stack Overflow since 2015. Before this, he worked as the sole systems administrator at a small software company in Sydney, Australia. These days, he lives in South Australia and works from home with his wife and two children.

Squadcast

Squadcast is an intelligent incident management, monitoring, and alerting platform that improves your reliability by helping SRE and DevOps teams adopt IT incident management best practices such as intelligent alert routing, on-call rotations, collaboration, response automation, root cause analysis, and blameless postmortems.