Dec 10, 2019 | By Vishwa Krishnakumar
Site reliability engineers have one of, if not the, toughest roles in any organization. While dealing with incidents is one part of the job, the other is to build reliable systems. Google’s SRE book sums up this approach nicely. One of the most important challenges for an SRE, when it comes to balancing work between firefighting and toil reduction, is alert noise.
Nov 29, 2019 | By Amrit Balraj
Being on-call is a critical duty that many operations and engineering teams must undertake to keep their services reliable and available. However, there are several pitfalls in the organization of on-call rotations and responsibilities that, if not avoided, can lead to serious consequences for the services and the teams.
Nov 18, 2019 | By Amrit Balraj
The term “GameDay” was coined by Amazon’s “Master of Disaster” Jesse Robbins, who created GameDays to increase reliability by purposefully causing major failures on pre-planned dates. GameDays help put the values of chaos engineering into practice. Chaos engineering is the disciplined practice of injecting failure into healthy systems. With modern IT services becoming increasingly sophisticated, continuously changing systems, outages are inevitable.
Nov 11, 2019 | By Amrit Balraj
Site reliability engineering is a term coined by Google engineer Benjamin Treynor in 2003, when he was tasked with making sure that Google services were reliable, secure and functional. He and his team eventually wrote the book on SRE, which is available online for free to anyone interested in researching and implementing SRE best practices.
Oct 16, 2019 | By Amrit Balraj
Modern businesses are evolving rapidly with the advent of cloud, CI/CD and microservices. However, there still exists an extensive and obvious divide between principal business stakeholders and development teams. Development teams are often unaware of the challenges faced by operations teams, and vice versa. This is where the need to adopt DevOps principles comes into the picture. DevOps came into existence as the natural successor to Agile practices in software development.