
Blameless

5 Reliability Insights That Immediately Transform Your SRE

As an infrastructure engineer, you can learn a great deal from studying past incidents. Blameless Reliability Insights helps you find patterns that better equip you to deal with incidents to come. If you’ve never used it before and you’re curious what it looks like, you can watch a video demo. These statistical insights help you learn as much as possible when something goes wrong.

SRE: From Theory to Practice | What's difficult about incident command

A few weeks ago we released episode two of our ongoing webinar series, SRE: From Theory to Practice. In this series, we break down a challenge facing SREs through an open and honest discussion. Our topic this episode was “what’s difficult about incident command?” When things go wrong, who is in charge? And what does it feel like to do that role?

A Chat with Lex Neva of SRE Weekly

Since 2015, Lex Neva has been publishing SRE Weekly. If you’re interested enough in reading about SRE to have found this post, you’re probably familiar with it. If not, there are a lot of great articles to catch up on! Lex selects around 10 entries from across the internet for each issue, focusing on everything from SRE best practices to the socio-side of systems to major outages in the news. I had always figured Lex must be among the most well-read people in SRE, and likely #1.

How The Experts Build Reliable Cloud Apps

We live in the cloud era, where your services don’t live on machines in your garage, but are spread across huge data centers around the world. Cloud providers can help meet increasing demands for reliability – for example, they offer dynamic resource allocation that can handle usage spikes. At the same time, going cloud native means not having a physical server onsite that you can fiddle with, which introduces its own challenges.