
The latest News and Information on Service Reliability Engineering and related technologies.

Creating Custom Slack Commands

Site Reliability Engineers are expected to know everything that’s happening, all of the time. That’s a lot of things! To help you sift through the noise, we’ve developed a feature that lets you find accurate data about your organization on demand. You can do this by sending custom-designed commands to FireHydrant directly from your integrated Slack account.

How Netflix Uses Fault Injection To Truly Understand Their Resilience

Distributed systems such as microservices have defined software engineering over the last decade. Most of the advances have been in resilience, flexibility, and speed of deployment at increasingly large scales. For streaming giant Netflix, the migration to a complex cloud-based microservices architecture would not have been possible without a revolutionary testing method known as fault injection. With tools like Chaos Monkey, Netflix employs a cutting-edge testing toolkit.

A Day in the Life: Intelligent Observability at Work with a Super SRE

After we’d fixed Aparna’s network issue, James came to see me at my desk. Masks on, socially distanced and all that, but it was nice to have some face-to-face time. James is cool – that dry British humor and not your classic IT Ops dude. He’s been here forever and mentored me when the CIO, Charlie, hired me as the first SRE here a year or so ago. I lucked out really.

SRE Survey 2021: Where do we go from here

What a difference a year makes. In a matter of 365 days, the entire planet stared down uncertainty, and while most of the world is far from recovered, we are starting to see a time when some level of normalcy will return. But what will this look like? How will the past year transform our social interactions, our time out of the house, and how we conduct business?