SRE

The latest news and information on Site Reliability Engineering and related technologies.

Can Security Teams Benefit from SRE? You bet!

When we talk about the reliability of services, SRE encourages us to take a holistic view. Unreliable service delivery can stem from many causes, from hardware malfunctions to errors in code. One often-overlooked source of unreliability is security. A security breach can damage customer trust far beyond the impact of the breach itself. Even smaller infractions, like failing a service audit, can make users wary.

Site reliability engineering: what is SRE?

As companies race to build site reliability engineering (SRE) practices within their engineering teams, SRE has become one of the hottest and highest-paying jobs in tech. The term "site reliability engineering" was coined in 2003 by Google engineer Benjamin Treynor, when he was tasked with making sure that Google's services were reliable, secure, and functional.

DevOps/SRE Model: Bursting the Developer's Bubble. Here's the CTO Perspective.

Many organizations are transitioning toward a DevOps operational model, where software developers are responsible for operating the applications they develop, instead of a centralized IT operations group. In this “CTO Perspective” interview we talk to BigPanda’s CTO Elik Eizenberg about the challenges in that transition, and what it takes to make it easier. Lean back and watch the interview, or if you prefer reading, take a few minutes to read the transcript.

Ask an SRE Panel Talk

Our SRE Leaders Panel series gathers leading minds in the SRE and resilience community to share their insights. In this edition, we are excited to host an all-women panel that will dive deep into testing in production. The event will consist of 40 minutes of roundtable discussion with Shelby and Talia, facilitated by Blameless' Staff SRE Amy Tobey, followed by 20 minutes of audience Q&A. This is an open and candid discussion, so come with your questions. We look forward to seeing you there!

This is your guide to implementing SRE in NOCs

Network Operations Centers (NOCs) serve as hubs for monitoring and incident response. A NOC is usually a physical location within an organization, where operators sit at a central desk with screens showing current service data. However, a NOC's functions can also be distributed. Some organizations build virtual NOCs that can be staffed fully remotely, which allows for distributed teams and follow-the-sun rotations. NOC as a service is another structure that is gaining popularity.

SRE Leaders Panel: Testing in Production

Blameless recently had the privilege of hosting some fantastic leaders in the SRE and resilience community for a panel discussion. Our panelists discussed testing in production, how feature flagging and testing can help us do it safely, and how to get managers on board with testing in production. The transcript below has been lightly edited; if you're interested in watching the full panel, you can do so here.