Latest Posts

How ITIL, DevOps, and SRE Work Together for your Organization

Mar 10, 2020 By Hannah Culver In Blameless

When someone asks what type of “shop” your organization is, can you answer confidently that it’s ITIL, DevOps, or SRE? Maybe some people can, but if you’re a large enterprise, the answer is likely a combination of several of these operating models, especially since SRE has become a key implementation of DevOps. ITIL can work effectively alongside DevOps and SRE principles, though at first glance they appear to be different species.

How do we Apply SRE Outside of Engineering with Google's Dave Rensin

Mar 3, 2020 By Blameless In Blameless

The first keynote speaker, he is a senior director of engineering at Google. You might know him as the guy who founded and leads the customer reliability engineering function at Google. CRE, this is a team that teaches the world SRE principles and practices. Now I want to tell you a bit more about him, because I think he has a unique view and perspective. He is deeply compassionate and intuitive as a teacher, not just a lecturer.

Using AI to Auto-Detect and Remediate Incidents

Feb 27, 2020 By Ancy In Blameless

Today, the number of possible failure modes in cloud and microservices applications is exploding, making it increasingly difficult to gain true observability and take the right action across IT environments. According to Lightstep’s Global Microservices Trends report, 91% of teams are using or have plans to use microservices, but 73% report it is harder to troubleshoot application performance problems due to greater complexity.

5 Surefire Ways to Improve Your Product Reliability with Logging and Automation

Feb 19, 2020 By Josh Hendrick In Blameless

In the fast-moving world of software development, as your product and organization grow and evolve, there are almost always competing priorities. Zeroing in on what is most important to your business in order to take it to the next level can at times seem like a non-stop process of trial and error. Oftentimes the customer who screams the loudest becomes a priority and gets the most focus.

Evolving Blameless' SRE Practices with Amy Tobey

Feb 18, 2020 By Blameless In Blameless

At Blameless, we drink our own champagne, and aim to adopt a mindset of continuous learning to foster resilience. We believe that the adoption of SRE practices is one of the best ways to get there. Like most organizations, our early efforts to implement SRE were imperfect. However, through hard work, teamwork, and investing in what we believe is the most important feature (reliability), we have made significant changes to how we do SRE. And we’re getting better at it every day.

Structuring Your Teams for Software Reliability

Feb 12, 2020 By Hannah Culver In Blameless

How well positioned is your team to ship reliable software? What are the different roles in engineering that impact reliability, and how do you optimize the ratio of software engineers to SREs to DevOps within teams? These questions can be hard to answer in a quantifiable way, but projecting different scenarios using systems thinking can help. Will Larson’s blog post Modeling Reliability does just that, and serves as inspiration for this article.

How to Network Effectively as an SRE

Feb 4, 2020 By Hannah Culver In Blameless

For many SREs, networking prompts a response similar to going to the dentist. You know you should do it, but you don’t really want to. But networking is much less like a root canal and more like a regular teeth cleaning; you may not want to go, but once you’re there, it’s not so bad. In fact, you may walk away feeling good knowing that you’ve done something that helps future you.

New Postmortems Design and Commenting Functionality

Jan 29, 2020 By Blameless In Blameless

One of the most important steps in an incident’s lifecycle is the postmortem. It provides an essential time to reflect on what happened, what could have been done better, and how to build more resilience into a system. But we consistently hear from engineers that incredible toil is typically involved in coordinating stakeholders to write good postmortems.

2020 SRE Predictions

Jan 28, 2020 By Hannah Culver In Blameless

It’s a new year, so what will 2020 have in store for SRE? Here’s our two cents: SRE adoption will only continue to grow. However, the practice and culture shift, rather than the role, will take priority in 2020. More people (not just SREs) will have a reliability mindset, shifting reliability left through the software lifecycle. SLIs, SLOs, and error budget policies will become common practice to make this shift actionable.

What Are Service-Level Objectives? Lessons Learned

Jan 21, 2020 By Emily Arnott In Blameless

Service Level Objectives, or SLOs, are internal goals for the essential metrics of a service, such as uptime or response speed. We’re probably familiar with this definition, but what is the value of setting these goals? We’ll take a look at SLOs as both a powerful safety net and a tool to inform the allocation of engineering resources, while also considering the cultural learnings of SLO adoption.
