June 2020

Twitter's Reliability Journey

Jun 30, 2020 By Blameless Community In Blameless

Twitter’s SRE team is one of the most advanced in the industry, managing the services that capture the pulse of the world every single day and throughout the moments that connect us all. We had the privilege of interviewing Brian Brophy, Sr. Staff SRE; Carrie Fernandez, Head of Site Reliability Engineering; JP Doherty, Engineering Manager; and Zac Kiehl, Sr. Staff SRE, to learn how SRE is practiced at Twitter.

Tech by Choice with Amy Tobey 2

Jun 30, 2020 By Blameless In Blameless

Blameless Staff SRE Amy Tobey appears on an edition of Tech by Choice to give an introduction to SRE.

Convincing Management to Invest in Reliability

Jun 30, 2020 By Blameless In Blameless

Led by Blameless COO Lyon Wong, this workshop walks you through convincing management to invest in reliability at every level of leadership.

SLO Workshop

Jun 30, 2020 By Blameless In Blameless

Join Blameless co-founder and CEO Ashar Rizqi as he walks through the basics of SLIs, SLOs, error budgets, and error budget policies.

How SLIs Help You Understand Users' Needs

Jun 29, 2020 By Emily Arnott In Blameless

In our article on SLOs, we discussed the need for service level indicators to be relevant to the users’ experience. By consolidating a number of internal metrics into one indicator that reflects the typical use of the service, we can ensure that meeting our SLO means keeping users happy. A good way to think about this is by looking at the user’s experience or journey.

Reduce Engineering Problems with a Resiliency Mindset

Jun 26, 2020 By Hannah Culver In Blameless

Resiliency isn’t something that just happens; it’s a result of dedication and hard work. To reach your optimal state of resilience, there are some crucial SRE best practices you should adopt to strengthen your processes.

Top Practices for Runbook Automation

Jun 26, 2020 By Emily Arnott In Blameless

Runbooks, also known as playbooks, are documents that walk you through a certain task with specific steps. For example, a runbook for spinning up a new server might ask some questions about the purpose of the server and its estimated load, then lead you to the appropriate instructions and settings. Runbooks ease the cognitive load of these common tasks by clearly outlining the process for each.

SRE: A Human Approach to Systems

Jun 25, 2020 By Hannah Culver In Blameless

In the world of technology, the stakes have never been higher. The move to the cloud and microservices to maximize agility has given rise to digital disruptors and unprecedented competitive threats. As distributed systems become increasingly complex, the scale of ‘unknown unknowns’ increases. On top of this, customer expectations are sky-high. The cost of downtime is catastrophic, with customers willing to churn if their needs are not promptly met.

SREview Issue #2, June 2020

Jun 24, 2020 By Blameless Community In Blameless

Here’s the second issue of SREview! This monthly zine features epic Tweets, content, and events happening in the SRE and resilience engineering community.

Best Practices for Effective Incident Management

Jun 19, 2020 By Hannah Culver In Blameless

Incident management is a set of processes used by operations teams to respond to latency or downtime, and return a service to its normal state. Incident management practices have long been well-defined through frameworks such as ITIL, but as software systems become more complex, teams increasingly need to adapt their incident management processes accordingly.

At Blameless, Reliability is Personal

Jun 16, 2020 By Blameless Community In Blameless

During our 2019 Blameless Summit, CEO Ashar Rizqi spoke on his relationship with reliability and how it impacts his personal experiences.

Blameless Is Awarded CIOReview Top 2020 DevOps Solution Provider

Jun 12, 2020 By Blameless Community In Blameless

We're proud to announce that this week, we were selected by CIOReview as one of the Top 20 DevOps Solution Providers of 2020 alongside other innovators in the DevOps space such as Chef, JFrog, SaltStack, Splunk, and XebiaLabs.

Announcing our new integration with GoToMeeting

Jun 11, 2020 By Claudia Wibowo In Blameless

Communication during incidents is critical. With the rise of remote work, war rooms are no longer the central hub for all incident communication. Instead, we’re adapting to these new challenges and embracing video conferencing and messaging software to stay in lockstep with our teammates and collaborators. With this in mind, we are excited to announce that Blameless is adding a new way for you to communicate even faster and more effectively.

A Journey Through Blameless from Incident to Success

Jun 9, 2020 By Dyllen Owens In Blameless

Here at Blameless, every aspect of our product has SLOs (Service Level Objectives) and error budgets in order to help us understand and improve customer experience. Sometimes, these error budgets are at risk, triggering an incident. While incidents are often painful, we treat them as unplanned investments, striving to learn as much as we can from them. We empower all of our engineers to handle an on-call rotation, no matter how difficult the issue.

SRE Leaders Panel: Work as Done vs Work as Imagined

Jun 5, 2020 By Blameless Community In Blameless

Blameless recently had the privilege of hosting some fantastic leaders in the SRE and resilience community for a panel discussion. Our panelists discussed the effects of imposter syndrome, especially in high-tempo situations; how to use it to our advantage and overcome doubt; and how culture directly affects the availability of our systems. The transcript below has been lightly edited, and if you’re interested in watching the full panel, you can do so here.
