April 2022

Objectively Speaking: Understanding the Power of Objectives

Apr 29, 2022 By Mick Roper In Reliably

Objectives help you monitor different aspects of your services and systems, such as latencies, error rates, open PRs, the age of a bug, and more. These are all things that can drift away from what we consider good, and that notion of good is essentially what an objective is. Objectives help us define what ‘good’ looks like.

How Do You Measure Technical Debt?

Apr 29, 2022 By Kerem Gocen In Reliably

Technical debt is one of the trade-offs today’s software teams make to speed up development, which in turn shortens time to market. That speed is mission-critical for most start-ups. Instead of dwelling on implementation details or trying to cover edge cases that may affect only a small fraction of end users at an early stage of development, agile teams prioritize early and continuous delivery.

Post-Incident Review | Why It's Important & How It's Done

Apr 28, 2022 By Emily Arnott In Blameless

Curious about the post-incident review process? We give a complete explanation of post-incident reviews and why they are important and discuss best practices. What is a post-incident review? A post-incident review is an evaluation of the incident response process. The goal of the process is to have clear actions to improve the incident response process and to also help prevent further incidents.

Reliability. It's Critical; Now How Do You Prioritize?

Apr 28, 2022 By

The eBook dives into the 4 main investment areas required to make an impact. You'll learn how to achieve measurable results for the benefit of your customers and your valuable engineers.

Jira Integrations with Blameless platform

Apr 27, 2022 By Blameless In Blameless

In this video, our Solutions Engineer walks you through the steps of creating Jira tickets and follow-up actions in Blameless. You'll learn how to leverage our Slack integration for quick ticket creation and how to create tickets from within the Blameless platform. You'll also see how closing a ticket in Jira automatically closes the corresponding ticket in Blameless. Additionally, you'll discover how to manage open tickets and incidents in an organized Blameless dashboard.

Software Reliability Testing | What It Is & How It Works

Apr 26, 2022 By Emily Arnott In Blameless

We explain what software reliability testing is and how it works, how to conduct it, and how you can use it to identify problems in the software design process.

The Critical Role of the SRE & Error Budgeting

Apr 26, 2022 By Derek Rodner In Circonus

The role of SRE, Site Reliability Engineer, was first created by Benjamin Treynor in 2003 at Google after he was tasked with ensuring that their websites were available and reliable. The SRE is a multi-disciplined role that needs to have the ability to automate monitoring and observability across hundreds and thousands of complex systems.

SRE vs DevOps: What's The Difference?

Apr 22, 2022 By Stephen Watts In Splunk

Whether you’ve heard of or fully jumped on the DevOps or SRE bandwagon, you may have also wondered how the two relate. What’s the difference? Are they really just different ways of looking at the same problem? The term DevOps hit the market first, but SRE wasn’t too far behind. And though they have different origin stories, they both focus on autonomy, automation, and iteration. So why do these paradigms exist? And why do we need both? Let’s look at this further.

Improving Reliability With OKR Initiatives

Apr 22, 2022 By Aimee Pearcy In Reliably

‘OKR’, which stands for ‘Objectives and Key Results,’ is a goal management framework designed to define goals and track outcomes. It differs from typical goal-setting techniques because the aim is to set very ambitious goals that encourage teams to flex their creativity. OKRs are used by Google, LinkedIn, Twitter, and other successful companies to create measurable goals, and to make sure team members are aligned and engaged.

Reliably - Objectively speaking

Apr 21, 2022 By Reliably In Reliably

Objectives are fantastic! They convey the desired end goal while allowing progress to be measured. During this lightning talk, Mick Roper, lead software engineer at Reliably, will discuss the power of objectives.

SRE: From Theory to Practice | What's difficult about on-call?

Apr 21, 2022 By Emily Arnott In Blameless

We launched the first episode of a webinar series to tackle one of the major challenges facing organizations: on-call. SRE: From Theory to Practice - What’s difficult about on-call sees Blameless engineers Kurt Andersen and Matt Davis joined by Yvonne Lam, staff software engineer at Kong, and Charles Cary, CEO of Shoreline, for a fireside chat about everything on-call. As software becomes more ubiquitous and necessary in our lives, our standards for reliability grow alongside it.

SRE Adoption | A 2-Year Retrospective (From A Business Point-Of-View)

Apr 20, 2022 By Jason Montgomery In Blameless

This month I hit my 2-year anniversary with Blameless, and as our industry progresses and matures, I thought it would be a good opportunity to look back at how far we have come and to ruminate on where we’re headed. Our shared vision at Blameless is to help engineering teams adopt reliability practices with ease and advance to a resilient culture.

Site Reliability Chats (Apr 20, 2022)

Apr 20, 2022 By Gremlin In Gremlin

In this episode Julie and Jason share updates on the Atlassian outage, a new incident at Cerner, and problems at the IRS. They also cover post-incident investigations from Cloudflare and Datadog.

The State of Incidents and Site Reliability: Q&A with Blameless SRE Architect Kurt Andersen

Apr 19, 2022 By Blameless In Blameless

In the latest of an occasional series, today we hear from Kurt Andersen, SRE Architect at Blameless, discussing the evolution of incident management, current trends in site reliability affecting engineering teams, as well as an update on how Blameless is addressing the needs of SRE and DevOps.

Shift Right Testing (Do I need it? How Is It Done?)

Apr 19, 2022 By Myra Nizami In Blameless

Curious about shift right testing? We explain what shift right is, how it’s done, how it differs from a shift left approach, and whether it is important.

What Is A Site Reliability Engineer And Why You Need One

Apr 15, 2022 By Aimee Pearcy In Reliably

Site reliability engineering (SRE) does the work that would typically be done by operations but instead uses engineers with software experience to solve problems. The concept of SRE was created by Google in 2003 after a team of software engineers was asked to make Google’s sites more scalable, reliable, and efficient. They described SRE as ‘when you treat operations as if it’s a software problem’.

Managing Burnout | Tips To Minimize The Impact

Apr 14, 2022 By Blameless Community In Blameless

Burnout is real. Today, the source of burnout can be anything from pandemic fatigue, to the onslaught of political divisiveness, or simply the pace of life worldwide. Whatever the culprit, we’re living in a stressful time. People working in cloud native environments definitely feel burnt out. Silicon Valley investor Marc Andreessen famously said, “Software is eating the world,” and that seems to be quite true. High demand is fueling churn. System and cloud operators feel pressure.

Site Reliability Chats (Apr 13, 2022)

Apr 13, 2022 By Gremlin In Gremlin

In this episode, Julie and Jason cover recent outages of the Dutch NS trains, American Express, and the ongoing, long-running incident at Atlassian. In positive news, they cover the acquisitions of Puppet by Perforce and Chaos Native by Harness, and Grafana Labs' Series D funding.

The Pros and Cons of Embedded SREs

Apr 12, 2022 By Quentin Rousseau In Rootly

To embed or not to embed: That is the question. At least, that’s one of the questions that companies have to answer as they decide how to implement Site Reliability Engineering. They can either embed SREs into existing teams, or they can build a new, separate SRE team. Both approaches have their pros and cons. The right strategy for your company or team depends, of course, on your needs and priorities.

Software Performance Testing | Types, Tools & Best Practices

Apr 12, 2022 By Myra Nizami In Blameless

Looking into software performance testing? We explain what software performance testing is, the different types, how it works, and the benefits it can have.

Reliability Testing For SRE

Apr 12, 2022 By Aimee Pearcy In Reliably

Reliability testing is a software testing technique designed to make sure that a piece of software meets customer requirements, and to identify any faults within the product before it is delivered to the customer. It is the key to improving the design, functionality, and ultimately the quality of software. It should be performed at each level of software creation, and it encompasses everything from unit testing, to full system testing.

Automated Incident Management | Everything You Should Know

Apr 7, 2022 By Noor-ul-Anam Ruqayya In Blameless

Looking into automated incident management? We explain everything you need to know about what automated incident management is, why it’s important, and how to do it.

How to use PagerDuty with Blameless

Apr 7, 2022 By Blameless In Blameless

Blameless integrates with PagerDuty so you can notify teams and key stakeholders during an incident. We also help you search escalation policies and on-call rotation schedules. In this video, our Solutions Engineer walks you through navigating the initial setup and configuration in the Blameless UI. He'll then demonstrate how the integration works in real-time. If you use Slack or Microsoft Teams for internal communications, you'll also learn how to access and manage the PagerDuty integration from within those tools.

Freshdesk + Squadcast: Enabling Streamlined Incident Response for Enterprises

Apr 5, 2022 By Nir Sharma In Squadcast

Freshdesk is a cloud-based customer service platform used by enterprises. It provides a centralized, ticket-based help desk across multiple channels, including email, phone, chat, and social media. Squadcast is an incident management platform that integrates with major monitoring, ChatOps, and project management tools to provide a centralized place for reliability.

Shift Left Testing (What It Is & How To Do It)

Apr 5, 2022 By Myra Nizami In Blameless

Wondering about shift left testing? We explain what shift left testing is, how it relates to DevOps and SRE, why it’s done, and how to get started.

Site Reliability Engineering: An Imperative in Today's Enterprise IT

Apr 5, 2022 By Pepperdata In Pepperdata

Site reliability engineering (SRE) is fast becoming an essential aspect of modern IT operations, particularly in highly scaled, big data environments. As businesses and industries go digital and embrace new IT infrastructures and technologies to remain operational and competitive, IT teams increasingly need a new approach to balancing the launch of new systems and features against ensuring that these are intuitive, reliable, and friendly for end users.

Show character with Blameless Postmortems (part one)

Apr 4, 2022 By Dave Harrison In Raygun

This is Part 1 of a two-part series on Blameless Postmortems. Today, we'll discuss why blameless postmortems are so important and their implications for your team; the second part will go into detail on how to set them up as a process and make them successful. Somebody wise may have once told you that how we handle adversity shows our character. Being able to acknowledge and admit mistakes is the first step towards learning - it's a key part of success both in personal relationships and in large companies.

Blameless Walkthrough: Managing incidents, follow up actions, Jira tickets, and comms in Blameless

Apr 1, 2022 By Blameless In Blameless

Our Solutions Engineer walks you through the Blameless UI to filter incidents, set up follow-up actions, open and update Jira tickets, and set up comms.

Blameless Walkthrough: Starting an incident, assigning roles, and updating task checklists via Slack

Apr 1, 2022 By Blameless In Blameless

Our Solutions Engineer walks you through how to open and update incidents in Blameless through Slack.
