Blameless

https://www.blameless.com

San Mateo, CA, USA

2017

Apr 10, 2024 | By Lee Atchison

In the fast-paced world of IT operations, the culture permeating an organization is critical to its success. It drives behavior, efficiency, and organizational accomplishment. A blame-centric culture is particularly detrimental, creating an environment where finger-pointing is more important than problem-solving and fear reduces innovation. This negative culture damages individual morale and erodes the organization's collective resilience.

Read Post

What are Blameless Retrospectives? How Do You Run Them?

Mar 29, 2024 | By Lee Atchison

In most engineering organizations, everyone agrees that in complex systems, failure is inevitable. It’s possible to prevent the recurrence of certain incidents, reduce their impact, or shorten the time to resolution. However, it’s impossible to avoid them altogether. In the past, we assumed that failures were the result of people’s mistakes. It was all about “the bad apple theory,” focused on finding the “guilty party” and removing them to prevent future failures.

Read Post

Incident Response Team | Roles & Responsibilities Defined

Mar 29, 2024 | By Lee Atchison

When your organization faces outages, errors, security breaches, and other incidents, you need a plan in place to take appropriate action. However, you also need a capable team of experts filling critical roles and responsibilities to execute those actions and collaborate effectively to resolve issues quickly. An incident response team should therefore be built to avoid gaps in skills and expertise.

Read Post

Incident Management Automation - What You Should Know

Mar 29, 2024 | By Lee Atchison

Automated incident management is the process of automating incident response to ensure that critical events are detected and addressed in the most efficient and consistent manner. In incident management, time is of the essence, and the primary benefit of automation is speed. With automation, you can complete time-consuming tasks much more quickly. This brings down incident response time and allows the team to focus on matters that require their expertise.

Read Post

The Role of the SRE in the Incident Management Process

Mar 14, 2024 | By Lee Atchison

In modern businesses of every type, IT systems play a major role, and the Site Reliability Engineer (SRE) has become central to managing the effectiveness and reliability of the entire business. SREs are the bridge between the rapid deployment of software and systems and the stable operation of those systems in a production environment. They ensure that reliability and performance criteria are defined and met.

Read Post

SRE or SWE? Making the Right Career Choice for You

Feb 26, 2024 | By Blameless

Your first years following graduation are critical to finding the most lucrative and fulfilling career path. Here, we explore SRE (Site Reliability Engineer) vs. SWE (Software Engineer) opportunities to help focus your career goals.

Read Post

NIST Incident Response Steps & Template | Blameless

Feb 21, 2024 | By Lee Atchison

The National Institute of Standards and Technology (NIST) provides a framework to help businesses mitigate cybersecurity risks. The framework also helps protect networks and data, outlining best practices to inform decisions that save time and money. Creating a cybersecurity strategy that helps you identify, protect against, detect, respond to, and recover from cybersecurity incidents is critical in the evolving threat landscape.

Read Post

How to Comply With the SEC's New Cybersecurity Rule

Feb 21, 2024 | By Lee Atchison

On July 26, 2023, the Securities and Exchange Commission (SEC) introduced new rules regarding cybersecurity risk management, strategy, governance, and incidents. Public companies subject to reporting requirements must comply with the changes to avoid rescission and other monetary penalties, not to mention the risk of legal action and reputation damage. Here, we look at the two new cybersecurity rules and how your company can comply.

Read Post

The Power of Building a Blameless Culture in IT Operations

Feb 15, 2024 | By Lee Atchison

In the world of high-scale, high-availability, high-performance web applications, mistakes in IT operations are inevitable. Systems fail, bugs slip through, and outages occur. How your team responds to these incidents significantly affects its overall productivity, morale, and effectiveness. Company culture, and a blameless culture in particular, is crucial to driving the behaviors that make your business a success.

Read Post

Getting Buy-in from Management on Reliability Investments

Feb 1, 2024 | By Emily Arnott

If you’re reading the Blameless blog, you probably have a good idea of how important reliability is to your customers’ happiness, your business’s bottom line, and your overall sanity. Unfortunately, this perspective is frequently downplayed by management. Even if they understand the importance of reliability, they often see it as something that should emerge automatically from having the right mindset, and not something that requires investment.

Read Post

Unleashing the Change Maker Within Webinar Preview

Apr 2, 2024 | By Blameless

Join us on April 16th at 10 a.m. PT for Unleashing the Change Maker Within, a 60-minute live webinar where we'll discuss the secrets to driving change in your organization. We'll tackle two of reliability's biggest issues: getting budget and garnering support. We'll show you how to empower yourself to drive organizational change. Discover the secrets to selling your boss on the tools you need to automate your workflow and streamline your processes. We'll equip you with the strategies and insights to turn your great ideas into actionable plans.

View Video

Giving Power Back To The Engineers: A Fireside Chat with MyFitnessPal

Mar 26, 2024 | By Blameless

The real secret to mastering engineering operations is putting engineers in the driver's seat. On March 26th at 10 a.m., Chris Karper, Sr. Director of Engineering at MyFitnessPal, joins Chief Reliability Officer Lee Atchison to discuss how MyFitnessPal is overcoming incidents by giving power back to the engineers. They'll explore how Chris has guided MyFitnessPal through its technological advancements, the growth of its team, and the maturing of its incident management program.

View Video

Next-Gen Incident Management: Blueprints for High-Powered Incident Response

Mar 8, 2024 | By Blameless

Join us for an exclusive webinar designed for IT Operations leaders, SREs, and DevOps and software engineering leaders, featuring Jim Gochee, CEO of Blameless, Ken Gavranovic, COO of Blameless, and Nick Mason, Principal Sales Engineer at Blameless. Uncover the technical scaffolding essential to propel your incident management strategy forward, faster. Dive deep into the core technical components vital for a robust incident response framework, and discover firsthand how Generative AI can save your team hours during critical incidents.

View Video

Manage Blameless Incidents in Microsoft Teams

Feb 29, 2024 | By Blameless

View Video

Approaches to Enterprise Reliability Management in Microsoft Teams

Feb 28, 2024 | By Blameless

Nick Mason highlights how Microsoft Teams users can put Blameless to work to make incident response less stressful and more efficient.

View Video

From New Relic to AWS: Secrets of creating a blameless culture

Feb 13, 2024 | By Blameless

We are excited to feature our COO Ken Gavranovic, with his rich experience at New Relic, Cox, and Web.com, our CEO Jim Gochee, who brings insights from his time at Apple and New Relic, and Lee Atchison, a seasoned expert from New Relic, Amazon, and AWS.

View Video

Fireside Series: The secret to being a successful change agent in IT Operations

Jan 30, 2024 | By Blameless

Are you tired of putting out the same fire day after day? You're not alone. Engineering leaders from every industry are working tirelessly to evolve their approach to incident management and IT Operations. Each installment of our Fireside Series is a conversation with one of your peers. We'll get under the hood of their team's strategy for building and operating some category-defining products. Then, we'll use their experiences to build and expand a roadmap for how you can lead your own company's operational evolution.

View Video

From Amazon to Apple: Key Strategies for Operational Excellence in Tech

Jan 17, 2024 | By Blameless

This session is guided by three speakers: Jim Gochee, CEO of Blameless, with a history at New Relic and Apple; Ken Gavranovic, COO of Blameless and an Amazon best-selling author, with experience at Cox, Web.com, and Unqork; and Lee Atchison, Chief Reliability Officer at Blameless, noted for his work on AWS Elastic Beanstalk and as the author of "Architecting for Scale," with roles at AWS, HP, and New Relic.

View Video

Navigating the New SEC Data Breach Rule: A Blameless Blueprint for Compliance

Nov 29, 2023 | By Blameless

The new SEC rule on material security breaches goes into effect on December 18, 2023, for larger publicly traded companies, and for all other public companies within 180 days. If you're not already in compliance, it’s important to prepare for the new rule now by developing a plan for incident response and disclosure.

View Video

Elevating Incident Management: Leveraging automation and AI to put reliability on autopilot

Oct 26, 2023 | By Blameless

If your company operates in a modern digital environment, then there’s a good chance questionable reliability is hurting you competitively. On the other hand, every hour your engineering team spends on operations comes at the expense of developing your product. So, what are you supposed to do?

View Video


The Blameless Complete Guide to Incident Management

Oct 10, 2022

Incidents are inevitable. As your service expands and becomes more complex, you are more likely to encounter outages, slowdowns, errors, and other disruptions to healthy operation. At the same time, as your service becomes more popular and relied on by users, the cost of incidents becomes higher. Studies have shown that the cost of downtime is high, and growing fast in the digital-first world. Since you can never fully prevent incidents, it's important to resolve them as efficiently as possible.

Get EBook

Reliability. It's Critical; Now How Do You Prioritize?

Apr 28, 2022

The eBook dives into the 4 main investment areas required to make an impact. You'll learn how to achieve measurable results for the benefit of your customers and your valuable engineers.

Get EBook

Beyond the 4 SRE Golden Signals

Mar 24, 2022

SRE's Golden Signals are four key metrics used to monitor the health of your service and underlying systems. We will explain what they are, and how they can help you improve service performance.

Get EBook

A Practical Guide to Implementing SRE

Nov 7, 2021

This eBook gives you practical steps to implementing SRE practices in an organization that's already invested in DevOps. It outlines the clear benefits and lays out how they can be achieved. Three main topics are covered: Incident Management, Service Level Objectives (SLOs), and SRE Culture. Leveling up these critical aspects of SRE will reap both immediate and long-term benefits.

Get EBook


Blameless offers the only complete reliability engineering platform that brings together AI-driven incident resolution, blameless postmortems, SLOs/Error Budgets, and reliability insights reports and dashboards, enabling businesses to optimize reliability and innovation.

Enabling modern software businesses to adopt SRE best practices:

Incident Resolution: Use AI to engage the right people and teams in the right way to stop problems fast, ensure customer satisfaction and prevent incidents from happening again.
Blameless Postmortems: Learn without pointing fingers, ensuring continuous improvements. We automatically bring relevant information, proper context and industry best practices to your postmortem process.
SLOs/Error Budgets: Create SLOs and see your remaining error budgets with the SLO dashboard. Teams gain insight into what parts of the business are consuming the error budget, allowing them to make informed decisions between releasing new features and reliability.
Reliability Insights: Blameless will allow your business to consume event data across your entire DevOps stack, query the data, and create custom dashboards, meaning teams can quickly find signals amongst their DevOps data noise.
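
To make the SLOs/Error Budgets item above concrete, here is a minimal sketch of the arithmetic behind an error budget for a request-based availability SLO. It is an illustration only and does not reflect how the Blameless platform computes or stores budgets; the names `ErrorBudget` and `error_budget`, and the choice of a request-count SLO, are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    """Hypothetical result type for this sketch."""
    allowed_failures: float    # failures the SLO tolerates in the window
    consumed_failures: int     # failures actually observed
    remaining_fraction: float  # 1.0 = untouched, 0.0 = spent, < 0 = overspent


def error_budget(slo_target: float, total_requests: int,
                 failed_requests: int) -> ErrorBudget:
    """Compute the error budget for a request-based availability SLO.

    slo_target is the fraction of requests that must succeed,
    for example 0.999 for a "three nines" objective.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")

    # The budget is the share of requests the SLO allows to fail.
    allowed = (1.0 - slo_target) * total_requests
    # Fraction of that budget still unspent; negative means the SLO is breached.
    remaining = 1.0 - failed_requests / allowed
    return ErrorBudget(allowed, failed_requests, remaining)


if __name__ == "__main__":
    budget = error_budget(slo_target=0.999, total_requests=1_000_000,
                          failed_requests=250)
    print(f"allowed failures:  {budget.allowed_failures:.0f}")
    print(f"remaining budget:  {budget.remaining_fraction:.1%}")
```

With a 99.9% target over one million requests, about 1,000 failures are tolerated, so 250 observed failures leave roughly three quarters of the budget. A team in that position has room to keep shipping features; a negative remaining fraction would argue for prioritizing reliability work instead.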

The Complete Site Reliability Engineering (SRE) Platform.
