Blameless

San Mateo, CA, USA
2017
By Lee Atchison
In the fast-paced world of IT operations, the culture that permeates an organization is critical to its success. It drives behavior, efficiency, and organizational accomplishment. A blame-centric culture is particularly detrimental: it creates an environment where finger-pointing matters more than problem-solving and fear stifles innovation. This negative culture damages individual morale and erodes the organization's collective resilience.
By Lee Atchison
In most engineering organizations, everyone agrees that failure is inevitable in complex systems. It’s possible to prevent certain incidents from recurring, reduce their impact, or shorten the time to resolution, but it’s impossible to avoid them altogether. In the past, we assumed failures were the result of people’s mistakes. It was all about “the bad apple theory”: find the “guilty party” and remove them to prevent future failures.
By Lee Atchison
When your organization faces outages, errors, security breaches, and other incidents, you need a plan in place to take appropriate action. You also need a capable team of experts filling critical roles and responsibilities who can execute that plan and collaborate effectively to resolve issues quickly. An incident response team should therefore be built so that it has no gaps in expertise.
By Lee Atchison
Automated incident management is the process of automating incident response so that critical events are detected and addressed as efficiently and consistently as possible. In incident management, time is of the essence, and the primary benefit of automation is speed: time-consuming tasks get done much more quickly. This reduces incident response time and lets the team focus on matters that require their expertise.
By Lee Atchison
In modern businesses, where IT systems play a major role in every industry, the Site Reliability Engineer (SRE) has become central to managing the effectiveness and reliability of the entire business. SREs bridge the rapid deployment of software and systems and the stable operation of those systems in production. They ensure that reliability and performance criteria are defined and met.
By Blameless
Your first years after graduation are critical to finding the most lucrative and fulfilling career path. Here, we compare SRE (Site Reliability Engineer) and SWE (Software Engineer) opportunities to help you focus your career goals.
By Lee Atchison
The National Institute of Standards and Technology (NIST) provides a framework to help businesses mitigate cybersecurity risks. The framework also helps protect networks and data by outlining best practices that inform decisions and save time and money. In an evolving threat landscape, it is critical to create a cybersecurity strategy that helps you identify, protect against, detect, respond to, and recover from cybersecurity incidents.
By Lee Atchison
On July 26, 2023, the Securities and Exchange Commission (SEC) introduced new rules on cybersecurity risk management, strategy, governance, and incident disclosure. Public companies subject to reporting requirements must comply with the changes to avoid rescission and monetary penalties, not to mention the risk of legal action and reputational damage. Here, we look at the two new cybersecurity rules and how your company can comply.
By Lee Atchison
In the world of high-scale, high-availability, high-performance web applications, mistakes in IT operations are inevitable. Systems fail, bugs slip through, and outages occur. How your team responds to these incidents significantly affects its productivity, morale, and effectiveness. Company culture, and a blameless culture in particular, is crucial to driving the behaviors that make your business a success.
By Emily Arnott
If you’re reading the Blameless blog, you probably have a good idea of how important reliability is to your customers’ happiness, your business’s bottom line, and your overall sanity. Unfortunately, this perspective is frequently downplayed by management. Even if they understand the importance of reliability, they often see it as something that should emerge automatically from having the right mindset, and not something that requires investment.
By Blameless
Join us on April 16th at 10 a.m. PT for Unleashing the Change Maker Within, a 60-minute live webinar where we'll discuss the secrets to driving change in your organization. We'll tackle two of reliability's biggest issues: getting budget and garnering support. We'll show you how to empower yourself to drive organizational change and how to sell your boss on the tools you need to automate your workflow and streamline your processes. We'll equip you with the strategies and insights to turn your great ideas into actionable plans.
By Blameless
The real secret to mastering engineering operations is putting engineers in the driver's seat. On March 26th at 10 a.m., Chris Karper, Sr. Director of Engineering at MyFitnessPal, joins Chief Reliability Officer Lee Atchison to discuss how MyFitnessPal is overcoming incidents by giving power back to its engineers. They'll explore how Chris has guided MyFitnessPal through its technological advancements, the growth of its team, and the maturing of its incident management program.
By Blameless
Join us for an exclusive webinar designed for IT Operations leaders, SREs, and DevOps and software engineering leaders, featuring Jim Gochee, CEO of Blameless, Ken Gavranovic, COO of Blameless, and Nick Mason, Principal Sales Engineer at Blameless. Uncover the technical scaffolding you need to move your incident management strategy forward, faster. Dive deep into the core technical components of a robust incident response framework, and see firsthand how Generative AI can save your team hours during critical incidents.
By Blameless
Nick Mason shows how Microsoft Teams users can put Blameless to work to make incident response less stressful and more efficient.
By Blameless
We are excited to feature our COO Ken Gavranovic, with his rich experience at New Relic, Cox, and Web.com; our CEO Jim Gochee, who brings insights from his time at Apple and New Relic; and Lee Atchison, a seasoned expert from New Relic, Amazon, and AWS.
By Blameless
Are you tired of putting out the same fire day after day? You're not alone. Engineering leaders from every industry are working tirelessly to evolve their approach to incident management and IT Operations. Each installment of our Fireside Series is a conversation with one of your peers. We'll get under the hood of their team's strategy for building and operating some category-defining products. Then, we'll use their experiences to build and expand a roadmap for how you can lead your own company's operational evolution.
By Blameless
This session will be guided by Jim Gochee, CEO of Blameless, with a history at New Relic and Apple; Ken Gavranovic, COO of Blameless and an Amazon best-selling author, with experience at Cox, Web.com, and Unqork; and Lee Atchison, Chief Reliability Officer at Blameless, noted for his work on AWS Elastic Beanstalk and as the author of "Architecting for Scale," with roles at AWS, HP, and New Relic.
By Blameless
The new SEC rule on material security breaches goes into effect on December 18, 2023 for larger publicly traded companies, and 180 days later for all other public companies. If you're not already in compliance, it’s important to prepare for the new rule now by developing a plan for incident response and disclosure.
By Blameless
If your company operates in a modern digital environment, then there’s a good chance questionable reliability is hurting you competitively. On the other hand, every hour your engineering team spends on operations comes at the expense of developing your product. So, what are you supposed to do?
Incidents are inevitable. As your service expands and becomes more complex, you are more likely to encounter outages, slowdowns, errors, and other disruptions to healthy operation. At the same time, as your service becomes more popular and relied on by users, the cost of incidents becomes higher. Studies have shown that the cost of downtime is high, and growing fast in the digital-first world. Since you can never fully prevent incidents, it's important to resolve them as efficiently as possible.
This eBook dives into the four main investment areas required to make an impact. You'll learn how to achieve measurable results for the benefit of your customers and your valuable engineers.
SRE's Golden Signals are four key metrics used to monitor the health of your service and underlying systems. We will explain what they are, and how they can help you improve service performance.
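As a rough illustration, the four golden signals described in Google's SRE book (latency, traffic, errors, and saturation) can be computed from one window of request data as below. This is a minimal sketch under stated assumptions, not how any particular monitoring tool or Blameless computes them: the `Request` record, using p99 for latency, counting only 5xx responses as errors, and the `in_use`/`capacity` saturation measure are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """One served request (hypothetical record shape)."""
    duration_ms: float
    status: int  # HTTP status code


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def golden_signals(requests: list[Request], window_s: float,
                   in_use: float, capacity: float) -> dict[str, float]:
    """Summarize one observation window as the four golden signals.

    `in_use` / `capacity` stand in for the service's most constrained
    resource (CPU, memory, queue slots); picking it is an assumption.
    """
    saturation = in_use / capacity
    if not requests:
        return {"latency_p99_ms": 0.0, "traffic_rps": 0.0,
                "error_rate": 0.0, "saturation": saturation}
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        # Latency: 99th-percentile request duration in the window.
        "latency_p99_ms": percentile([r.duration_ms for r in requests], 99),
        # Traffic: demand placed on the service, in requests per second.
        "traffic_rps": len(requests) / window_s,
        # Errors: share of requests that failed (only 5xx counted here).
        "error_rate": errors / len(requests),
        # Saturation: how "full" the constrained resource is (1.0 = at capacity).
        "saturation": saturation,
    }


if __name__ == "__main__":
    # Hypothetical two-second window with one failed request.
    window = [Request(120.0, 200), Request(80.0, 200),
              Request(950.0, 503), Request(100.0, 200)]
    print(golden_signals(window, window_s=2.0, in_use=6, capacity=8))
```

A fuller version would, as the SRE book recommends, track the latency of failed requests separately from successful ones, since fast errors can make overall latency look healthier than it is.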
This eBook gives you practical steps to implementing SRE practices in an organization that's already invested in DevOps. It outlines the clear benefits and lays out how they can be achieved. Three main topics are covered: Incident Management, Service Level Objectives (SLOs), and SRE Culture. Leveling up these critical aspects of SRE will reap both immediate and long-term benefits.

Blameless offers the only complete reliability engineering platform that brings together AI-driven incident resolution, blameless postmortems, SLOs/Error Budgets, and reliability insights reports and dashboards, enabling businesses to optimize reliability and innovation.

Enabling modern software businesses to adopt SRE best practices:

  • Incident Resolution: Use AI to engage the right people and teams in the right way to stop problems fast, ensure customer satisfaction, and prevent incidents from happening again.
  • Blameless Postmortems: Learn without pointing fingers, ensuring continuous improvement. We automatically bring relevant information, proper context, and industry best practices to your postmortem process.
  • SLOs/Error Budgets: Create SLOs and see your remaining error budgets with the SLO dashboard. Teams gain insight into which parts of the business are consuming the error budget, so they can make informed trade-offs between releasing new features and investing in reliability.
  • Reliability Insights: Blameless lets your business consume event data across your entire DevOps stack, query the data, and create custom dashboards, so teams can quickly find the signal amid the noise in their DevOps data.
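The error-budget idea behind SLOs comes down to simple arithmetic, sketched below. This is a minimal illustration assuming a request-based SLO (the fraction of valid requests that succeed over a window); the function names and the per-source breakdown are hypothetical and do not describe Blameless's API or how its SLO dashboard computes its figures.

```python
def error_budget(slo_target: float, total_events: int) -> float:
    """Number of bad events the SLO tolerates over the window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction such as 0.999")
    return (1.0 - slo_target) * total_events


def budget_remaining(slo_target: float, total_events: int,
                     bad_events: int) -> float:
    """Fraction of the budget left: 1.0 untouched, 0.0 spent, < 0 breached."""
    budget = error_budget(slo_target, total_events)
    if budget == 0:
        # No traffic in the window yet, so nothing has been consumed.
        return 1.0
    return 1.0 - bad_events / budget


def budget_consumption(slo_target: float, total_events: int,
                       bad_by_source: dict[str, int]) -> dict[str, float]:
    """Share of the whole budget consumed by each source (service, release, team)."""
    budget = error_budget(slo_target, total_events)
    if budget == 0:
        return {source: 0.0 for source in bad_by_source}
    return {source: bad / budget for source, bad in bad_by_source.items()}


if __name__ == "__main__":
    # Hypothetical numbers: 1,000,000 requests against a 99.9% SLO.
    bad = {"checkout": 300, "search": 100}
    print(round(error_budget(0.999, 1_000_000)))
    print(round(budget_remaining(0.999, 1_000_000, sum(bad.values())), 3))
    print(budget_consumption(0.999, 1_000_000, bad))
```

With these made-up numbers, a 99.9% target over 1,000,000 requests allows about 1,000 failures; 400 failures leave roughly 60% of the budget, and checkout alone has consumed about 30% of it, which is the kind of breakdown that informs the feature-versus-reliability trade-off.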

The Complete Site Reliability Engineering (SRE) Platform.