Blameless

San Mateo, CA, USA
2017
  |  By Blameless
The age of information technology has rapidly expanded to include a wide range of necessary roles to manage and optimize operational frameworks. Site Reliability Engineers (SREs), Development Operations (DevOps), and Platform Engineers have become invaluable within this digital landscape. Here, you’ll learn more about each role, how they differ, and what they bring to the table.
  |  By Blameless
Incidents are unavoidable in software development and IT. As a Site Reliability Engineer (SRE), one of the tools you’ll use frequently is an incident timeline. The incident timeline provides a real-time report on any incident, including alerts, system updates, issue severity changes, manual chat entries, and more.
  |  By Lee Atchison
In the fast-paced world of technology, innovation is not just a buzzword—it's a necessity. As organizations strive to stay ahead of the curve and deliver cutting-edge solutions, they must foster a culture that empowers engineers to drive change and lead transformative projects. Throughout my career, I have witnessed firsthand the impact that empowered engineers can have on an organization, and I believe that unlocking their potential is key to achieving long-term success.
  |  By Lee Atchison
In the fast-paced world of IT operations, the culture permeating an organization is critical to its success. It drives behavior, efficiency, and organizational accomplishment. A blame-centric culture is particularly detrimental, creating an environment where finger-pointing is more important than problem-solving and fear reduces innovation. This negative culture damages individual morale and erodes the organization's collective resilience.
  |  By Lee Atchison
In most engineering organizations, everyone agrees that in complex systems, failure is inevitable. It’s possible to prevent the recurrence of certain incidents, reduce their impact, or shorten the time to resolution. However, it’s impossible to avoid them altogether. In the past, we asserted failures are a result of people’s mistakes. It was all about “the bad apple theory,” focused on finding the “guilty party” and removing them to prevent future failures.
  |  By Lee Atchison
When your organization faces outages, errors, security breaches, and other incidents, you need to have a plan in place to take appropriate actions as needed. However, you also need a capable team of experts filling critical roles and responsibilities to execute those actions and effectively collaborate to resolve issues quickly. An incident response team, therefore should be developed in a way that avoids skills gaps in expertise.
  |  By Lee Atchison
Automated incident management is the process of automating incident response to ensure that critical events are detected and addressed in the most efficient and consistent manner. In incident management, time is of the essence and the primary benefit of automated incident management is speed. With automation, you can accomplish time-consuming tasks much quicker. This brings down the incident response time and allows the team to focus their attention on matters that require their expertise.
  |  By Lee Atchison
In the world of modern businesses, where IT systems play a major role in all types of businesses, the role of the Site Reliability Engineer (SRE) has become central to managing the effectiveness and reliability of the entire business. SREs are the bridge between the rapid deployment of software and systems and the stable operation of those systems in a production environment. They ensure that reliability and performance criteria are defined and are met.
  |  By Blameless
Your first years following graduation are critical to finding the most lucrative and fulfilling career path. Here, we explore SRE (Site Reliability Engineer) vs SWE (Software Engineering) opportunities to help focus your career goals.
  |  By Lee Atchison
The National Institute of Standards and Technology (NIST) provides the framework to help businesses mitigate cybersecurity risks. The framework also protects networks and data, outlining best practices to inform decisions that save time and money. Creating a cybersecurity strategy that identifies, protects, detects, responds, and helps you recover from cybersecurity incidents is critical in the evolving threat landscape.
  |  By Blameless
Hello, Innovators! If you've ever believed in the potential for change within your organization but weren’t sure how to advocate for it, this webinar is designed with you in mind. "Unleashing the Change Maker Within: Secrets to Driving Change in Your Organization” is not just another webinar; it's a beacon for engineers, SREs, and tech enthusiasts eager to make a tangible difference in their companies.
  |  By Blameless
Join us on April 16th at 10 a.m. PT for a 60-minute live webinar, where we'll discuss the secrets to driving change in your organization. We'll tackle two of reliability's biggest issues: getting budget and garnering support. Join us for Unleashing the Change Maker Within at 10 a.m. PST. We'll show you how to empower yourself to drive organizational change. Discover the secrets to selling your boss on the tools you need to automate your workflow and streamline your processes. We'll equip you with the strategies and insights to turn your great ideas into actionable plans.
  |  By Blameless
The real secret to mastering engineering operations is putting engineers in the driver's seat. On March 26th at 10 am, Chris Karper, Sr. Director of Engineering at MyFitnessPal, joins Chief Reliability Officer, Lee Atchison to discuss how MyFitnessPal is overcoming incidents by giving power back to the engineers. They'll explore how Chris has navigated MyFitnessPal through its technological advancements, growth of the team, and the maturity of its incident management program.
  |  By Blameless
Join us for an exclusive webinar designed for IT Operations leaders, SREs, DevOps & software engineering leaders, featuring Jim Gochee, CEO of Blameless, Ken Gavranovic, COO of Blameless, and Nick Mason, Principal Sales Engineer at Blameless. Uncover the technical scaffolding essential to propel your incident management strategy forward, faster. Dive deep into the core technical components vital for a robust incident response framework, and discover firsthand how Generative AI can dramatically save hours for your team during critical incidents.
  |  By Blameless
Nick Mason highlights how Microsoft Teams user can put Blameless to work to make incident response less stressful and more efficient.
  |  By Blameless
We are excited to feature our COO Ken Gavranovic, with his rich experience at New Relic, Cox, and Web.com, our CEO Jim Gochee, who brings insights from his time at Apple and New Relic, and Lee Atchison, a seasoned expert from New Relic, Amazon, and AWS.
  |  By Blameless
Are you tired of putting out the same fire day after day? You're not alone. Engineering leaders from every industry are working tirelessly to evolve their approach to incident management and IT Operations. Each installment of our Fireside Series is a conversation with one of your peers. We'll get under the hood of their team's strategy for building and operating some category-defining products. Then, we'll use their experiences to build and expand a roadmap for how you can lead your own company's operational evolution.
  |  By Blameless
Jim Gochee, CEO of Blameless with a history at New Relic and Apple, Ken Gavranovic, COO of Blameless and an Amazon Best Selling Author with experiences at Cox, Web.Com, and Unqork, and Lee Atchison, Chief Reliability Officer at Blameless, noted for his work on Amazon BeanStalk and as the author of "Architecting for Scale," with roles at AWS, HP, and New Relic, will guide this session.
  |  By Blameless
The new SEC rule on material security breaches goes into effect on December 18, 2023 for larger publicly traded companies and all other public companies within 180 days. If you're not already in compliance, it’s important for you to prepare for the new rule now by developing a plan for incident response and disclosure.
  |  By
Incidents are inevitable. As your service expands and becomes more complex, you are more likely to encounter outages, slowdowns, errors, and other disruptions to healthy operation. At the same time, as your service becomes more popular and relied on by users, the cost of incidents becomes higher. Studies have shown that the cost of downtime is high, and growing fast in the digital-first world. Since you can never fully prevent incidents, it's important to resolve them as efficiently as possible.
  |  By
The eBook dives into the 4 main investment areas required to make an impact. You'll learn how to achieve measurable results for the benefit of your customers and your valuable engineers.
  |  By
SRE's Golden Signals are four key metrics used to monitor the health of your service and underlying systems. We will explain what they are, and how they can help you improve service performance.
  |  By
This eBook gives you practical steps to implementing SRE practices in an organization that's already invested in DevOps. It outlines the clear benefits and lays out how they can be achieved. Three main topics are covered: Incident Management, Service Level Objectives (SLOs), and SRE Culture. Leveling up these critical aspects of SRE will reap both immediate and long-term benefits.

Blameless offers the only complete reliability engineering platform that brings together AI-driven incident resolution, blameless postmortems, SLOs/Error Budgets, and reliability insights reports and dashboards, enabling businesses to optimize reliability and innovation.

Enabling modern software businesses to adopt SRE best practices:

  • Incident Resolution: Use AI to engage the right people and teams in the right way to stop problems fast, ensure customer satisfaction and prevent incidents from happening again.
  • Blameless Postmortems: Learn without pointing fingers, ensuring continuous improvements. We automatically bring relevant information, proper context and industry best practices to your postmortem process.
  • SLOs/Error Budgets: Create SLOs and see your remaining error budgets with the SLO dashboard. Teams gain insight into what parts of the business are consuming the error budget, allowing them to make informed decisions between releasing new features and reliability.
  • Reliability Insights: Blameless will allow your business to consume event data across your entire DevOps stack, query the data, and create custom dashboards, meaning teams can quickly find signals amongst their DevOps data noise.

The Complete Site Reliability Engineering (SRE) Platform.