Blameless

San Mateo, CA, USA
2017
  |  By Lee Atchison
In the world of high-scale, high-availability, high-performance web applications, mistakes in IT operations are inevitable. Systems fail, bugs slip through, and outages occur. Your team's approach to responding to these incidents significantly affects its overall productivity, morale, and effectiveness. Company culture, and a blameless culture in particular, is crucial to driving the behaviors that make your business a success.
  |  By Emily Arnott
If you’re reading the Blameless blog, you probably have a good idea of how important reliability is to your customers’ happiness, your business’s bottom line, and your overall sanity. Unfortunately, this perspective is frequently downplayed by management. Even if they understand the importance of reliability, they often see it as something that should emerge automatically from having the right mindset, and not something that requires investment.
  |  By Emily Arnott
When you’re in the thick of an incident, communication is both essential and challenging. A wide variety of stakeholders will need timely updates on the situation in order to respond effectively. At the same time, breaking away from the actual diagnosis and resolution work to send these updates can massively slow progress.
  |  By Emily Arnott
Having emerged into the mainstream only in the 2010s, SRE is a relatively new discipline in tech. It’s been rapidly adopted by a widening variety of organizations, which implement constantly evolving practices. For the last six years, Catchpoint has been running a survey to take the temperature of the latest developments and trends. Check out the full report here, and read on to see our analysis of five key takeaways.
  |  By Emily Arnott
Let’s call this the mother of all understatements. If you’re reading this blog, there’s a good chance that you: a.) Agree wholeheartedly with this sentiment and think it should go without saying, AND… b.) Are surrounded by folks who pay lip service to this idea while not taking it as seriously as they should.
  |  By Blameless Community
SRE’s Golden Signals are four key metrics used to monitor the health of your service and underlying systems. We will explain what they are, and how they can help you improve service performance.
  |  By Emily Arnott
No company plans for a security breach, major outage, or other cyber incident, but they happen. When an incident occurs, having a standardized, regulated method of managing the fallout is critical. This is where the incident response life cycle comes in.
  |  By Emily Arnott
If you work in eCommerce, you can see the storm on the horizon. Black Friday, the biggest shopping day of the year both online and off, is only a few days away. Your services are going to hit usage spikes you may never have seen before. Every aspect of your services will be pushed to its limit: people won’t just be searching, or just buying, or just signing up for programs; they’ll be doing all of these at once. Most crucially, everyone else is offering deals too.
  |  By Emily Arnott
When you think about making your service reliable, what standards and benchmarks are most important? The availability of services? Consistently fast responses? Accurate data? Prioritizing critical and common use cases? These are all important and deserve some focus, but today we’ll put the spotlight on an often overlooked pillar: security. Cybersecurity incidents can be the most devastating types of incident for your organization.
  |  By Emily Arnott
The Securities and Exchange Commission published new rules for SEC registrants around disclosing incident details and response policies. Compliance with these new rules should be top of mind for any company. Even if your org hasn’t hit the milestone of registering with the SEC, you should be prepared to be compliant when you take that step.
  |  By Blameless
We are excited to feature our COO Ken Gavranovic, with his rich experience at New Relic, Cox, and Web.com, our CEO Jim Gochee, who brings insights from his time at Apple and New Relic, and Lee Atchison, a seasoned expert from New Relic, Amazon, and AWS.
  |  By Blameless
Are you tired of putting out the same fire day after day? You're not alone. Engineering leaders from every industry are working tirelessly to evolve their approach to incident management and IT Operations. Each installment of our Fireside Series is a conversation with one of your peers. We'll get under the hood of their team's strategy for building and operating some category-defining products. Then, we'll use their experiences to build and expand a roadmap for how you can lead your own company's operational evolution.
  |  By Blameless
This session will be guided by Jim Gochee, CEO of Blameless, with a history at New Relic and Apple; Ken Gavranovic, COO of Blameless and an Amazon best-selling author with experience at Cox, Web.com, and Unqork; and Lee Atchison, Chief Reliability Officer at Blameless, noted for his work on AWS Elastic Beanstalk and as the author of "Architecting for Scale," with roles at AWS, HP, and New Relic.
  |  By Blameless
The new SEC rule on material security breaches goes into effect on December 18, 2023 for larger publicly traded companies, and for all other public companies within 180 days. If you're not already in compliance, it’s important to prepare for the new rule now by developing a plan for incident response and disclosure.
  |  By Blameless
If your company operates in a modern digital environment, then there’s a good chance questionable reliability is hurting you competitively. On the other hand, every hour your engineering team spends on operations comes at the expense of developing your product. So, what are you supposed to do?
  |  By Blameless
Revolutionizing Business: The Rise of Generative AI - Actionable Strategies to Integrate Advanced AI Seamlessly into Your Engineering Operations.
  |  By Blameless
The Blameless retrospective is one of the most often discussed and most rarely executed components of the SRE practice. Getting real value from the retrospective process takes time, focus, and the right approach. In this webinar, Ken Gavranovic and Lee Atchison, author of Architecting for Scale, discuss the blueprint high-performing engineering teams use to maximize the value of retrospectives.
  |  By Blameless
Assemble the right team for incident management fast with the new bidirectional integration of Blameless and OpsGenie. In this 30-minute live webinar, Blameless's Aaron Lober, Paul Chu, and Nicolas Philip show you how to seamlessly connect your alerting and service registry to your incident response processes. Webinar includes a live demo.
  |  By
Incidents are inevitable. As your service expands and becomes more complex, you are more likely to encounter outages, slowdowns, errors, and other disruptions to healthy operation. At the same time, as your service becomes more popular and relied on by users, the cost of incidents becomes higher. Studies have shown that the cost of downtime is high, and growing fast in the digital-first world. Since you can never fully prevent incidents, it's important to resolve them as efficiently as possible.
  |  By
The eBook dives into the 4 main investment areas required to make an impact. You'll learn how to achieve measurable results for the benefit of your customers and your valuable engineers.
  |  By
SRE's Golden Signals are four key metrics used to monitor the health of your service and underlying systems. We will explain what they are, and how they can help you improve service performance.
  |  By
This eBook gives you practical steps to implementing SRE practices in an organization that's already invested in DevOps. It outlines the clear benefits and lays out how they can be achieved. Three main topics are covered: Incident Management, Service Level Objectives (SLOs), and SRE Culture. Leveling up these critical aspects of SRE will reap both immediate and long-term benefits.

Blameless offers the only complete reliability engineering platform that brings together AI-driven incident resolution, blameless postmortems, SLOs/Error Budgets, and reliability insights reports and dashboards, enabling businesses to optimize reliability and innovation.

Enabling modern software businesses to adopt SRE best practices:

  • Incident Resolution: Use AI to engage the right people and teams in the right way to stop problems fast, ensure customer satisfaction and prevent incidents from happening again.
  • Blameless Postmortems: Learn without pointing fingers, ensuring continuous improvements. We automatically bring relevant information, proper context and industry best practices to your postmortem process.
  • SLOs/Error Budgets: Create SLOs and see your remaining error budgets with the SLO dashboard. Teams gain insight into what parts of the business are consuming the error budget, allowing them to make informed decisions between releasing new features and reliability.
  • Reliability Insights: Blameless will allow your business to consume event data across your entire DevOps stack, query the data, and create custom dashboards, meaning teams can quickly find signals amongst their DevOps data noise.
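To make the SLOs/Error Budgets item above concrete, here is a minimal sketch of the arithmetic behind a request-based error budget. It is an illustration only, not Blameless's implementation: the function name `error_budget`, its parameters, and the request counts are all hypothetical, and it assumes the SLO is expressed as a target fraction of successful requests over some window.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarize a request-based error budget for one SLO window.

    Hypothetical helper for illustration; not part of any Blameless API.
    slo_target is the fraction of requests that must succeed (e.g. 0.999).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests < 0 or failed_requests < 0:
        raise ValueError("request counts must be non-negative")

    # The budget is the number of failures the SLO tolerates in this window.
    allowed_failures = (1.0 - slo_target) * total_requests
    remaining_failures = allowed_failures - failed_requests

    # Fraction of the budget still unspent; negative means the SLO is breached.
    if allowed_failures > 0:
        remaining_fraction = remaining_failures / allowed_failures
    else:
        remaining_fraction = 0.0

    return {
        "allowed_failures": allowed_failures,
        "remaining_failures": remaining_failures,
        "budget_remaining_fraction": remaining_fraction,
    }


if __name__ == "__main__":
    # Assumed numbers: a 99.9% SLO over 1,000,000 requests with 250 failures.
    summary = error_budget(0.999, 1_000_000, 250)
    print(f"{summary['budget_remaining_fraction']:.0%} of the error budget remains")
```

With these assumed numbers, the SLO tolerates about 1,000 failed requests, so 250 failures leave roughly three quarters of the budget. A team could weigh that remaining fraction when deciding between shipping new features and spending time on reliability work.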

The Complete Site Reliability Engineering (SRE) Platform.