San Mateo, CA, USA
Aug 3, 2021   |  By Emily Arnott
On the heels of our Microsoft Teams integration release to streamline incident management, we’re excited to share that we now support Microsoft Teams Video capabilities. We generate Microsoft Teams video conference links for each Blameless incident for fast and easy collaboration. Microsoft Teams Video joins Zoom, Google Meet, and GoToMeeting in our video integration suite.
Jul 29, 2021   |  By Blameless
SRE’s Golden Signals are four key metrics used to monitor the health of your service and underlying systems. We will explain what they are, and how they can help you improve service performance.
Jul 27, 2021   |  By Lyon Wong
When Blameless started in 2018, the team set out on a mission to help all engineers achieve reliability with less toil and risk. Three years in, that mission has become more important than ever. What has changed is the rate of SRE adoption: SRE is now the fastest-growing team and practice inside engineering. This represents a clear recognition of the many upsides that an SRE practice brings with its combination of continuous learning, velocity, and resilience.
Jul 21, 2021   |  By Blameless
Wondering what the difference is between observability and monitoring? In this post, we explain how they are related, why they are important, and some suggested tools that can help. The difference between observability and monitoring is that observability is the ability to understand a system’s state from its outputs, often referred to as understanding the “unknown unknowns”.
Jul 13, 2021   |  By Noor-ul-Anam Ruqayya
Do blameless retrospectives (or postmortems) help your team? We will explain what they are, whether they really work, and how to do them right. A blameless postmortem (or retrospective) is a post-incident document that helps teams figure out why an incident happened and brainstorm how to improve the process to prevent similar incidents from happening again. In most engineering organizations, everyone agrees that in complex systems, failure is inevitable.
Jul 8, 2021   |  By Blameless Community
Error Budgets That Work for You. Plus Support for New Relic Metrics and NR Query Language. Did you know that error budget policy is the key to making SLOs actionable? In fact, Twitter’s engineering team did not successfully adopt SLOs until they introduced error budgets. SLOs enable teams to quantify customer happiness, and error budgets enable teams to make data-backed tradeoffs between reliability and feature velocity. We believe that teams optimizing for reliability must adopt both.
Jul 1, 2021   |  By Christina Tan
We’ve always advocated that every company can benefit from a blameless culture. Fostering a blameless culture can profoundly boost your organization in powerful ways, from employee retention to developer velocity and innovation. However, there’s an elephant in the room when we talk about blamelessness with executives: accountability. When things go wrong, people still need to get fired, right?
Jun 23, 2021   |  By Blameless Community
Resilience in Action is a podcast about all things resilience, from SRE to software engineering, to how it affects our personal lives, and more. Resilience in Action is hosted by Kurt Andersen. Kurt is a practitioner and an active thought leader in the SRE community. He speaks at major DevOps & SRE conferences and publishes his work through O'Reilly in quintessential SRE books such as Seeking SRE, What is SRE?, and 97 Things Every SRE Should Know.
Jun 22, 2021   |  By Blameless Community
Hoping you're headed towards a fun summer season and some time without masks. Let's avoid a new kind of tan-line! This newsletter shares useful industry content and an exciting Blameless product announcement. Find our fave tweets and events in the SRE and resilience engineering community. We're hiring! Check out the job openings here.
Jun 11, 2021   |  By Noor-ul-Anam Ruqayya
Wondering what Service Level Objectives (SLOs) are? In this article, we will explain service level objectives and how they relate to SLAs, SLIs, and error budgets. A Service Level Objective (SLO) is a reliability target that is measured by a Service Level Indicator (SLI) and sometimes serves as a safeguard for a Service Level Agreement (SLA). SLOs represent customer happiness and guide the development team’s velocity.
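As a rough illustration of how an SLO, an SLI, and an error budget relate, here is a minimal Python sketch. It assumes a simple event-based SLI (good events divided by total events); the function name, field names, and all figures are hypothetical and are not taken from Blameless or any of the posts listed here.

```python
def error_budget_report(slo_target: float, total_events: int, bad_events: int) -> dict:
    """Summarize SLI and error budget for an event-based SLO.

    slo_target   -- target fraction of good events, e.g. 0.999 (hypothetical)
    total_events -- all events observed in the SLO window
    bad_events   -- events that failed the SLI criterion
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_events <= 0 or not 0 <= bad_events <= total_events:
        raise ValueError("need total_events > 0 and 0 <= bad_events <= total_events")

    # SLI: the measured fraction of good events.
    sli = (total_events - bad_events) / total_events
    # Error budget: how many bad events the SLO tolerates in this window.
    allowed_bad_events = (1.0 - slo_target) * total_events
    # Share of the budget still unspent (negative once the budget is blown).
    budget_remaining_fraction = 1.0 - bad_events / allowed_bad_events

    return {
        "sli": sli,
        "slo_met": sli >= slo_target,
        "allowed_bad_events": allowed_bad_events,
        "budget_remaining_fraction": budget_remaining_fraction,
    }


if __name__ == "__main__":
    # Hypothetical window: 1,000,000 requests, 400 failures, 99.9% target.
    report = error_budget_report(0.999, 1_000_000, 400)
    for key, value in report.items():
        print(f"{key}: {value}")
```

In this hypothetical window the target tolerates about 1,000 bad requests, so 400 failures leave roughly 60% of the budget unspent. A team could use that kind of figure when weighing a risky release against reliability work.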

Blameless offers the only complete reliability engineering platform that brings together AI-driven incident resolution, blameless postmortems, SLOs/Error Budgets, and reliability insights reports and dashboards, enabling businesses to optimize reliability and innovation.

Enabling modern software businesses to adopt SRE best practices:

  • Incident Resolution: Use AI to engage the right people and teams in the right way to stop problems fast, ensure customer satisfaction and prevent incidents from happening again.
  • Blameless Postmortems: Learn without pointing fingers, ensuring continuous improvements. We automatically bring relevant information, proper context and industry best practices to your postmortem process.
  • SLOs/Error Budgets: Create SLOs and see your remaining error budgets with the SLO dashboard. Teams gain insight into what parts of the business are consuming the error budget, allowing them to make informed tradeoffs between releasing new features and improving reliability.
  • Reliability Insights: Blameless will allow your business to consume event data across your entire DevOps stack, query the data, and create custom dashboards, meaning teams can quickly find signals amongst their DevOps data noise.

The Complete Site Reliability Engineering (SRE) Platform.