San Mateo, CA, USA
Nov 18, 2020   |  By Hannah Culver
Black Friday: we all know what it looks like. Hundreds of people swarming stores after Thanksgiving, jostling for the best deals. But in light of COVID-19, this arrangement could be dangerous. Over the last few years, Black Friday has become a digital event, and this year should be even more so. According to Forbes writer Richard Kestenbaum, “88% of global consumers told a Visa study they’re planning to buy gifts this holiday season.” Yet “only 20% of U.S.
Nov 17, 2020   |  By Blameless Community
In a recent fireside chat, Mohan Bhatkar, Head of Engineering for the Customer Reliability Platform at Mercari, Inc., sat down with Blameless Co-Founder Ashar Rizqi. They talked about scaling while avoiding silos, exciting day-to-day challenges, instilling a culture of empowerment, and more. Here are their top insights and the lightly edited transcript of their conversation.
Nov 16, 2020   |  By Blameless Community
At Blameless, we value every opportunity to learn. Whether it’s taking time on Focus Fridays to attend a cool webinar, or conducting retrospectives for incidents, lost deals, events, and more, learning is core to our mission. To learn even more about our craft, we decided to start a book club at Blameless. People from every team (engineering, sales, SRE, marketing, product, people, and more) attended.
Nov 10, 2020   |  By Blameless Community
We’re drinking Pumpkin Spice Lattes, lighting candles, and wearing flannel. Oh, and reading a bunch of great stuff. Here’s the November issue of SREview! This monthly zine features epic Tweets, content, and events happening in the SRE and resilience engineering community.
Nov 4, 2020   |  By Ancy Dow
PagerDuty is a leading on-call management platform that aggregates monitoring and alerting data, notifies on-call teams, and accelerates incident resolution. The platform is used by thousands of teams responsible for software experiences. It integrates incident triage with rapid responder mobilization, so teams can resolve incidents in real time.
Nov 2, 2020   |  By Chris Hendrix
Metrics are the golden ticket to knowing what’s going on with your system… or so everyone thinks. But there can be too much of a good thing. Are your metrics really doing you any favors? Are they letting you see into what your customers truly want from you? If not, you might have a problem. You might be fetishizing your metrics. The good news is you’re definitely not alone.
Oct 28, 2020   |  By Blameless Community
Blameless recently had the pleasure of interviewing Yury Niño Roa, Site Reliability Engineer, Solutions Architect and Chaos Engineering Advocate at ADL Digital Labs. She’s worked in roles ranging from solutions architect, to software engineering professor, to DevOps engineer, to SRE. Additionally, Yury is an avid blogger and conference speaker who regularly presents at events such as Chaos Conf, DevOpsDays Bogotá, and more.
Oct 27, 2020   |  By Emily Arnott
Onboarding is an essential yet challenging part of the hiring process. As your organization matures, more of its processes become unique. This makes it harder for new employees to get up to speed. Investing in custom processes and tooling to achieve your specific goals is a valuable practice, but you must balance this with an investment in onboarding.
Oct 26, 2020   |  By Hannah Culver
Atlassian JIRA, one of the most popular ticketing systems, allows teams to catalogue incidents, follow-up actions, bugs, stories, and more. As a common tool in any DevOps/SRE operation’s toolchain, JIRA is a key integration at Blameless. Blameless’ integration with JIRA allows teams to automatically generate a ticket within both Blameless and JIRA. This integration also allows teams to track follow-up actions via Blameless’ postmortem tool.
Oct 19, 2020   |  By Emily Arnott
Adopting SRE principles into your organization can be a big undertaking. You’ll need to develop new practices and procedures to minimize the costs of incident coordination. You’ll need to create a retrospective process that encourages continuous learning. You’ll need to shift culture to begin appreciating failure as an opportunity to grow. Your transition to the world of SRE will also require buy-in from all levels of your organization.

Blameless offers the only complete reliability engineering platform that brings together AI-driven incident resolution, blameless postmortems, SLOs/Error Budgets, and reliability insights reports and dashboards, enabling businesses to optimize reliability and innovation.

Enabling modern software businesses to adopt SRE best practices:

  • Incident Resolution: Use AI to engage the right people and teams in the right way to stop problems fast, ensure customer satisfaction and prevent incidents from happening again.
  • Blameless Postmortems: Learn without pointing fingers, ensuring continuous improvements. We automatically bring relevant information, proper context and industry best practices to your postmortem process.
  • SLOs/Error Budgets: Create SLOs and see your remaining error budgets with the SLO dashboard. Teams gain insight into which parts of the business are consuming the error budget, allowing them to make informed trade-offs between releasing new features and improving reliability.
  • Reliability Insights: Blameless lets your business consume event data from across your entire DevOps stack, query that data, and create custom dashboards, so teams can quickly find the signal amid the noise in their DevOps data.

The Complete Site Reliability Engineering (SRE) Platform.