Latest Posts

Best Practices for Effective Incident Management

Incident management is the set of processes operations teams use to respond to latency or downtime and return a service to its normal state. Frameworks such as ITIL have long defined incident management practices, but as software systems become more complex, teams need to adapt their processes accordingly.

Announcing our new integration with GoToMeeting

Communication during incidents is critical. With the rise of remote work, war rooms are no longer the central hub for all incident communication. Instead, we’re adapting to these new challenges and embracing video conferencing and messaging software to stay in lockstep with our teammates and collaborators. With this in mind, we are excited to announce that Blameless is adding a new way for you to communicate even faster and more effectively.

A Journey Through Blameless from Incident to Success

Here at Blameless, every aspect of our product has SLOs (Service Level Objectives) and error budgets to help us understand and improve customer experience. Sometimes, these error budgets are at risk, triggering an incident. While incidents are often painful, we treat them as unplanned investments, striving to learn as much as we can from them. We empower all of our engineers to handle an on-call rotation, no matter how difficult the issue.

SRE Leaders Panel: Work as Done vs Work as Imagined

Blameless recently had the privilege of hosting some fantastic leaders in the SRE and resilience community for a panel discussion. Our panelists discussed the effects of imposter syndrome, especially during high-tempo situations; how to use it to our advantage and overcome doubt; and how culture directly affects the availability of our systems. The transcript below has been lightly edited, and if you’re interested in watching the full panel, you can do so here.

Introducing Blameless Service Level Objectives

Over a year ago, Blameless launched the industry’s first end-to-end SRE platform to help software teams innovate without sacrificing reliability. As Service Level Objectives (SLOs) provide an anchor for reliability targets and corresponding decisions, they are the foundational step toward helping teams truly adopt SRE best practices. Today, we are very excited to announce our new SLO platform, giving teams a shared language on how to focus their engineering efforts.

Fostering Psychological Safety in Remote Teams

Psychological safety is a crucial component of any organization’s culture. In psychologically safe organizations, people are free to create, discuss, disagree, take risks, and make mistakes. These organizations are often the ones we see as key innovators in their industries. In other words, cultivating a culture of psychological safety is essential to success.